[Newbies] Re: Splitting Excel csv into lines

Jerome Peace peace_the_dreamer at yahoo.com
Wed Dec 24 17:11:04 UTC 2008



***
>stephan at stack.nl stephan at stack.nl 
>Wed Dec 24 13:32:34 UTC 2008 
>
>
>String lines doesn't handle separating Excel copy-and-paste very well,
>because soft enters in a cell get split into separate lines.
>
>Here is what I've done so far:
>
>splitIntoLines: aString
>	"Return a collection with the string-lines of the receiver."
>
>	| input char temp inQuote|
>	input := aString readStream.
>	^ Array streamContents: [ :output |
>		temp := ''.
>		inQuote := false.
>		[ input atEnd ] whileFalse: [
>			char := input next.
>			char = $" ifTrue: [
>				inQuote ifTrue:[
>					input peek = (Character tab) ifTrue: [
>						char := input next.].
>					input peek = (Character cr) ifTrue: [
>						char := input next.
>						inQuote := false]]].
>			char = (Character tab) ifTrue:[
>				inQuote ifTrue: [ inQuote := false]
>				ifFalse: [
>					input peek= $" ifTrue: [
>						input next.
>						inQuote:=true]]].
>			char = (Character cr)
>				ifFalse: [temp := temp, char asString]
>				ifTrue: [
>					inQuote ifFalse: [
>						output nextPut: temp.
>						temp:=''.
>						input peek = Character lf ifTrue: [input next]]]]]
>
>I would be interested in (speed & elegance & mistakes) improvements to it.

Hi Stephan,

A quick look at your code shows it to be "hard to read".
From the outside I had a hard time figuring out what it is trying to do.
Actually, I gave up.
My guess is that it has a 50-50 chance of doing what you want in
all cases.

So step 1:
Describe the rules for breaking up Excel lines in a comment.
Step 2:
Now that you have code to do the work, prove it works.
Write several example lines for it to break up.
(My assistant Puck says: "write really devious examples",
by which he means examples that will be hard to break up.)

Have it break them up.
Write a test asserting that the input lines break up into their output components. (Look at other SUnit tests for examples.)
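
A minimal sketch of such a test, under stated assumptions: ExcelLineSplitter is a made-up name for whatever class holds splitIntoLines:, the category name is arbitrary, and the expected results encode one guess at the rules (tabs are kept, and a cr inside a quoted cell does not end the line). Adjust the expectations to the rules written down in step 1; this is not a claim about what the code above currently returns.

```smalltalk
"Hypothetical test class. ExcelLineSplitter is an assumed name for
 the class that implements splitIntoLines:."
TestCase subclass: #ExcelLineSplitterTest
	instanceVariableNames: ''
	classVariableNames: ''
	poolDictionaries: ''
	category: 'ExcelImport-Tests'

"Method in ExcelLineSplitterTest"
testPlainLines
	"Two rows without quotes should give two lines, tabs kept."
	| tab cr lines |
	tab := String with: Character tab.
	cr := String with: Character cr.
	lines := ExcelLineSplitter new
		splitIntoLines: 'a', tab, 'b', cr, 'c', tab, 'd', cr.
	self assert: lines size = 2.
	self assert: lines first = ('a', tab, 'b').
	self assert: lines last = ('c', tab, 'd')

"Method in ExcelLineSplitterTest"
testSoftEnterInQuotedCell
	"Assumed rule: a cr inside a quoted cell must not start a new line."
	| tab cr lines |
	tab := String with: Character tab.
	cr := String with: Character cr.
	lines := ExcelLineSplitter new
		splitIntoLines: 'a', tab, '"two', cr, 'parts"', tab, 'c', cr, 'd', tab, 'e', cr.
	self assert: lines size = 2.
	self assert: lines last = ('d', tab, 'e')
```

The second test deliberately says nothing about whether the quotes survive in the first line; that is one of the rules that still has to be decided.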

Revising for readability:
A lot is done to keep track of quote state.
I would write a separate method for dealing with the input while in quote state.
As arguments you can pass the input stream and possibly the output stream.
When it returns you would no longer be in quote mode and the streams would be updated.
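
One way such a method might look, as a sketch only. The selector copyQuotedCellFrom:to: is invented for illustration, and the rule it implements is an assumption: the quoted cell ends at the first quote that is not immediately followed by another quote, and a doubled quote stands for one literal quote. If the rules for the pasted data turn out to be different, the body changes but the shape stays the same.

```smalltalk
copyQuotedCellFrom: input to: output
	"Call just after the opening quote has been consumed.
	 Copy the cell text to output, including any tab or cr inside it,
	 and return once the closing quote has been consumed.
	 Assumption: a doubled quote inside the cell means one literal quote."
	| char |
	[input atEnd] whileFalse: [
		char := input next.
		char = $"
			ifTrue: [
				input peek = $"
					ifTrue: [output nextPut: input next]
					ifFalse: [^self]]
			ifFalse: [output nextPut: char]]
```

The main loop then only has to spot an opening quote and hand over both streams; it no longer needs an inQuote flag of its own.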

Once this is done run the tests again. 
Do they still work?

Revising for speed:
Note: do NOT work on this first.
Work on speed only after you have made sure you get the correct results.

What you need to know about
 [temp := temp , char asString]
is that , copies the whole string each time.
Building a string up character by character this way will be Sloooow.
So make temp a WriteStream on a String and build it up with nextPut: .
Retrieve the string with temp contents.
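
A small workspace sketch of the stream version, using only the standard WriteStream protocol (on:, nextPut:, nextPutAll:, contents). The variable names are just for illustration.

```smalltalk
| temp line |
temp := WriteStream on: String new.
temp nextPut: $a.            "append one character, no copy of the whole string"
temp nextPutAll: 'bc'.       "append several characters at once"
line := temp contents.       "'abc'"
"Start the next line with a fresh stream."
temp := WriteStream on: String new.
```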

After you do this step, run the tests again.
Do they still work?

Hth,

Yours in curiosity and service, --Jerome Peace







