CrLfFileStream as default?

Michael S. Klein mklein at alumni.caltech.edu
Sat Oct 31 23:28:18 UTC 1998


> I don't think there is any reason for "guessLineEndConvention" in the
> approach I propose and if it guesses wrong (especially on an already
> anomalous file), CrLfFileStream seems to produce anomalies that I don't
> think are caused just by cut-and-paste.

Sometimes you may want to guess, sometimes you may want a rigid line end
policy.

> The native platform line termination conventions I know of are as follows:
> 
>     DOS/Windows on x86    CrLf
>     UNIX                  Lf
>     Mac                   Cr

Smalltalks use Cr.  There is also Unicode, which defines explicit, distinct
line separator and paragraph separator characters (U+2028 and U+2029).
Personally, I think the whole idea of "Control Characters" is perverse.
Line end conventions are just the best-known symptom of this perversity.

> So when reading external text, all interline spacing (carriage return, form
> feed, line feed, and vertical tab) characters are handled as follows: 
> 
>   Cr        - add Cr to internal collection; if followed by Lf then read
>               and ignore Lf.
>   Lf        - if preceded by Cr then ignore Lf, else add Cr to internal
>               collection.
>   Ff and VT - ignore.

This works sometimes, but some of us actually use the FFs and VTs that
other people place in text.
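
To make the quoted rules concrete, here is a rough sketch in Python rather
than Squeak. The function name normalize_to_cr and the drop_ff_vt flag are
made up for illustration; the flag is only there so the Ff/VT rule can be
switched off by those who want to keep those characters.

```python
CR, LF, FF, VT = "\r", "\n", "\f", "\v"

def normalize_to_cr(text, drop_ff_vt=True):
    """Convert external text to internal text that uses Cr as the line end.

    Hypothetical helper following the quoted rules:
      Cr        -> keep Cr; a directly following Lf is swallowed
      Lf        -> ignored if directly preceded by Cr, else becomes Cr
      Ff and VT -> dropped when drop_ff_vt is true, kept otherwise
    """
    out = []
    prev_was_cr = False
    for ch in text:                 # one character at a time, as in the quote
        if ch == CR:
            out.append(CR)
            prev_was_cr = True
            continue
        if ch == LF:
            if not prev_was_cr:     # a lone Lf is a line end of its own
                out.append(CR)
        elif ch in (FF, VT):
            if not drop_ff_vt:      # optionally preserve Ff and VT
                out.append(ch)
        else:
            out.append(ch)
        prev_was_cr = False
    return "".join(out)
```

With this sketch, "a\r\nb\nc\rd" comes out as "a\rb\rc\rd" whichever
convention each line used, and passing drop_ff_vt=False leaves Ff and VT
alone.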

> This does require reading external text a character at a time, but doesn't
> seem prohibitively expensive.

First make it work... then make it fast.

> CrLfFileStream also doesn't deal with the problems like that of runs
> breaking in Text instances.  It seemed to me that a lot of this kind of
> anomaly exists when just about any of the classes start reading and writing
> to external devices -- ports, disks, etc.

Yeah, strings are deceptively easy to externalize.

As far as line end convention goes, I think the important thing to do is 
to factor out the handling into a Policy object.  Otherwise the streaming 
code just gets all krufted up with different cases.

If somebody wants a different policy, they add a new class instead of 
futzing with the convoluted code.
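
A minimal sketch of what such a Policy object might look like, in Python
rather than Squeak. Every name here (LineEndPolicy, CrLfPolicy,
guess_policy, read_text) is hypothetical and not an existing Squeak class
or method; the guessing rule is a deliberately naive assumption.

```python
class LineEndPolicy:
    """Hypothetical policy: maps between external and internal line ends.

    Internal text is assumed to use Cr, as Smalltalk strings do.
    """
    external = "\r"

    def decode(self, text):
        # external convention -> internal Cr
        return text.replace(self.external, "\r")

    def encode(self, text):
        # internal Cr -> external convention
        return text.replace("\r", self.external)


class CrPolicy(LineEndPolicy):
    external = "\r"        # Mac


class LfPolicy(LineEndPolicy):
    external = "\n"        # UNIX


class CrLfPolicy(LineEndPolicy):
    external = "\r\n"      # DOS/Windows


def guess_policy(sample):
    """Naive guess: pick a rigid policy from what the sample contains."""
    if "\r\n" in sample:
        return CrLfPolicy()
    if "\n" in sample:
        return LfPolicy()
    return CrPolicy()


def read_text(raw, policy):
    """The streaming code only delegates; it has no line-end cases itself."""
    return policy.decode(raw)
```

A caller who wants a rigid convention passes, say, LfPolicy(); one who
wants to guess passes guess_policy(raw). A new convention is one more
subclass, and read_text does not change.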

-- Mike Klein




