I don't think there is any reason for "guessLineEndConvention" in the approach I propose, and if it guesses wrong (especially on an already anomalous file), CrLfFileStream seems to produce anomalies that I don't think are caused just by cut-and-paste.
Sometimes you may want to guess, sometimes you may want a rigid line end policy.
The native platform line termination conventions I know of are as follows:
DOS/Windows on x86    CrLf
UNIX                  Lf
Mac                   Cr
Smalltalks use Cr. There is also Unicode, which has explicitly distinct line separator and paragraph separator characters (U+2028 and U+2029). Personally, I think the whole idea of "Control Characters" is perverse. Line end conventions are just the best-known symptom of this perversity.
So when reading external text, all interline spacing (carriage return, form feed, line feed, and vertical tab) characters are handled as follows:
Cr - add Cr to internal collection; if followed by Lf, read and ignore the Lf.
Lf - if preceded by Cr, ignore the Lf; otherwise add Cr to internal collection.
Ff and VT - ignore.
This works sometimes, but some of us actually use the FFs and VTs placed in text by other people.
This does require reading external text a character at a time, but doesn't seem prohibitively expensive.
First make it work.... then make it fast
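For concreteness, here is a rough sketch of the character-at-a-time rule quoted above. It is written in Python rather than Smalltalk purely for illustration, and the names (normalize_line_ends, drop_ff_vt) are made up for this example, not anything in CrLfFileStream. The drop_ff_vt switch is my own addition so that people who rely on FF and VT can keep them.

```python
CR, LF, FF, VT = "\r", "\n", "\f", "\v"

def normalize_line_ends(chars, drop_ff_vt=True):
    """Map Cr, Lf and CrLf to a single Cr, one character at a time.

    Hypothetical helper: drop_ff_vt=True follows the quoted rule
    (ignore FF and VT); drop_ff_vt=False passes them through.
    """
    out = []
    previous_was_cr = False
    for ch in chars:
        if ch == CR:
            # Cr always becomes an internal Cr.
            out.append(CR)
        elif ch == LF:
            # Lf right after Cr is the tail of a CrLf pair: ignore it.
            if not previous_was_cr:
                out.append(CR)
        elif ch in (FF, VT) and drop_ff_vt:
            # Quoted rule: FF and VT are dropped.
            pass
        else:
            out.append(ch)
        previous_was_cr = (ch == CR)
    return "".join(out)
```

Only one character of lookbehind is needed, so this works on a stream as well as on a string, and a file that mixes conventions still comes out with uniform Cr line ends.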
CrLfFileStream also doesn't deal with problems like runs breaking in Text instances. It seemed to me that a lot of this kind of anomaly exists when just about any of the classes start reading and writing to external devices -- ports, disks, etc.
Yeah, strings are deceptively easy to externalize.
As far as line end convention goes, I think the important thing to do is to factor out the handling into a Policy object. Otherwise the streaming code just gets all krufted up with different cases.
If somebody wants a different policy, they add a new class instead of futzing with the convoluted code.
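A minimal sketch of what I mean by a Policy object, again in Python just for illustration. Every name here (LineEndPolicy, RigidPolicy, GuessingPolicy, read_text) is hypothetical, and the guessing heuristic is deliberately naive; it is not how guessLineEndConvention works.

```python
class LineEndPolicy:
    """Hypothetical base class: turns external text into Cr-delimited text."""
    def to_internal(self, text):
        raise NotImplementedError

class RigidPolicy(LineEndPolicy):
    """Accept exactly one external terminator; leave everything else alone."""
    def __init__(self, terminator):
        self.terminator = terminator

    def to_internal(self, text):
        return text.replace(self.terminator, "\r")

class GuessingPolicy(LineEndPolicy):
    """Naive guess: use the first convention found, checked in a fixed order."""
    def to_internal(self, text):
        for candidate in ("\r\n", "\n", "\r"):
            if candidate in text:
                return RigidPolicy(candidate).to_internal(text)
        return text

def read_text(raw, policy):
    # The streaming code never branches on the convention itself;
    # it just delegates to whatever policy it was handed.
    return policy.to_internal(raw)
```

With this shape, read_text("a\r\nb", RigidPolicy("\r\n")) yields "a\rb", while GuessingPolicy on the mixed input "a\r\nb\nc" settles on CrLf and leaves the stray Lf in place. That is the sort of anomaly a wrong guess produces, and fixing it means writing another policy class, not editing the stream.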
-- Mike Klein