lex@cc.gatech.edu wrote:
[snipped good description of the distinction between the guessing used to pick line ending that should be used for new files, and that for appending to old files]
Now in the CrLfFileStream in the standard image, this second guess is also used when reading: if the convention was guessed to be CR, any LFs read in are left as-is. This is also debatable, and in fact I think LFs should be converted to CRs no matter which convention has been guessed.
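Here is a minimal Python sketch of that policy, applied to a whole string rather than to a stream. The function name `to_cr_line_endings` is hypothetical and is not part of Squeak. Whatever convention was guessed, a CR-LF pair becomes a single CR, a lone LF also becomes CR, and a lone CR is left alone.

```python
CR = "\r"
LF = "\n"


def to_cr_line_endings(text: str) -> str:
    """Hypothetical helper: map CR-LF pairs and lone LFs to CR.

    Lone CRs are kept, so the result uses CR as its only line ending,
    regardless of which convention the file appeared to use.
    """
    # Collapse the two-character pair first, so its LF is not turned into
    # a second CR by the next step.
    return text.replace(CR + LF, CR).replace(LF, CR)
```

For example, `to_cr_line_endings("a\r\nb\nc\rd")` gives `"a\rb\rc\rd"`.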
So I sent a patch around a few days ago that did just this.
Now, I've been using this setup for a week or so with no trouble. However, I haven't tried it on any *really* strange files....
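I checked out the patch for #next, and it gets stuck when the *last* character of the file is a Cr. After reading the Cr, it reads one more character to look for an Lf. At end of file that read answers nil, but the position still gets bumped back with skip: -1, so you keep getting the Cr over and over again and never reach the end.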
The following is one way to fix it:
CrLfFileStream >> next
	| char secondChar |
	char := super next.
	self isBinary ifTrue: [^char].
	char == Cr ifTrue:
		[secondChar := super next.
		"nil means end of file: don't back up onto the Cr again"
		secondChar ifNotNil:
			[secondChar == Lf ifFalse: [self skip: -1]].
		^Cr].
	"A lone Lf is also answered as a Cr"
	char == Lf ifTrue: [^Cr].
	^char
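If you want to try the end-of-file case outside Squeak, here is a rough Python analogue of the fixed method. It is a sketch and not the CrLfFileStream code. The class name `CrNormalizingReader` and its helpers are hypothetical, and the isBinary check is left out. It also assumes, as the ifNotNil: guard implies, that reading past the end answers nil (None here) without moving the position.

```python
from typing import List, Optional

CR = "\r"
LF = "\n"


class CrNormalizingReader:
    """Hypothetical Python analogue of the patched CrLfFileStream>>next.

    Reads characters from an in-memory string and answers every line
    ending (CR, LF, or CR-LF) as a single CR.
    """

    def __init__(self, data: str) -> None:
        self._data = data
        self._pos = 0

    def _raw_next(self) -> Optional[str]:
        # Plays the role of `super next`: None at end, position unchanged.
        if self._pos >= len(self._data):
            return None
        ch = self._data[self._pos]
        self._pos += 1
        return ch

    def _skip(self, n: int) -> None:
        # Plays the role of `skip:`.
        self._pos += n

    def next(self) -> Optional[str]:
        ch = self._raw_next()
        if ch == CR:
            second = self._raw_next()
            # Back up only if a real character was read. At end of file
            # `second` is None, and backing up would reread the CR forever.
            if second is not None and second != LF:
                self._skip(-1)
            return CR
        if ch == LF:
            return CR
        return ch

    def read_all(self) -> str:
        out: List[str] = []
        while (ch := self.next()) is not None:
            out.append(ch)
        return "".join(out)
```

Here `CrNormalizingReader("a\r\nb\nc\r").read_all()` returns `"a\rb\rc\r"`. If you drop the `second is not None` test, `read_all` on `"\r"` never returns. That is the same hang as in the original patch.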
Personally, I haven't decided whether to start using this method or to stick with the current approach of expecting consistent line endings within a file being read. I do use something like the above, but only as needed, to fix up "problem" files.
I can see cases where each of the two approaches (accepting mixed line endings vs. expecting only one kind) has its uses. Maybe someone can figure out a flexible yet clean implementation that allows either. Anyone up for refactoring the Stream hierarchy? :-)
-------------------------------------------
Bill Dargel            wdargel@shoshana.com
Shoshana Technologies
100 West Joy Road, Ann Arbor, MI 48105  USA