"R. A. Harmon" harmonra@webname.com wrote:
This sounds basically like what CrLfFileStream does
[snip]
So in my opinion, the only things stopping adoption of CrLfFileStream as a
default are:
- a decision on the file positions issue (I vote for "it's illegal", migrating towards "it's expensive and unadvised")
- automatically choosing output line endings based on what the current platform is. This could be done by putting a "CrLfFileStream > guessLineEndConvention" in SystemDictionary.processStartupList.
[snip]
I don't think there is any reason for "guessLineEndConvention" in the approach I propose. If it guesses wrong (especially on an already anomalous file), CrLfFileStream seems to produce anomalies that I don't think are caused just by cut-and-paste.
The purpose of this method is to pick a convention for *new* files. The idea being if you create a text file on Windows, it should have CRLF line endings, and if you create a file on Unix, it should have LF line endings. That way you can view Squeak files using other applications on your operating system, without having to convert the files first.
If you don't run this method during startup, then CrLfFileStream won't notice it is operating on a new platform. It will continue using whatever convention it was using when the image was saved, even if it was saved on a different platform.
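To make the startup point concrete, here is a rough sketch in Python (not Squeak code). The names `guess_default_convention` and `LineEndDefaults` are made up for illustration, and treating every non-Windows platform as LF is an assumption of the sketch:

```python
import sys


def guess_default_convention(platform=None):
    """Pick the line-end convention for *new* files from the platform name.

    Assumption of this sketch: Windows gets CRLF, everything else gets LF.
    """
    platform = sys.platform if platform is None else platform
    if platform.startswith("win"):
        return "crlf"
    return "lf"


class LineEndDefaults:
    """Hypothetical stand-in for state that is saved along with an image."""

    # Whatever was in effect when the "image" was last saved.
    convention = "lf"

    @classmethod
    def start_up(cls, platform=None):
        # Re-guess on every startup so a saved image that is moved to
        # another platform does not keep its old default.
        cls.convention = guess_default_convention(platform)
        return cls.convention
```

Without the call to `start_up`, `LineEndDefaults.convention` simply stays at the saved value, which is the stale-default problem described above.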
Now, there's a second "guess" going on in CrLfFileStream, and that's when a specific file is opened. This guess is to ensure that new data written to the file will have the same convention as the data that's already in the file. If you have a CRLF-delimited file on Unix, then you should keep writing CRLF endings, and not start appending lines with LF endings. The point is debatable, I suppose, but that's the purpose of this one.
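The per-file guess could look roughly like the following Python sketch (again not the Squeak implementation). The function name `guess_file_convention` and the rule "the first line ending found wins" are assumptions made for illustration:

```python
def guess_file_convention(data: bytes, default: str = "lf") -> str:
    """Guess 'crlf', 'cr' or 'lf' from the first line ending found in data.

    Falls back to `default` (the platform convention) when the existing
    contents hold no line ending at all.
    """
    for i, byte in enumerate(data):
        if byte == 0x0D:  # CR: a following LF makes it CRLF
            if data[i + 1:i + 2] == b"\n":
                return "crlf"
            return "cr"
        if byte == 0x0A:  # bare LF
            return "lf"
    return default
```

For example, `guess_file_convention(b"one\r\ntwo\r\n")` returns `"crlf"`, so lines appended to that file on Unix would still be written with CRLF.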
Now in the CrLfFileStream in the standard image, this second guess is also used when reading. If the convention was guessed to be CR, then any LF's read in will be left as is. This is also debatable, and in fact I think that LF's should be converted to CR's no matter what convention has been guessed.
So I sent a patch around a few days ago that did just this.
Now, I've been using this setup for a week or so now with no troubles. However, I've not messed with any *really* strange files....
Lex
lex@cc.gatech.edu wrote:
[snipped good description of the distinction between the guessing used to pick line ending that should be used for new files, and that for appending to old files]
Now in the CrLfFileStream in the standard image, this second guess is also used when reading. If the convention was guessed to be CR, then any LF's read in will be left as is. This is also debatable, and in fact I think that LF's should be converted to CR's no matter what convention has been guessed.
So I sent a patch around a few days ago that did just this.
Now, I've been using this setup for a week or so now with no troubles. However, I've not messed with any *really* strange files....
I checked out the patch for #next and it has a problem of getting stuck when the *last* character of the file is a Cr. The position gets bumped back, and you keep getting the Cr over and over again, never reaching the end.
The following is one way to fix it:
CrLfFileStream >> next
    | char secondChar |
    char _ super next.
    self isBinary ifTrue: [^char].
    char == Cr ifTrue:
        [secondChar _ super next.
        secondChar ifNotNil: [secondChar == Lf ifFalse: [self skip: -1]].
        ^Cr].
    char == Lf ifTrue: [^Cr].
    ^char
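For anyone who wants to play with the logic outside an image, here is the same idea as a small Python sketch. The class `CrNormalizingReader` and the helper `read_all` are hypothetical names, and binary mode is left out. The guard on `second is not None` plays the role of the `ifNotNil:` above: without it, a Cr at the very end of the input would move the position back onto itself and be returned forever.

```python
CR, LF = "\r", "\n"


class CrNormalizingReader:
    """Reads characters, turning CRLF, bare CR and bare LF all into CR."""

    def __init__(self, data):
        self.data = data
        self.pos = 0

    def _raw_next(self):
        # Like "super next": returns None at end of input.
        if self.pos >= len(self.data):
            return None
        char = self.data[self.pos]
        self.pos += 1
        return char

    def next(self):
        char = self._raw_next()
        if char == CR:
            second = self._raw_next()
            # Step back only if a character was actually read and it is
            # not the LF half of a CRLF pair.
            if second is not None and second != LF:
                self.pos -= 1
            return CR
        if char == LF:
            return CR
        return char


def read_all(text):
    """Drain a reader over text and return the normalized string."""
    reader = CrNormalizingReader(text)
    out = []
    while True:
        char = reader.next()
        if char is None:
            break
        out.append(char)
    return "".join(out)
```

With mixed endings and a trailing Cr, `read_all("a\r\nb\nc\rd\r")` terminates and returns `"a\rb\rc\rd\r"`.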
Personally, I haven't decided whether to start using this method or to stick with the (current) approach of expecting consistent line endings within a file being read. I do use something like the above, but only on an as-needed basis to fix-up "problem" files.
I can see cases where each of the two approaches (accepting mixed line endings vs. expecting only one type) has its uses. Maybe someone can figure out an approach that could allow either in a flexible yet clean implementation. Anyone up for refactoring the Stream hierarchy? :-)
-------------------------------------------
Bill Dargel
wdargel@shoshana.com
Shoshana Technologies
100 West Joy Road, Ann Arbor, MI 48105 USA
squeak-dev@lists.squeakfoundation.org