"R. A. Harmon" harmonra@webname.com wrote:
I think that with the following proposed set of conventions, PILT could be effectively achieved, and the odd "bite" handled on a case-by-case basis in some standard way.
- Internally, all lines end with a carriage return and contain no other interline spacing (carriage return, form feed, line feed, and vertical tab).
- External text lines end with the platform default unless explicitly set to some other line end.
- External binary lines must deal explicitly with interline spacing characters.
- External text is the open default.
- All objects that have lines, and internal stuff that depends on them (like a Text's string and runs), must have conversion methods from internal to external and back if they are made external.
- If you run into something that doesn't follow the convention, send in a fix or at least point it out.
This sounds basically like what CrLfFileStream does if you set it as the default concreteStream. For binary files it does nothing: you're on your own. For "ascii" files, it converts CR, LF, and CRLF into CR on input, and it writes output according to one consistent external convention. The overall effect is that for ascii-mode files, internal strings have CRs, and external files have whatever the platform convention is.
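A minimal sketch of the two conversions just described, written in Python rather than Smalltalk for brevity. The names `to_internal` and `to_external` are hypothetical; this is not CrLfFileStream's actual code, only the same mapping applied to whole strings, assuming the internal form uses CR as its only line end.

```python
import re

CR, LF, CRLF = "\r", "\n", "\r\n"

# Matches any of the three external conventions. CRLF is listed first so a
# CR LF pair is treated as one line end rather than two.
_LINE_END = re.compile(r"\r\n|\r|\n")


def to_internal(external: str) -> str:
    """Input side: map CR, LF and CRLF line ends to a single CR."""
    return _LINE_END.sub(CR, external)


def to_external(internal: str, line_end: str = LF) -> str:
    """Output side: write every internal CR using one chosen external line end."""
    return internal.replace(CR, line_end)


print(repr(to_internal("a\r\nb\nc\rd")))
print(repr(to_external("a\rb\r", CRLF)))
```

Because input accepts all three conventions but output uses just one, a file that arrives with mixed line ends is normalized the next time it is saved.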
UNLESS the user is dealing with file positions. If the external file has CRLFs in it, then a single character internally can become two characters externally. So what should happen to "the" file position?
One answer, the way it is done right now, is to let "the position" jump by more than 1. This is basically what ANSI C does with text files. Another answer would be to calculate a "virtual position" and map it back and forth to the actual file position. This can require reading the entire beginning of a file, though, and sucks for large files. It's very clean, though.
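A rough sketch of the "virtual position" mapping, in Python and assuming a single-byte encoding in which only CR LF pairs differ in length between the internal and external forms. Both function names are hypothetical. Note that `virtual_to_physical` has to scan from the start of the data on every call, which is exactly the cost on large files mentioned above.

```python
def virtual_to_physical(data: bytes, virtual_pos: int) -> int:
    """Return the byte offset in `data` for internal (CR-only) position `virtual_pos`.

    Each CR LF pair counts as one internal character; every other byte maps
    one-to-one. The scan always starts at the beginning of the data.
    """
    physical = 0
    virtual = 0
    while virtual < virtual_pos:
        if physical >= len(data):
            raise ValueError("position past end of file")
        if data[physical:physical + 2] == b"\r\n":
            physical += 2  # two external bytes become one internal CR
        else:
            physical += 1
        virtual += 1
    return physical


def physical_to_virtual(data: bytes, physical_pos: int) -> int:
    """Inverse mapping: count internal characters before byte `physical_pos`."""
    prefix = data[:physical_pos]
    return len(prefix) - prefix.count(b"\r\n")


data = b"ab\r\ncd\r\n"  # internally "ab\rcd\r"
print(virtual_to_physical(data, 3), physical_to_virtual(data, 4))
```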
Yet another answer is to simply disable messing with file positions for ascii files. If you really care about file positions, then maybe you are dealing with a binary file that happens to contain text?
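A sketch of the "make it illegal" option: a hypothetical Python wrapper (`TextOnlyStream` is an invented name, not an existing class) that passes reads and writes through to a text stream but refuses any request for a position.

```python
import io


class TextOnlyStream:
    """Hypothetical wrapper for an ascii-mode file on which positioning is illegal.

    Reads and writes are delegated to the underlying text stream; tell() and
    seek() raise, so code that needs positions must open the file in binary mode.
    """

    def __init__(self, stream: io.TextIOBase):
        self._stream = stream

    def read(self, size: int = -1) -> str:
        return self._stream.read(size)

    def write(self, text: str) -> int:
        return self._stream.write(text)

    def tell(self) -> int:
        raise io.UnsupportedOperation("no positions on text files; use binary mode")

    def seek(self, offset: int, whence: int = 0) -> int:
        raise io.UnsupportedOperation("no positions on text files; use binary mode")


s = TextOnlyStream(io.StringIO("hello"))
print(s.read())
```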
The second solution is upwardly compatible with the third. Once you start down the path of positions not making sense, you get stuck with it. So basically it's a choice between the first way (easy, but with weaker semantics) and the second+third way (hard, but you shouldn't be doing it anyway), and then a choice as to whether saying "you can't do it" is acceptable.
So in my opinion, the only things stopping adoption of CrLfFileStream as a default are:
1. a decision on the file positions issue (I vote for "it's illegal", migrating towards "it's expensive and unadvised")
2. automatically choosing output line endings based on what the current platform is. This could be done by putting a "CrLfFileStream guessLineEndConvention" in SystemDictionary.processStartupList.
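One way point 2 could look, sketched in Python instead of as a Smalltalk startup-list entry. `default_line_end` and `guess_line_end_convention` are hypothetical names; the platform value comes from Python's `os.linesep`, which is CR LF on Windows and LF on POSIX systems.

```python
import os

# Hypothetical module-level default, playing the role of the class-side
# setting that "CrLfFileStream guessLineEndConvention" would initialise.
default_line_end = "\n"


def guess_line_end_convention() -> str:
    """Set and return the default external line end for the running platform."""
    global default_line_end
    default_line_end = os.linesep
    return default_line_end


# Run once at startup, analogous to an entry in the startup list.
guess_line_end_convention()
print(repr(default_line_end))
```

Recomputing the value at every startup, rather than storing it once, means a saved image moved to another platform picks up that platform's convention.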
Oh, and maybe it could be renamed, too. CrLf isn't very descriptive, especially on a Unix machine that doesn't have any CRLF-delimited files. "TextFileStream", maybe?
Lex
lex@cc.gatech.edu wrote: ....
I'd support the first alternative with the reasoning that file positions for ASCII files should be treated as opaque 'cookies', that is, you can get the file position and set it to get back to a point where you were before, but you should not do arithmetic with them. Everything else makes dealing with text files painful. Either you pay by waiting a long time for the mapping calculation, or you pay by inventing your own (probably buggy) file position handling.
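Python's text files are one existing example of the opaque-cookie approach: `TextIOBase.tell()` is documented as returning an opaque number that usually isn't a byte count, and `seek()` from the start of a text stream only accepts a value previously returned by `tell()` (or zero). The snippet writes CR LF line ends to disk, reads them back as single newlines, and uses a remembered cookie to return to a line without doing any arithmetic on it.

```python
import os
import tempfile

# Write three lines; newline="\r\n" turns each "\n" into CR LF on disk.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(path, "w", encoding="utf-8", newline="\r\n") as f:
    f.write("first\nsecond\nthird\n")

# Default newline=None: CR LF on disk is read back as a single "\n".
with open(path, "r", encoding="utf-8") as f:
    first = f.readline()
    cookie = f.tell()      # opaque: store it, but don't do arithmetic with it
    second = f.readline()
    f.seek(cookie)         # go back to the remembered point
    again = f.readline()

print(repr(first), repr(second), repr(again))
```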
Of course, the numbers that you remember and use to jump to a position in a file should not be kept when the file could possibly be manipulated by an outside entity, for example by some file transfer program that tries to be 'helpful' and inserts or deletes some 'bogus control characters'. Sadly, the changes file looks a lot like a text file although it is really a stupid database which must be treated as binary data...
Hans-Martin
You have to do something like this anyway to support multi-byte characters, so you may as well handle lineEndConvention the same way.
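A small illustration of the multi-byte case, in Python and assuming UTF-8: once some characters take more than one byte, a character index and a byte offset diverge just as they do with CR LF, and finding one from the other again means encoding or scanning the whole prefix. `char_index_to_byte_offset` is a hypothetical helper.

```python
def char_index_to_byte_offset(text: str, index: int, encoding: str = "utf-8") -> int:
    """Byte offset of character `index` once `text` is encoded.

    Characters before `index` may take more than one byte each, so the
    whole prefix has to be encoded to find the offset.
    """
    return len(text[:index].encode(encoding))


# "\u00e9" is one character but two bytes in UTF-8, so index 2 is byte 3.
print(char_index_to_byte_offset("h\u00e9llo", 2))
```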
-- Mike
GO unicode!
squeak-dev@lists.squeakfoundation.org