CrLfFileStream as default?

David N. Smith (IBM) dnsmith at watson.ibm.com
Sat Nov 7 22:22:52 UTC 1998


At 17:38 -0500 9/24/98, Lex Spoon wrote:
>If a file has a CRLF pair in it, and CrLfFileStream converts that to a CR,
>then the idea of "position" gets messed up.  It looks like 1 character,
>but "file position" will act like 2 characters have gone by.  For
>CrLfFileStream to go the final mile, it should probably have some code to
>make positions truly transparent to the user.
>
>Other than that, CrLfFileStream seems very nice.  I've been using it as
>the default for several months now and found it nothing but convenient.
>
>
>Lex

I've just discovered this thread and read it quickly. The root seems as good
a spot as any as a base for a few comments.

I'm wearing my Here's-What-Other-Systems-Do hat.

IBM Smalltalk has a different design which I've kind of liked. It gets
around the problems of having explicit classes for CrLf streams or of
guessing.

There are line delimiter values in the same pool dictionary that holds
character definitions:

   MACLineDelimiter     There's no Mac version but you might get Mac files.
   PMLineDelimiter      OS/2
   UNIXLineDelimiter    UNIX (tm)
   WINLineDelimiter     Windows
   LineDelimiter        The appropriate one of the above for this platform
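
As a rough model of the table above (in Python, not IBM Smalltalk), the delimiters could be written as plain string constants. The constant names are illustrative assumptions; the character values are the usual conventions for each platform.

```python
import os

# Illustrative stand-ins for the pool dictionary entries (names are assumptions).
MAC_LINE_DELIMITER = "\r"     # classic Mac OS: CR
PM_LINE_DELIMITER = "\r\n"    # OS/2: CR LF
UNIX_LINE_DELIMITER = "\n"    # UNIX: LF
WIN_LINE_DELIMITER = "\r\n"   # Windows: CR LF

# The appropriate one of the above for this platform; Python exposes
# the host's delimiter as os.linesep.
LINE_DELIMITER = os.linesep
```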

Streams and file streams all have a #lineDelimiter: method that lets the
value be changed (and a #lineDelimiter method too). By default, the value
is that of the current host platform. Line delimiters are strings.

So, if I open a file on Windows, I get CrLf by default. I can do a
#nextLine which internally does a (self upToAll: lineDelimiter) to read a
line. A #cr writes the stream's current line delimiter value.
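
A minimal Python sketch of that behavior, assuming an in-memory stream rather than a real file. The class and method names (LineStream, LineWriter, up_to_all, next_line, cr) are hypothetical analogues of the Smalltalk selectors, not IBM Smalltalk's implementation.

```python
import os


class LineStream:
    """Toy in-memory read stream with a per-stream line delimiter."""

    def __init__(self, text, line_delimiter=os.linesep):
        self.text = text
        self.position = 0
        self.line_delimiter = line_delimiter  # a string, possibly two characters

    def at_end(self):
        return self.position >= len(self.text)

    def up_to_all(self, pattern):
        """Return the text before the next occurrence of pattern and skip past it."""
        index = self.text.find(pattern, self.position)
        if index < 0:
            # Pattern not found: return everything that is left.
            chunk = self.text[self.position:]
            self.position = len(self.text)
            return chunk
        chunk = self.text[self.position:index]
        self.position = index + len(pattern)
        return chunk

    def next_line(self):
        # Analogue of: self upToAll: lineDelimiter
        return self.up_to_all(self.line_delimiter)


class LineWriter:
    """Toy in-memory write stream; cr() emits the stream's own delimiter."""

    def __init__(self, line_delimiter=os.linesep):
        self.parts = []
        self.line_delimiter = line_delimiter

    def next_put_all(self, text):
        self.parts.append(text)

    def cr(self):
        self.parts.append(self.line_delimiter)

    def contents(self):
        return "".join(self.parts)
```

Because the delimiter is just a string held by the stream, the same next_line code handles one-character and two-character delimiters without a separate CrLf class.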

For example, here is some (untested) code:

   | in out |
   in := CfsReadFileStream open: 'UnixFile'.
   in lineDelimiter: UNIXLineDelimiter.

   out := CfsWriteFileStream open: 'WinFile'.
   out lineDelimiter: WINLineDelimiter.

   [ in atEnd ]
      whileFalse: [
         out nextPutAll: in nextLine;
             cr ].
   in close.
   out close.

This reads a file in UNIX format and writes one in Windows format,
regardless of what platform it is run on. Omit the 5th line of code (the one
that sets the line delimiter of out) and it converts a UNIX format file to
whatever the native platform format is.
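
The same loop can be modeled in Python on plain strings. This is only a sketch of the idea, with a hypothetical function name; leaving out the target delimiter falls back to the host platform's, which mirrors omitting the line that sets the line delimiter of out.

```python
import os


def convert_line_delimiters(text, source_delimiter, target_delimiter=os.linesep):
    """Re-emit text line by line, ending each line with target_delimiter."""
    out = []
    position = 0
    while position < len(text):                # [ in atEnd ] whileFalse:
        index = text.find(source_delimiter, position)
        if index < 0:
            # Last line has no trailing delimiter: take the rest.
            index = len(text)
            next_position = len(text)
        else:
            next_position = index + len(source_delimiter)
        out.append(text[position:index])       # out nextPutAll: in nextLine
        out.append(target_delimiter)           # out cr
        position = next_position
    return "".join(out)
```

Note that, like the Smalltalk loop, this always terminates the final line with a delimiter even if the input did not.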

There is a general mechanism for calling code at image startup (and at
image shutdown) and at other times. With such a mechanism it is easy to set
the defaults, and it is an easy mechanism to build (if there isn't already
one in Squeak that I've missed).
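
Such a mechanism might look roughly like the following Python sketch. ImageHooks, stream_defaults, and set_default_line_delimiter are invented for illustration and do not correspond to any actual IBM Smalltalk or Squeak API.

```python
import os


class ImageHooks:
    """Hypothetical registry of actions to run at startup and shutdown."""

    def __init__(self):
        self._startup_actions = []
        self._shutdown_actions = []

    def at_startup(self, action):
        self._startup_actions.append(action)

    def at_shutdown(self, action):
        self._shutdown_actions.append(action)

    def run_startup(self):
        for action in self._startup_actions:
            action()

    def run_shutdown(self):
        for action in self._shutdown_actions:
            action()


# Hypothetical place where streams would look up their defaults.
stream_defaults = {}


def set_default_line_delimiter():
    # Re-read the host's delimiter each time the "image" starts, since
    # a saved image may be restarted on a different platform.
    stream_defaults["line_delimiter"] = os.linesep


hooks = ImageHooks()
hooks.at_startup(set_default_line_delimiter)
hooks.run_startup()
```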

There is one flaw, though: if you read the file character by character then
you get whatever is in the file. There is no attempt to map, say, CrLf to a
native Cr. One usually avoids this by reading lines, but that is not always
what you want. I suspect that it'd be easy to add a #mapToLineDelimiter:
method which specifies another line delimiter to which the stream's line
delimiter is mapped, but then you'd have the length of the read text
differ from the stream position.
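
To make the mismatch concrete, here is a small Python sketch of character-by-character reading with such a mapping. MappingReader and next_char are hypothetical names; this models the idea only and is not an existing method in either system.

```python
class MappingReader:
    """Toy reader that maps the stream's delimiter to another one per character."""

    def __init__(self, text, line_delimiter, mapped_delimiter):
        self.text = text
        self.position = 0                  # position in the underlying data
        self.line_delimiter = line_delimiter
        self.mapped_delimiter = mapped_delimiter

    def at_end(self):
        return self.position >= len(self.text)

    def next_char(self):
        if self.at_end():
            raise EOFError("read past end of stream")
        if self.text.startswith(self.line_delimiter, self.position):
            # Consume the whole delimiter but hand back the mapped one.
            self.position += len(self.line_delimiter)
            return self.mapped_delimiter
        ch = self.text[self.position]
        self.position += 1
        return ch


reader = MappingReader("ab\r\ncd", line_delimiter="\r\n", mapped_delimiter="\r")
read_so_far = "".join(reader.next_char() for _ in range(3))
# Three characters have been returned, but the stream has advanced by four.
```

Here read_so_far has length 3 while reader.position is 4, which is the same position problem raised in the quoted message.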

Dave

_______________________________
David N. Smith
IBM T J Watson Research Center
Hawthorne, NY
_______________________________
Any opinions or recommendations
herein are those of the author
and not of his employer.




