Extending FileList with CrLf

Richard A. O'Keefe ok at cs.otago.ac.nz
Mon Jul 28 01:06:14 UTC 2003


Ned Konz <ned at bike-nomad.com> wrote:
	But that distinction and potential confusion would need to be 
	addressed as well.
	
	There would be three modes (I don't know the right names for the last 
	two):
	
	* binary: byte for byte, using SmallIntegers in the 0..255 range or 
	ByteArrays (as it is now using #binary)
	
!!	* non-translating text (this should be the default as it is now, I 
!!	think): same as above (byte for byte), but you're dealing with 
!!	Characters and Strings instead. This corresponds to the use of #ascii 
!!	with the (non-CrLfFileStream) standard streams.
	
	* translating text: Characters and Strings, but with potential 
	conversion of line endings on input and output.
	
If you've ever tried to teach Smalltalk to a class of Computer Science
students who go away and try your examples on their Linux or Windows
boxes, Squeak's out-of-the-box insistence on CR as line terminator will
have given you (and them) more grief than you would like.

We want people who start using Squeak to ENJOY it.
If they can't read from a simple text file without problems,
it puts them off.

The *simplest* way to open a file for input,
whatever that ends up being,
 - should open the file read-only
 - should map CR, CRLF, and LF all to CR
without having to be told.  (And FileList should use this method for
opening files that it displays to you.)
The *simplest* way to open a file for output or appending,
whatever that ends up being,
 - should use the host platform line end convention
without having to be told.
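The two rules above can be sketched roughly as follows. This is Python rather than Smalltalk, purely for illustration. The function names (normalize_to_cr, to_host_line_ends, read_text, write_text) are hypothetical and are not part of Squeak. The sketch assumes three things:

 - CR is the internal line terminator, as in Squeak;
 - files are 8-bit text, decoded here as Latin-1;
 - os.linesep stands in for "the host platform line end convention".

Opening with newline="" switches off Python's own newline translation, so the sketch sees the raw line ends.

```python
import os

CR, LF = "\r", "\n"

def normalize_to_cr(text: str) -> str:
    """Input rule: map CRLF, lone LF, and lone CR all to CR."""
    # Replace CRLF first so each pair becomes a single CR, not two.
    return text.replace(CR + LF, CR).replace(LF, CR)

def to_host_line_ends(text: str, host_eol: str = os.linesep) -> str:
    """Output rule: turn internal CRs into the host's line end.

    Assumes the text already uses CR internally (e.g. came via normalize_to_cr).
    """
    return text.replace(CR, host_eol)

def read_text(path: str) -> str:
    # Hypothetical "simplest way to open for input": read-only, translated.
    # newline="" disables Python's newline handling; latin-1 is an assumption.
    with open(path, "r", newline="", encoding="latin-1") as f:
        return normalize_to_cr(f.read())

def write_text(path: str, text: str, append: bool = False) -> None:
    # Hypothetical "simplest way to open for output or appending".
    with open(path, "a" if append else "w", newline="", encoding="latin-1") as f:
        f.write(to_host_line_ends(text))
```

The order of the two replacements in normalize_to_cr matters. Replacing LF first would turn each CRLF into CR CR, which is two line breaks instead of one.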

ANSI Smalltalk has a rather unhappy hybrid of "translating text" and
"non-translating text".  In <gettableStream>, #next returns whatever
is there (untranslated text), while #nextLine reads up to "an
implementation defined end-of-line sequence", and perhaps CR, LF, and
CRLF could each be an implementation-defined end-of-line sequence (although
one sentence seems to assume that there's only one).
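Accepting all three of CR, LF, and CRLF as the end-of-line sequence in a #nextLine-style reader takes only one character of lookahead after a CR. The sketch below is a hypothetical Python illustration. LineReader and next_line are made-up names; they are not the ANSI or Squeak protocol. It assumes a text stream opened without newline translation.

```python
import io
from typing import Optional

class LineReader:
    """Hypothetical nextLine-style reader: CR, LF, or CRLF ends a line."""

    def __init__(self, stream: io.TextIOBase):
        self._stream = stream
        self._pending = ""  # one character pushed back after peeking past a CR

    def _next_char(self) -> str:
        if self._pending:
            ch, self._pending = self._pending, ""
            return ch
        return self._stream.read(1)  # "" means end of stream

    def next_line(self) -> Optional[str]:
        """Return the next line without its terminator, or None at end."""
        chars = []
        while True:
            ch = self._next_char()
            if ch == "":
                # End of stream: hand back a final unterminated line, if any.
                return "".join(chars) if chars else None
            if ch == "\n":
                return "".join(chars)
            if ch == "\r":
                # CRLF counts as one terminator: swallow an LF right after CR,
                # otherwise push the peeked character back for the next call.
                following = self._next_char()
                if following != "\n":
                    self._pending = following
                return "".join(chars)
            chars.append(ch)
```

For comparison, Python's own text files do much the same when opened with newline=None (universal newlines). In that mode they translate all three endings to "\n" on input.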

<FileStream> has
    #'text'    The external data is (sic.) treated as a sequenced (sic.)
               of 8-bit characters encoded using an implementation defined
               external character set. (Which does not appear to allow
               mapping a sequence of external characters to a single
               internal character.)  The sequence value type is <Character>
               restricted to those specific characters that may be
               represented in the external character set.

One looks to the standard for clear guidance on stuff like this (it was an
obvious problem back in the late 1970s, when Prolog designers first had to
consider it).  One fails to find it.

I'm not saying that "dumb text" should not be available somehow,
only that to encourage beginners, it must NOT be the normal obvious
way to access text files.


