Extending FileList with CrLf

Lex Spoon lex at cc.gatech.edu
Wed Jul 23 17:06:33 UTC 2003


> I used to have a Celeste DB. I work in Linux most days. I go to ESUG
> with Windows laptop. On it I start another DB. I come back and try to
> merge. Blech, my files are not bit compatible because they happened to
> have been created on different OSes... And because my images used CrLfFS
> by default, I wasn't even aware of this.
> 

Actually, Celeste is fine with mixed line endings, provided you have
CrLfFileStream turned on in your image.  The only problem is if you use
CrLfFileStream for a while and then turn it off.  This is a classic
example of CrLf people and non-CrLf people annoying each other, with the
twist that both people are the same this time.  :)
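To make the "mixed line endings" point concrete, here is a small Python model of the conversion. It is a sketch, not CrLfFileStream itself. It assumes a text stream maps every CR, LF, or CRLF it reads to Squeak's internal CR, and writes a chosen platform terminator on output. The function names are hypothetical.

```python
import re

# Squeak keeps text internally with CR (ASCII 13) as the line terminator.
SQUEAK_EOL = "\r"

def to_internal(text: str) -> str:
    """Normalize CRLF, LF, or bare CR endings to Squeak's internal CR."""
    # CRLF is listed first in the alternation so it becomes one CR, not two.
    return re.sub(r"\r\n|\n|\r", SQUEAK_EOL, text)

def to_platform(text: str, eol: str) -> str:
    """Convert internal CR endings to the given platform terminator."""
    return text.replace(SQUEAK_EOL, eol)
```

A file whose lines came from Linux, Windows, and Mac sessions reads back uniformly. Only turning the conversion off afterwards exposes the raw mix.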

I actually like that Celeste uses the platform line endings, so that
more(1) and less(1) can view the files.



> For a system class to default to non-neutral mappings stinks, because
> users will be discouraged from using text (since it is not cross
> platform - see my practical problem), and applications are discouraged
> from being aware of the encoding, and doing the right thing.
> 

If CrLfFileStream were the default, then it *would* be cross-platform.
That's why I want it to be a default, instead of a user-specific
preference.  It does stink that all you non-CrLf weenies can't read my
text files, and that I have to read verbiage about how I should gzip
files so that the precious CR line endings don't get converted to
something I can read in vi.  :)



>  The
> decision binary/text dominates the decision "what encoding should I use
> in my text", and this should be reflected by my using text mode, with
> any encoding supported that I choose.

I would say that strings vs. bytes is a different issue from binary mode
versus text mode.  We could debate that forever, but the distinction is at
least reasonable.  Squeak provides a nice default encoding, but you can
easily convert with #asString or #asByteArray.

Actually, it would make sense if either encoding were possible in either
mode.  This can be as simple as having nextPut: and nextPutAll: tolerate
either encoding, and adding methods like nextChars: and nextByte that
specifically request an encoding.
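The proposal above can be modeled in a few lines of Python. This is a sketch under stated assumptions. The class and its snake_case methods are hypothetical stand-ins for the suggested nextPutAll:, nextChars:, and nextByte selectors. Characters are assumed to map one-to-one onto bytes via Latin-1.

```python
import io

class DualModeStream:
    """Hypothetical stream that accepts characters or bytes on write and
    lets the reader ask explicitly for one or the other."""

    ENCODING = "latin-1"  # assumption: one byte per character

    def __init__(self, data: bytes = b""):
        self._buf = io.BytesIO(data)

    def next_put_all(self, chunk):
        # Tolerate either encoding: str is encoded, bytes are written as-is.
        if isinstance(chunk, str):
            chunk = chunk.encode(self.ENCODING)
        self._buf.write(chunk)

    def next_chars(self, n: int) -> str:
        # Explicitly request n characters.
        return self._buf.read(n).decode(self.ENCODING)

    def next_byte(self):
        # Explicitly request one byte as an int, or None at end of stream.
        b = self._buf.read(1)
        return b[0] if b else None

    def reset(self):
        self._buf.seek(0)
```

Writing "hello, " as a string and b"world" as bytes yields one stream. The caller, not the stream's mode, decides whether the next read comes back as characters or as a byte.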


> > Then you are arguing to use binary as default, or even better, _no_ 
> > default at all, and a explicit definition of how you want to access the 
> > file, sounds reasonable.
> Yes.

I'd be reasonably okay with using CrLfFS as the default but putting it
in binary mode.  If you send #text to the stream (or some future
#asTextFileStream), then it switches to automatic line-end conversion.
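A minimal Python model of "binary by default, #text turns on conversion". It is a sketch, not Squeak's API. The class, its text()/binary() methods, and the fixed LF platform terminator are all illustrative assumptions.

```python
import io

class ModalFileStream:
    """Hypothetical stream that starts in binary mode; text() switches on
    line-end conversion, loosely after the proposed #text message."""

    INTERNAL_EOL = b"\r"  # Squeak's internal line terminator

    def __init__(self, raw: bytes, platform_eol: bytes = b"\n"):
        self._raw = io.BytesIO(raw)
        self._platform_eol = platform_eol  # assumption: known, not detected
        self._text_mode = False            # binary by default

    def text(self):
        self._text_mode = True
        return self

    def binary(self):
        self._text_mode = False
        return self

    def contents(self):
        data = self._raw.getvalue()
        if not self._text_mode:
            return data  # raw bytes, untouched
        # Text mode: map platform endings to the internal CR, then decode.
        converted = data.replace(self._platform_eol, self.INTERNAL_EOL)
        return converted.decode("latin-1")
```

Returning self from text() and binary() mirrors the Smalltalk habit of cascading a mode change straight into a read.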

The main problem is that for beginner programs, text is the most common
thing to use.  An expert who is trying to write a BMP or read an
ImageSegment will very quickly figure out to put it in binary mode, but
a novice will be confused when they open a file but can't do nextPutAll:
'hello, world', or when they read from their file and see a bunch of
bytes.  This pain could be ameliorated if necessary, though, as
described above.

A secondary issue is that Squeak currently defaults the other way, so a
ton of code will have to be tweaked if we switch to binary as the
default.

A third issue might be compatibility with other Smalltalks.  What is
the usual default?  I'm guessing it is text mode.

I don't think you can distinguish the defaults by which one causes more
bugs.  If someone is using a Mac and sets the mode wrong, they can
overlook the bug in both directions.  It's just as bad to open
/etc/hosts and see it all on one line, as it is for your GIF file to be
garbled.  Basically, you really need to use the file in the correct
mode, to get correct behavior.


Lex



More information about the Squeak-dev mailing list