convincing squeak to use LF instead of CR as the line separator...

Greg A. Woods woods at weird.com
Sat May 26 19:21:22 UTC 2001


[ On Friday, May 25, 2001 at 11:23:31 (-0500), Lex Spoon wrote: ]
> Subject: Re: convincing squeak to use LF instead of CR as the line separator...
>
> I like the distinction, myself.  "text" is something that makes sense on
> every OS, but it's done slightly differently on each one.

Yes of course there's a distinction between "text" and any arbitrary
opaque binary format.  However the distinction is all in the reader's
view and it should not involve any conversion.  Text data can be treated
as opaque binary data just as easily and doing so should not imply any
conversion either.

On unix there's no underlying distinction made between text and binary
content, not in the file, nor in the kernel or system calls, and not
even in most of the library routines most commonly used to read and
write files.  IMO this is the way it should be, though of course it puts
the onus on the application to make the distinction and unfortunately it
seems common for non-unix applications to have the opposite view of
things (i.e. to expect the underlying system to make the distinction for
them).
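
As a rough illustration of that point, here is a minimal sketch in Python,
assuming a unix-like host.  The helper name roundtrip is made up for this
example; os.write and os.read are thin wrappers over the write(2) and read(2)
system calls, so whatever bytes go in should come back out unchanged, whether
or not they look like "text".

```python
import os
import tempfile


def roundtrip(data: bytes) -> bytes:
    """Write bytes through the raw file-descriptor calls and read them back.

    Hypothetical helper for illustration only.
    """
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, data)            # no newline translation happens here
        os.lseek(fd, 0, os.SEEK_SET)  # rewind to the start of the file
        return os.read(fd, len(data) + 1)
    finally:
        os.close(fd)
        os.remove(path)


# CR, LF, CRLF and NUL are all just bytes as far as the system is concerned.
mixed = b"cr\rlf\ncrlf\r\nnul\x00end"
print(roundtrip(mixed) == mixed)
```

Any decision about what counts as a line ending is then left entirely to the
application reading the data.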

>  If Squeak is
> writing text, it's very nice if it can do so in the format that's
> appropriate for the underlying platform.  Likewise if Squeak is
> *reading* text.

Indeed, thus my initial post to this thread.... :-)

What's most confusing in Squeak though is that it isn't internally
self-consistent.  I still haven't found the reason why my second
go-around at running the C translator produced some source files with LF
separators and the rest with CR separators....

> Already we have to do this when talking across the Internet.  Squeak
> can't impose its text format on the Internet at large, and must cope
> with what the Internet is using.  It's the same way with files, IMO.

Wire-format protocols for communications, especially those that
encapsulate other forms of data (and especially any that encapsulate
opaque binary data) usually have specific definitions of their line
terminators (and yes here they're usually terminators, not separators).
Unfortunately most Internet protocols use the worst choice: CRLF.
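
The terminator-versus-separator difference can be sketched as follows.  This
is only an illustration in Python; the function names split_terminated and
split_separated are invented here and are not taken from any protocol library.
With a terminator, anything after the last CRLF is an incomplete line still
waiting for more data; with a separator, it is simply the final line.

```python
def split_terminated(buf: bytes, term: bytes = b"\r\n"):
    """Split terminator-delimited data into (complete_lines, remainder).

    Hypothetical helper: the remainder is whatever follows the last
    terminator, i.e. a line that has not been finished yet.
    """
    *lines, rest = buf.split(term)
    return lines, rest


def split_separated(buf: bytes, sep: bytes = b"\r"):
    """Split separator-delimited data; every piece counts as a line."""
    return buf.split(sep)


# Terminated: the second command is not complete until its CRLF arrives.
print(split_terminated(b"HELO example\r\nMAIL FR"))

# Separated: a trailing separator means there is one more (empty) line.
print(split_separated(b"first\rsecond\r"))
```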

The difference with files is that files should be stored in the host
system's native format.  Squeak doesn't do that in all cases, and that's
bad for the host systems which have different native formats (and only
slightly good for sharing files between squeak instances on different
host systems).
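
A minimal sketch of what "native format" conversion at the file boundary could
look like, written in Python rather than Smalltalk.  It assumes an internal
CR separator as discussed in this thread; the names INTERNAL, to_native and
from_native are hypothetical and do not correspond to anything in the Squeak
image.

```python
import os

INTERNAL = "\r"  # assumption: the internal line separator is a bare CR


def to_native(text: str, native: str = os.linesep) -> str:
    """Convert internal line separators to the host's convention on write."""
    return text.replace(INTERNAL, native)


def from_native(text: str, native: str = os.linesep) -> str:
    """Convert the host's line separators to the internal one on read."""
    return text.replace(native, INTERNAL)


source = "first line\rsecond line\r"
on_unix = to_native(source, "\n")       # what would be stored on a unix host
print(from_native(on_unix, "\n") == source)
```

The conversion happens only where text is known to be text; opaque binary data
would bypass both functions entirely.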

As much as I hate to say it I think Squeak could learn lots from Emacs
(and Plan 9).

> By the way, I'm fairly sure that text/binary distinctions on files are
> much older than MS-DOS.

It's "infinitely" older, at least if you consider "cards" to be text.  :-)

>  Blurring the distinction is something Unix
> does, but it isn't necessarily good -- it means that files don't have
> any kind of "class" at all, and are just dumb data.

On the contrary!  It's very good!  I think it's even necessary if you're
building small "tools" to manipulate data (i.e. following what some call
the "unix philosophy").  Why should a file transfer tool make any
distinction between "text" and "opaque binary" formats?  It should just
move the data, without ever changing it.

Unix isn't perfect, but most of its faults are not in the lack of a
kernel-based file classification tool, but rather in the limiting
assumptions made by programmers writing the tools that run on top of
that kernel.  For example why should some versions of 'tr' (which can
translate characters in a stream of data, and which I use to "fix"
squeak files, for example) fail when they encounter a NUL character?
They shouldn't but they do because their authors assumed that they would
only ever handle text files.
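
For comparison, a translation that makes no such assumption is easy to sketch.
The following is a Python illustration only (the function name cr_to_lf is
made up, and this is not how any particular 'tr' is implemented): it maps CR
to LF byte for byte and passes NUL, like every other byte, straight through.

```python
def cr_to_lf(data: bytes) -> bytes:
    """Replace every CR byte with LF and leave all other bytes untouched.

    Hypothetical helper: it treats the input as opaque bytes, so a NUL
    is copied through like anything else.
    """
    table = bytes.maketrans(b"\r", b"\n")  # 256-entry byte translation table
    return data.translate(table)


sample = b"line one\rline\x00two\r"
print(cr_to_lf(sample))
```

Because the input is never interpreted as a C-style string, there is nothing
special about a NUL in the middle of the data.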

>  The big advantage
> of the Unix system is that the kernel becomes simpler and smaller, which
> is important if your machine has only tens of kilobytes of memory, but
> isn't so important now.

If you want to appreciate a small and elegant kernel that does often use
"text" for even low-level systems stuff then you should look at Plan 9.

Indeed within Plan 9 are the fruits of lots of man-years of experience
with handling multi-lingual text too!

> Also, I'm fairly sure that on some OS's putting a file in "text" mode is
> a bigger deal than just translating line end characters.  For example,
> it might involve saving the file as a list of 80-column lines, padded
> with spaces at the end.

	:-)

> But that's history.  Most OS's understand text files, and Squeak gets
> confronted with "text" files all the time that were edited on foreign
> OS's.  It's not hard to support them, but it means that text vs. binary
> must mean more than characters versus bytes.

My primary concern is to find a way to convince Squeak to support my
native platform with its un-typed files -- then I'll deal with managing
"foreign" files after that....  Squeak already does OK with text over
network communications so it's probably just local files that are an
issue.

-- 
							Greg A. Woods

+1 416 218-0098      VE3TCP      <gwoods at acm.org>     <woods at robohack.ca>
Planix, Inc. <woods at planix.com>;   Secrets of the Weird <woods at weird.com>




