Extending FileList with CrLf

Andreas Raab andreas.raab at gmx.de
Tue Jul 22 22:59:08 UTC 2003


Daniel,

> Philosophically:
> When you say that "text" automatically means one specific 
> mode, in which in-the-image representations do not match in-the-file
> representations, you are ignoring other possibilities. I should be
> able to specify text behavior, and decide what mapping I want exactly,
> but the default mapping should be the transparent one - the identity
> mapping.

No. If you say "text file" to an average person (or programmer) on Windows
it means "a file which I can open in Notepad and read and edit there". If
you say "text file" to an average person (or programmer) on Linux it means
"a file which I can open in vi/Emacs and read and edit there". And so on. I
would even claim that for "text files" we should not only have CRLF
conversions but also appropriate character mappings. If you have ever
written any piece of text using umlauts or other non-ASCII characters in
Squeak you know what I'm talking about. In fact, this is THE major reason
why I don't do any non-Squeak related piece of text editing in Squeak and
it's been the reason why I have "fixed" the clipboard primitives so that
they do character translation (which, naturally, messes up totally if
you want to transfer binary contents to the clipboard). The fact that I have
not seen a single complaint about the problem of "binary transfers" to the
clipboard, compared to the regular complaints that people were unable to do
reasonable copy/paste between Squeak and other apps should tell you
something here.

> To me, CrLfFS as a default is only different in degree from 
> having as a default a stream that draws the text into a large
> rectangular bitmap, and writes it as a JPEG. Its effects may be
> recoverable (assuming nobody puts strange artefacts into it),
> but it loses information which may be important, and is an unobvious
> choice of representation, which will overall add confusion, not reduce
> it.

Wrong. "Losing information" is just plain wrong here. What is "lost" so to
speak is a character which carries no meaning in the Squeak universe
anyway (LF). Within Squeak, line breaks are represented by CR (which is
fine) so what the file stream does is a very straightforward, totally
obvious _mapping_ of information. If you are concerned about the bytes of
some file then you don't really want text - you want bytes! And that's
binary and not text mode.
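
To make the mapping concrete, here is a minimal sketch in Python rather than
Squeak. The function names are invented for illustration and this is not the
actual CrLfFileStream code; it only shows the kind of conversion I mean, under
the assumption that the file uses one line-end convention throughout.

```python
def to_internal(text: str) -> str:
    """Map any external line ending (CRLF or LF) to a bare CR."""
    # Replace CRLF first so its LF is not converted a second time.
    return text.replace("\r\n", "\r").replace("\n", "\r")


def to_external(text: str, line_end: str) -> str:
    """Map the in-image CR back to the chosen external line ending."""
    return text.replace("\r", line_end)


# A file as Notepad would write it (hypothetical sample content).
windows_file = "first line\r\nsecond line\r\n"
in_image = to_internal(windows_file)          # only CR line breaks remain
round_trip = to_external(in_image, "\r\n")    # written back for Windows
```

For a file with a consistent convention, round_trip equals windows_file, so
nothing is lost. A file that mixes conventions would be normalized by this
sketch, and that is exactly the case where you want binary mode.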

As for confusion, all I can say is: Open a text file that someone (not using
Squeak) sent to you (wasn't it you who wanted to share with others? :) Or try
reading a few lines of text from it. *That* is confusing if you don't know
that Squeak really doesn't play by the rules that you've come to expect from
any program which deals with "text" on your platform (the arguments about
umlauts and non-ASCII characters apply even more strongly here).

> BTW, I think we are using the word "transparent" with different
> meanings. For me, something that has external effects (files
> incompatible across platforms) is not truly transparent, even if it
> seems so for a couple of months, or to someone very used to it.

"Files being compatible across platforms" always assumes binary mode as the
interpretations and expectations among the platforms differ. For example,
unless I am mistaken, OS X uses UTF-8 as the default encoding for "text" (such
as in the file system and probably "text" files themselves). Duh. Secondly,
given that in the particular case in question, determining the line end
convention is _automatic_, it is totally obvious (=transparent) what you get
- as a client you see CR, that's it. Everything else is up to
CrLfFileStream. And if you want bytes you use binary mode (that's what it's
for after all).
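
As a rough illustration of what "automatic" could look like, here is a Python
sketch. It is an assumption about one plausible approach (look at the first
line terminator in the data), not a description of how CrLfFileStream is
actually implemented, and the names detect_line_end and read_file are
hypothetical.

```python
def detect_line_end(data: bytes, default: bytes = b"\r") -> bytes:
    """Guess the line-end convention from the first terminator found."""
    for i, byte in enumerate(data):
        if byte == 0x0D:  # CR: either a bare CR or the start of CRLF
            if data[i + 1 : i + 2] == b"\n":
                return b"\r\n"
            return b"\r"
        if byte == 0x0A:  # bare LF
            return b"\n"
    return default  # no line break at all; fall back to the internal CR


def read_file(data: bytes, binary: bool = False) -> bytes:
    """Text mode: the client sees only CR. Binary mode: bytes untouched."""
    if binary:
        return data
    return data.replace(detect_line_end(data), b"\r")
```

In text mode the client gets CR no matter which platform wrote the file, and
with binary=True the bytes come back exactly as they are on disk.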

Cheers,
  - Andreas


