Extending FileList with CrLf

Daniel Vainsencher danielv at netvision.net.il
Tue Jul 22 23:43:05 UTC 2003


So a lot of editors are taking the simplistic route. That's a reasonable
application level decision. 

Though it is the sort of decision that would make me prefer EMACS,
because while it reads and writes all types of line conventions
correctly, it also explicitly sets a mode and gives the user an
indication of what mode he's working in. It might very well internally
convert all those line endings to one and the same character, but I
don't care, because the application model is consistent, and lets me as
a user know what's going on.

For an editor to expect any specific line ending convention (current
platform, Crs, Lfs, whatever), and misinterpret other conventions is
also reasonable, because it's clear what is happening.

For a system class to default to non-neutral mappings stinks, because
users will be discouraged from using text (since it is not cross
platform - see my practical problem), and applications are discouraged
from being aware of the encoding, and doing the right thing.

You both say that if I care about line ending conventions, the text is
not relevant and I should be using binary mode. So if I were fixing
Celeste to have a platform portable file format, that means I should not
be able to use the abstraction read-a-line, just because I want to use a
specific encoding? Maybe Stream>>nextPutAll: should refuse to process
Strings when I'm in binary mode, as well - after all string printing is
undefined in binary mode, right? To me this makes zero sense. The
decision binary/text dominates the decision "what encoding should I use
in my text", and this should be reflected by my using text mode, with
any supported encoding that I choose.
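
To make the read-a-line point concrete, here is a minimal sketch in
Python rather than Squeak. The helper name read_lines and the sample
text are hypothetical, not anything in Celeste or the image. It shows a
text-mode reader where the application, not the stream, picks the line
terminator:

```python
import io

def read_lines(stream, terminator):
    """Yield lines from a text stream, splitting only on `terminator`.

    Hypothetical helper: the caller chooses the convention, and any
    other line-end character is passed through untouched.
    """
    buffer = ""
    while True:
        chunk = stream.read(4096)
        if not chunk:
            break
        buffer += chunk
        # Everything before the last terminator is a complete line;
        # the remainder stays buffered until more data arrives.
        *complete, buffer = buffer.split(terminator)
        yield from complete
    if buffer:
        yield buffer  # final line without a trailing terminator

# newline="" asks for text mode with no line-ending translation.
stream = io.StringIO("From: a\rSubject: b\nstill line two\r", newline="")
lines = list(read_lines(stream, "\r"))
```

With Cr chosen as the terminator, the embedded Lf stays inside the
second line instead of being silently treated as a line break.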

Everything you write about the fact that Squeak doesn't have a good,
encoding aware text editor is correct in itself, but doesn't IMO mean
that we should make text mode useless at the system level.

In short - we should have Streams that can be in binary mode or in text
mode. In text mode, they should be able to apply whatever
transformations the application requests. Streams should provide only
mechanism, and no policy at all about encodings or transformations.

Stream creation protocol could be changed to have #binaryFileNamed:,
#crTextFileNamed:, #lfTextFileNamed:, #platformTextFileNamed: and maybe,
just maybe, #autodetectEncodingOldFileNamed:. But #oldFileNamed: should
not decide on its own to do mappings without the application having a
clue about it.
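
A rough sketch of what such a creation protocol might look like, again
in Python rather than Squeak. The class TextFile, the function names
(modelled on the selectors above), the Cr in-image convention constant
and the latin-1 encoding are all assumptions made for illustration, not
existing Squeak code:

```python
import os
import tempfile

IN_IMAGE_EOL = "\r"  # assumption: in-memory text uses CR, as in Squeak

class TextFile:
    """Hypothetical text-mode wrapper with an explicit file line ending."""

    def __init__(self, path, mode, file_eol, encoding="latin-1"):
        self._file = open(path, mode + "b")  # bytes underneath, always
        self._file_eol = file_eol
        self._encoding = encoding  # assumption: a single-byte encoding

    def write(self, text):
        mapped = text.replace(IN_IMAGE_EOL, self._file_eol)
        self._file.write(mapped.encode(self._encoding))

    def read(self):
        text = self._file.read().decode(self._encoding)
        return text.replace(self._file_eol, IN_IMAGE_EOL)

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        self._file.close()

def binary_file_named(path, mode="r"):
    return open(path, mode + "b")

def cr_text_file_named(path, mode="r"):
    return TextFile(path, mode, "\r")  # identity mapping for CR text

def lf_text_file_named(path, mode="r"):
    return TextFile(path, mode, "\n")

def platform_text_file_named(path, mode="r"):
    return TextFile(path, mode, os.linesep)

# The application states the convention; nothing is guessed for it.
path = os.path.join(tempfile.mkdtemp(), "inbox.txt")
with lf_text_file_named(path, "w") as f:
    f.write("line one\rline two\r")
with binary_file_named(path) as f:
    raw = f.read()
with lf_text_file_named(path) as f:
    mapped = f.read()
with cr_text_file_named(path) as f:
    unmapped = f.read()
```

Each constructor names its mapping, so the same file reads back
differently only when the application explicitly asks for that.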

Daniel

Andreas Raab <andreas.raab at gmx.de> wrote:
> Daniel,
> 
> > Philosophically:
> > When you say that "text" automatically means one specific 
> > mode, in which in-the-image representations do not match in-the-file
> > representations, you are ignoring other possibilities. I should be
> > able to specify text behavior, and decide what mapping I want exactly,
> > but the default mapping should be the transparent one - the identity
> > mapping.
> 
> No. If you say "text file" to an average person (or programmer) on Windows
> it means "a file which I can open in Notepad and read and edit there". If
> you say "text file" to an average person (or programmer) on Linux it means
> "a file which I can open in vi/Emacs and read and edit there". And so on. I
> would even claim that for "text files" we should not only have CRLF
> conversions but also appropriate character mappings. If you have ever
> written any piece of text using umlauts or other non-ASCII characters in
> Squeak you know what I'm talking about. In fact, this is THE major reason
> why I don't do any non-Squeak related piece of text editing in Squeak and
> it's been the reason why I have "fixed" the clipboard primitives so that
> they do character translation (which, naturally, messes up totally if
> you want to transfer binary contents to the clipboard). The fact that I have
> not seen a single complaint about the problem of "binary transfers" to the
> clipboard, compared to regular complaints that people were unable to do
> reasonable copy/paste between Squeak and other apps should tell you
> something here.
> 
> > To me, CrLfFS as a default is only different in degree from 
> > having as a default a stream that draws the text into a large
> > rectangular bitmap, and writes it as a JPEG. Its effects may be
> > recoverable (assuming nobody puts strange artefacts into it),
> > but it loses information which may be important, and is an unobvious
> > choice of representation, which will overall add confusion, not reduce
> > it.
> 
> Wrong. "Losing information" is just plain wrong here. What is "lost" so to
> speak is a character which represents no meaning in the Squeak universe
> anyway (LF). Within Squeak, line breaks are represented by CR (which is
> fine) so what the file stream does is a very straightforward, totally
> obvious _mapping_ of information. If you are concerned about the bytes of
> some file then you don't really want text - you want bytes! And that's
> binary and not text mode.
> 
> As for confusion, all I can say is: Open a text file that someone (not using
> Squeak) sent to you (wasn't it you who wanted to share with others? :) Or try
> reading a few lines of text from it. *That* is confusing if you don't know
> that Squeak really doesn't play by the rules that you've come to expect from
> any program which deals with "text" on your platform (the arguments about
> umlauts and non-ascii characters apply even more strongly here).
> 
> > BTW, I think we are using the word "transparent" with different
> > meanings. For me, something that has external effects (files
> > incompatible across platforms) is not truly transparent, even if it
> > seems so for a couple of months, or to someone very used to it.
> 
> "Files being compatible across platforms" always assumes binary mode as the
> interpretations and expectations among the platforms differ. For example,
> unless I am mistaken, OSX uses UTF-8 as the default encoding for "text" (such
> as in the file system and probably "text" files themselves). Duh. Secondly,
> given that in the particular case in question, determining the line end
> convention is _automatic_ it is totally obvious (=transparent) what you get
> - as a client you see CR, that's it. Everything else is up to
> CrLfFileStream. And if you want bytes you use binary mode (that's what it's
> for after all).
> 
> Cheers,
>   - Andreas


