Extending FileList with CrLf

Andreas Raab andreas.raab at gmx.de
Wed Jul 23 00:41:52 UTC 2003


Hi Daniel,

> Though it is the sort of decision that would make me prefer EMACS
> because while it does correctly read and write all types of line
> conventions, it also explicitly sets a mode and gives the user an
> indication of what mode he's working in. It might very well internally
> convert all those line endings to one and the same character, but I
> don't care, because the application model is consistent, and
> lets me as a user know what's going on.

That's a reasonable argument under some circumstances. For example, I like
Emacs because of its handling of line end conventions too, yet the only
place where I ever care about what "mode" it's in is when I have to check
code into CVS (which sucks at handling text). I should also note that Emacs
- unless you are being very specific - uses the platform defaults. How often
have you changed this in the past?

However, I can be convinced by the consistency argument, e.g., once the
stream has decided on a line end convention it should stick to it. Not hard
to do, btw; you would only lose some of the "robustness" Ned was talking
about. Oh, and modes - you are aware that you can ask a file stream for its
line end convention, aren't you?
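
As a rough illustration of the "decide once, then stick to it" idea, here is
a minimal sketch in Python. It is not Squeak code and not how CrLfFileStream
is implemented; the class name StickyLineEndStream and the attribute
line_end_convention are made up for this example.

```python
class StickyLineEndStream:
    """Hypothetical reader: detects the first line ending it sees and
    from then on treats only that convention as a line break."""

    def __init__(self, data: str):
        self.data = data
        # The detected convention can be queried by the client afterwards.
        self.line_end_convention = self._detect(data)

    @staticmethod
    def _detect(data: str):
        # Scan for the first CR or LF; a CR directly followed by LF is CRLF.
        for i, ch in enumerate(data):
            if ch == "\r":
                return "\r\n" if data[i + 1:i + 2] == "\n" else "\r"
            if ch == "\n":
                return "\n"
        return None  # no line ending found

    def lines(self):
        # Split only on the detected convention; other endings stay in the text.
        if self.line_end_convention is None:
            return [self.data]
        return self.data.split(self.line_end_convention)
```

With mixed input such as "a\nb\rc\nd" the sketch settles on LF and leaves the
stray CR inside the second line, which is the loss of "robustness" mentioned
above.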

> For a system class to default to non-neutral mappings stinks, because
> users will be discouraged from using text (since it is not cross
> platform - see my practical problem), and applications are discouraged
> from being aware of the encoding, and doing the right thing.

It doesn't stink at all. Again: _reading_ with CrLfFileStream gives you CRs
no matter what you do. If Celeste doesn't handle it correctly then it hasn't
been written with text files in mind but rather databases containing strings
(which is even indicated by calling it a "data" base not a "text" file). As
for applications - they will by default do what is considered the "right
thing" on the platform they are running on.

We are doing this in many, many cases already. All of the code which uses
primitive abstractions is written "without being aware of the platform" and
that's exactly what we want! We need a text abstraction here, something that
says "in Squeak a line end is represented by X and you don't have to care".
If you replace X with Cr then you get CrLfFileStream.
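
To make the "X = Cr" abstraction concrete, here is a small Python sketch
(an illustration only, not the Squeak implementation; the helper name
to_internal is an assumption). Whatever convention the file uses, the
client only ever sees the one internal line end character.

```python
import re

# CRLF must be tried first so it is not split into a CR and an LF.
_LINE_END = re.compile(r"\r\n|\r|\n")

def to_internal(text: str, internal: str = "\r") -> str:
    """Replace every CRLF, CR, or LF in `text` with `internal` (CR by default)."""
    return _LINE_END.sub(internal, text)
```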

> You both say that if I care about line ending conventions, the text is
> not relevant and I should be using binary mode. So if I were fixing
> Celeste to have a platform portable file format, that means I 
> should not be able to use the abstraction read-a-line, just
> because I want to use a specific encoding?

Yes and no. If there is no support for the encoding you want to use, then
you can't use it. Suppose I wanted to use the string <LINEBREAK> as a
line end delimiter. Could I use read-a-line? No.

However, given that CrLfFileStream _does_ support your (not-so) neutral
encoding, sure, go ahead and use it. Just tell CrLfFileStream to use that
encoding instead of the default one.

> Maybe Stream>>nextPutAll:
> should refuse to process Strings when I'm in binary mode, as well
> - after all string printing is undefined in binary mode, right?

Given that "(ByteArray new: 100) writeStream nextPutAll: 'Hello'" raises an
error, yes this argument can be made. Not sure how useful it is though (in
both cases).

> to me this makes zero sense. The decision binary/text dominates
> the decision "what encoding should I use in my text",

Oh, really? ;-) The only difference between binary and text we have today is
nullified by adding #asByteArray or #asString. So the two modes just do a
bit of optimization so that you don't have to send these messages. Given
this, what "decision" is there to make? All the decision says is "do I want
to get ByteArrays or Strings" and in many cases we convert these explicitly
afterwards. The decision between bytes and strings can even be made by the
client if the client uses #nextInto:. So, in short, that "decision" is really
nonexistent. There is no text mode for files in Squeak, just the convenience
of not having to send #asByteArray or #asString.
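
The same point can be sketched in Python (an analogy only; next_chunk is a
hypothetical function, and Latin-1 is chosen here simply because it maps
each byte to one character):

```python
def next_chunk(raw: bytes, binary: bool):
    """Hypothetical stream read: the bytes are identical in both modes,
    only the type of the returned object differs."""
    return raw if binary else raw.decode("latin-1")
```

Converting the "text" result back with encode("latin-1") yields exactly the
"binary" result, so the mode carries no information beyond the result type.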

> and this should be reflected by my using text mode, with
> any encoding supported that I choose.

Sure. No one is arguing against using the encoding you choose. What we're
arguing _for_ is to have a platform-compatible default encoding. That's what
CrLfFileStream gives us for line endings.

> Everything you write about the fact that Squeak doesn't have a good,
> encoding aware text editor is correct in itself, but doesn't IMO mean
> that we should make text mode useless at the system level.

Again, I want to make it use _ful_ not useless. I don't know where you read
that in my post, but my argument is that (as you can see from the above) we
don't even have a useful interpretation of what "text" means. CrLfFileStream
is one tiny step toward actually getting some meaning into the words
"text file".

> In short - we should have Streams that can be in binary mode 
> or in text mode.

Yes.

> In text mode, they should be able to apply whatever
> transformations the application requests it to.

Yes. With the default being a sensible platform compatible encoding.
CrLfFileStream does this for line endings.

> Streams should provide only mechanism, and zero policy
> at all about encodings or transformations.

Streams should provide default policies which can be easily changed.
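
A minimal Python sketch of "default policy, easily changed" (the names
TextPolicy and encode_lines are invented for this illustration and do not
correspond to anything in Squeak): the platform convention is the default,
and a client that cares can override it globally or per call.

```python
import os

class TextPolicy:
    # Platform default line end; a client may reassign this class attribute.
    default_line_end = os.linesep

def encode_lines(lines, line_end=None):
    """Join lines using an explicit line end, or the current default policy."""
    end = TextPolicy.default_line_end if line_end is None else line_end
    return "".join(line + end for line in lines)
```

Clients that do not care get platform-compatible output; a client that needs
a fixed file format passes line_end explicitly and is unaffected by the
default.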

> Stream creation protocol could be changed to have #binaryFileNamed:,
> #crTextFileNamed:, #lfTextFileNamed:, #platformTextFileNamed: and maybe,
> just maybe, #autodetectEncodingOldFileNamed:. But #oldFileNamed: should
> not decide on its own to do mappings without the application having a
> clue about it.

It should use the default encoding.

Cheers,
  - Andreas


