Extending FileList with CrLf

Ned Konz ned at bike-nomad.com
Sat Jul 26 16:10:16 UTC 2003


On Friday 25 July 2003 10:17 pm, Lex Spoon wrote:
> Wow, Ned, you have actually proposed a fairly thorough solution.  I
> think you are proposing too much work, however.  In short, just
> making CLFS the default for 3.7 looks like it would be fine for
> everyone's purposes.  For the details, read on.

Will do.

> Why should we support this?  The file is broken.  The current
> interpretation in CLFS seems to be as good as any for such a file,
> and if you want to write some code to fix such files, you should
> surely use binary mode.
>
> If this kind of thing actually happens in practice, then we can add
> even more smarts to CLFS, but in the meantime it doesn't seem
> critical.

I guess you're right.

> > - default file opens are *not* text unless you explicitly use a
> > text stream class, wrapper, or constructor method. Each character
> > you read corresponds to exactly one character in the file.
>
> Why not text mode as the default?  It's not a big deal, but note
> that:
>
> 	1. Text mode is the current default in Squeak. [...] you have a 1:1
> 	correspondence between logical and physical bytes.

Not in the sense I was using. I was using the same distinction the C
libraries make: "text mode" is the one in which newlines may get
translated.

The current distinction in Squeak between "binary" and "ascii" is not 
that distinction: it's just a choice of using ByteArrays vs. Strings, 
and is not what I'm talking about.

But that distinction and potential confusion would need to be 
addressed as well.

There would be three modes (I don't know the right names for the last 
two):

* binary: byte for byte, using SmallIntegers in the 0..255 range or
ByteArrays (as it works now with #binary)

* non-translating text (this should be the default as it is now, I 
think): same as above (byte for byte), but you're dealing with 
Characters and Strings instead. This corresponds to the use of #ascii 
with the (non-CrLfFileStream) standard streams.

* translating text: Characters and Strings, but with potential 
conversion of line endings on input and output.
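The three modes can be illustrated with a minimal Python sketch. The function names are hypothetical, and two things are assumptions rather than anything proposed in this thread: latin-1 decoding stands in for "one byte, one Character", and CR is taken as the in-image line ending that translated input is normalized to.

```python
import os
import tempfile

def read_binary(path):
    """Binary mode: byte for byte; iterating the result yields ints 0..255."""
    with open(path, "rb") as f:
        return f.read()

def read_plain_text(path):
    """Non-translating text: still byte for byte, but as characters.
    latin-1 maps every byte to exactly one character, so positions in
    the string equal positions in the file."""
    with open(path, "rb") as f:
        return f.read().decode("latin-1")

def read_translated_text(path):
    """Translating text: CR-LF, lone LF, and lone CR all become CR
    (assumed in-image line ending). CR-LF is replaced first so it is
    not turned into two line endings."""
    text = read_plain_text(path)
    return text.replace("\r\n", "\r").replace("\n", "\r")

def write_translated_text(path, text, delimiter="\r\n"):
    """Write CR-delimited text, converting each CR to `delimiter`."""
    with open(path, "wb") as f:
        f.write(text.replace("\r", delimiter).encode("latin-1"))

if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "mixed.txt")
    with open(path, "wb") as f:
        f.write(b"one\r\ntwo\nthree\r")
    print(len(read_binary(path)), len(read_plain_text(path)))  # 15 15
    print(repr(read_translated_text(path)))                    # 'one\rtwo\rthree\r'
```

Only the third mode changes the number of characters relative to the file, which is why it is the one that interacts with saved positions.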

> > - we review the reading and writing of text files in the Basic image
> > to make sure that the behavior is what we want. For instance, we may
> > decide to maintain the Mac delimiters in ChangeSet's file-out format,
> > but to make the "save as text" from the Workspaces save in the
> > default text format.
>
> How about just making them all use platform line endings?  It's a
> very picky distinction to draw: "generic text file" versus "Squeak
> text file".  Does the distinction matter?  The only reason I can
> think of to have CR line endings on the filesystem, is to support
> legacy Squeak installations without CLFS turned on.  That doesn't
> seem worth bothering with; those guys are already annoyed by the
> CLFS people, anyway.  And besides, Squeak is still at a stage where
> burning the disk packs is a fine strategy.

That's fine, as long as we keep the old line endings where backwards
compatibility is important, such as when change sets are included in
a project file, and maybe also when mailing change sets to the list.

> Usually text files are read from beginning to end, or written from
> scratch.  The times that positions are used in text files, they are
> almost always merely saved and then restored later.

But that can be a problem if those positions get saved and the file's
line endings are changed afterwards: converting CR to CR-LF inserts
one byte per line ending, so every stored position past the first line
becomes invalid.
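A small Python sketch shows the effect. The file contents and the helpers `offset_of` and `remap_position` are hypothetical; they illustrate a saved position going stale and are not a description of how Celeste or any Squeak index actually stores positions. The remapping only holds for a pure CR to CR-LF conversion.

```python
def offset_of(data: bytes, needle: bytes) -> int:
    """Byte offset of the first occurrence of `needle`, standing in for a
    stream position that an application saves in an index."""
    pos = data.find(needle)
    if pos < 0:
        raise ValueError("needle not found")
    return pos

def remap_position(cr_data: bytes, old_pos: int) -> int:
    """Hypothetical fix-up, valid only for a pure CR -> CR-LF conversion:
    every CR before the old position adds exactly one byte."""
    return old_pos + cr_data[:old_pos].count(b"\r")

cr_file = b"From: a\rSubject: b\rbody text\r"   # CR-delimited file
saved = offset_of(cr_file, b"body")               # 19, remembered in an index
crlf_file = cr_file.replace(b"\r", b"\r\n")       # same text, CR-LF endings

print(crlf_file[saved:saved + 4])                 # b'\r\nbo' (stale position)
print(remap_position(cr_file, saved))             # 21
```

Any repair of this kind needs the original file, which is one argument for not silently rewriting line endings of files whose positions are stored elsewhere.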

Doesn't Celeste save positions in its index file?

> What do you mean by "preference"?  If it is per-file, I'll agree. 
> If it is system-wide, then let's just leave it as always OS default
> and avoid a useless preference.  The one oddball who wants to
> override the default line ending for their image, can still manage
> to do so by modifying the code that defines the OS default.

I had been suggesting system-wide. However, if there are easy ways to
convert files or to choose file-out formats, then this probably isn't
necessary.

> > - but it should be possible to get a Notification (whose default
> > handler just ignores it) when you encounter a different
> > (unexpected) line ending. Or at least to query the stream to see
> > if such unexpected delimiter sequences have been read.
>
> Why is this important?  If you are using a "text file", then we
> could insist that you do not care about such things.  If you do
> care, then write your own darned convertor and open the file in
> binary mode.  :)
>
> Can you think of an example?  It seems quite odd, logically.

Daniel was concerned about being too clever on reading (with my 
suggestion for reading possibly damaged files). I was suggesting the 
Notification to make it possible for applications that cared to 
detect these problems without having to duplicate the code to detect 
them.
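The idea of reporting unexpected delimiters without forcing callers to re-implement detection could look like the Python sketch below. It is an analogue, not Squeak code: a callback whose default does nothing stands in for a Notification whose default handler ignores it, and the `unexpected` list stands in for querying the stream afterwards. The class name and parameters are hypothetical, and CR is assumed as the in-image line ending.

```python
from typing import Callable, List, Optional, Tuple

class TranslatingReader:
    """Hypothetical reader: translates any line ending to CR and reports
    delimiters that differ from the expected one."""

    def __init__(self, data: str, expected: str = "\r\n",
                 on_unexpected: Optional[Callable[[int, str], None]] = None):
        self.data = data
        self.expected = expected
        # Default handler ignores the event, like an unhandled Notification.
        self.on_unexpected = on_unexpected or (lambda pos, delim: None)
        # Queryable record of (position in source, delimiter found).
        self.unexpected: List[Tuple[int, str]] = []

    def read_all(self) -> str:
        out = []
        i, n = 0, len(self.data)
        while i < n:
            ch = self.data[i]
            if ch == "\r" and i + 1 < n and self.data[i + 1] == "\n":
                delim = "\r\n"
            elif ch in "\r\n":
                delim = ch
            else:
                out.append(ch)
                i += 1
                continue
            if delim != self.expected:
                self.unexpected.append((i, delim))
                self.on_unexpected(i, delim)
            out.append("\r")        # assumed in-image line ending
            i += len(delim)
        return "".join(out)

reader = TranslatingReader("a\r\nb\nc\r\n")
print(repr(reader.read_all()))   # 'a\rb\rc\r'
print(reader.unexpected)         # [(4, '\n')]
```

An application that does not care simply never looks at `unexpected`; one that does care gets the positions without duplicating the detection logic.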

> > - the default write behavior for pre-existing text files should
> > be to use the auto-detected line endings (which could be as
> > simple as CrLfFileStream's search for the first delimiter).
>
> Yeah.  Note that this will almost always be an append.  In fact, we
> could *insist* that it be an append, if they want predictable
> results.
>
> > - it should be possible to override the defaults and specify:
> > 	- read delimiter translation mode (i.e. strict or liberal)
> > 	- write delimiter sequences
> > 	- behavior on encountering non-default delimiter sequences on
> > read
>
> Okay, but:
>
> 	- there is no "strict" reading code available; CLFS just does
> liberal. Is "strict" really necessary?  Why?

I'd been suggesting a more liberal technique than CLFS.
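The write default discussed above (append to a pre-existing file using the line ending it already uses, found by searching for the first delimiter) can be sketched in Python. This is not CrLfFileStream's code; `detect_delimiter` and `append_lines` are hypothetical names, and falling back to `os.linesep` reflects the suggestion to use the platform line ending when nothing can be detected. The sketch also assumes the existing file already ends with a line ending.

```python
import os
import tempfile
from typing import Iterable, Optional

def detect_delimiter(data: bytes) -> Optional[bytes]:
    """Return the first line ending found (CR-LF, CR, or LF), or None."""
    for i, byte in enumerate(data):
        if byte == 0x0D:  # CR: check whether an LF follows it
            return b"\r\n" if data[i + 1:i + 2] == b"\n" else b"\r"
        if byte == 0x0A:  # lone LF
            return b"\n"
    return None

def append_lines(path: str, lines: Iterable[str],
                 default: bytes = os.linesep.encode("ascii")) -> None:
    """Append lines to an existing file, reusing its detected line ending,
    or `default` if the file contains no line ending yet."""
    with open(path, "rb") as f:
        delim = detect_delimiter(f.read()) or default
    with open(path, "ab") as f:
        for line in lines:
            f.write(line.encode("latin-1") + delim)

if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "log.txt")
    with open(path, "wb") as f:
        f.write(b"first\r\n")
    append_lines(path, ["second"])
    with open(path, "rb") as f:
        print(f.read())   # b'first\r\nsecond\r\n'
```

Restricting this behavior to appends, as suggested above, keeps the result predictable: the existing bytes, and any positions saved into them, are left untouched.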


-- 
Ned Konz
http://bike-nomad.com
GPG key ID: BEEA7EFE



More information about the Squeak-dev mailing list