Extending FileList with CrLf

Lex Spoon lex at cc.gatech.edu
Sat Jul 26 05:17:04 UTC 2003


Wow, Ned, you have actually proposed a fairly thorough solution.  I
think you are proposing too much work, however.  In short, just making
CLFS (CrLfFileStream) the default for 3.7 looks like it would be fine
for everyone's purposes.  For the details, read on.



> My suggestion would extend this to handle damaged files (certainly not 
> the common case, I'd hope!) in a more sensible way. For instance, it 
> should be possible to read a *text* file that has CR/CR/LF between 
> every line as a bunch of lines separated by single (logical CR) 
> delimiters. And in this case it might make sense to *not* use 
> CR/CR/LF when re-writing, but to attempt to fix the file and use the 
> platform (or preferred) default. Which might be CR/LF in this case 
> because that is the last "accepted" sequence of delimiters in each 
> line.

Why should we support this?  The file is broken.  The current
interpretation in CLFS seems to be as good as any for such a file, and
if you want to write some code to fix such files, you should surely use binary
mode.

If this kind of thing actually happens in practice, then we can add even
more smarts to CLFS, but in the meantime it doesn't seem critical.
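To make the "as good as any" reading concrete, here is a minimal Python sketch (not Squeak code) of a liberal rule in which CR, LF, and CR LF each count as one line end. That this matches what CLFS does is an assumption, and `split_liberal` is a hypothetical name. It works on raw bytes, as a binary-mode fixer would. On the CR/CR/LF file Ned describes, it simply yields an empty line between each pair of real lines, which is a reasonable interpretation of a broken file.

```python
def split_liberal(data: bytes) -> list:
    """Split raw bytes into lines, treating CR, LF, and CR LF each as one
    line end (an assumed 'liberal' rule, sketched for illustration)."""
    lines = []
    current = bytearray()
    i = 0
    while i < len(data):
        b = data[i]
        if b == 0x0D:  # CR
            lines.append(bytes(current))
            current.clear()
            # A CR immediately followed by LF is a single CR LF delimiter.
            if i + 1 < len(data) and data[i + 1] == 0x0A:
                i += 1
        elif b == 0x0A:  # bare LF
            lines.append(bytes(current))
            current.clear()
        else:
            current.append(b)
        i += 1
    if current:
        lines.append(bytes(current))
    return lines


# A "damaged" file with CR CR LF between lines:
print(split_liberal(b"one\r\r\ntwo\r\r\nthree"))
# [b'one', b'', b'two', b'', b'three']
```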



> - default file opens are *not* text unless you explicitly use a text 
> stream class, wrapper, or constructor method. You read one character 
> per character in the file.

Why not text mode as the default?  It's not a big deal, but note that:

	1. Text mode is the current default in Squeak.

	2. Beginners, I'm sure, tend to try text files as their first
experiments.



> - we review the reading and writing of text files in the Basic image 
> to make sure that the behavior is what we want. For instance, we may 
> decide to maintain the Mac delimiters in ChangeSet's file-out format, 
> but to make the "save as text" from the Workspaces save in the 
> default text format.

How about just making them all use platform line endings?  It's a very
picky distinction to draw: "generic text file" versus "Squeak text
file".  Does the distinction matter?  The only reason I can think of to
have CR line endings on the filesystem is to support legacy Squeak
installations without CLFS turned on.  That doesn't seem worth bothering
with; those guys are already annoyed by the CLFS people, anyway.  And
besides, Squeak is still at a stage where burning the disk packs is a
fine strategy.


> - we very carefully review the uses of #position and #position: on 
> text streams in the image, especially if there's math being done on 
> these values. 

Usually text files are read from beginning to end, or written from
scratch.  When positions are used with text files, they are almost
always merely saved and then restored later.

The only time I've seen file position math in Squeak, for a text file,
is in #peek's default implementation.  That's been fixed for CLFS for a
long time now.
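A rough Python illustration of why position math on a text stream is fragile, assuming #peek's default implementation is of the read-one-then-step-back-one kind. The `TranslatingReader` class and its methods are hypothetical, and this is neither the Squeak code nor the actual CLFS fix: once a CR LF pair is read as a single logical CR, stepping back one byte lands between the CR and the LF. Saving and restoring the position, the usage pattern described above, avoids the arithmetic entirely.

```python
class TranslatingReader:
    """Hypothetical text reader over raw bytes: CR LF and bare LF are both
    returned as a single logical CR, while `position` counts raw bytes."""

    def __init__(self, data: bytes):
        self.data = data
        self.position = 0

    def next(self):
        if self.position >= len(self.data):
            return None
        ch = self.data[self.position]
        self.position += 1
        if ch == 0x0D and self.data[self.position:self.position + 1] == b"\n":
            self.position += 1  # swallow the LF of a CR LF pair
            return "\r"
        if ch == 0x0A:
            return "\r"
        return chr(ch)

    def skip(self, n: int):
        self.position += n

    def naive_peek(self):
        # Position arithmetic: assumes next() always consumes exactly one byte.
        ch = self.next()
        if ch is not None:
            self.skip(-1)
        return ch

    def safe_peek(self):
        # Save and restore the position instead of doing math on it.
        saved = self.position
        ch = self.next()
        self.position = saved
        return ch


r = TranslatingReader(b"a\r\nb")
r.next()                      # consume "a"
print(repr(r.naive_peek()))   # '\r', but position is now 2, inside the CR LF pair
print(repr(r.next()))         # '\r' again: the leftover LF reads as a second line end

r = TranslatingReader(b"a\r\nb")
r.next()
print(repr(r.safe_peek()))    # '\r'
print(repr(r.next()))         # '\r'
print(repr(r.next()))         # 'b'
```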



> 	- come up with a StreamPosition object that knows about line numbers 
> and offsets within a line

That would be cool, but it doesn't seem useful often enough to bake into
the standard classes.  Usually text files are just scanned from
beginning to end.  If they aren't, they are usually scanned starting
from some saved position, and ending after some criterion is noticed in
the text itself.  (e.g., you read one XML element from a file).

If you really want to get into this, then there are a number of great
ways to index into text files, including by paragraphs and words.  If I
remember right, HyperCard is great at this.


> 	- make a #mark and #returnToMark: (whatever their names should be) 
> that can be used in the common cases where you just want to return to 
> where you were

These aren't needed: you can already use #position and #position:,
unless I am missing something.


> 
> - we add either:
> 	- text flavors of the open calls
> 		FileStream readOnlyTextFileNamed: ...
> 	- a message that will change the stream behavior (or wrap the stream) 
> to text (this is my preference)
> 		someStream asText ...

Both of these look fine; they both insist, or at least suggest, that files
not change mode once they are opened, which is probably a good thing.
However, the current approach in Squeak also seems fine.




> - there should be a preference for the default delimiter flavor for 
> new text files. The choices should include "OS default" as well as 
> specific flavors (CR, LF, or CRLF probably).
> 

What do you mean by "preference"?  If it is per-file, I'll agree.  If it
is system-wide, then let's just always use the OS default and avoid
a useless preference.  The one oddball who wants to override the default
line ending for their image can still do so by modifying the
code that defines the OS default.


> - but it should be possible to get a Notification (whose default 
> handler just ignores it) when you encounter a different (unexpected) 
> line ending. Or at least to query the stream to see if such 
> unexpected delimiter sequences have been read.

Why is this important?  If you are using a "text file", then we could
insist that you do not care about such things.  If you do care, then
write your own darned converter and open the file in binary mode.  :)

Can you think of an example?  It seems quite odd, logically.
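For the rare caller who does care, the "write your own converter in binary mode" route can be short. A minimal Python sketch, with hypothetical function names: `count_line_ends` reports which delimiter kinds occur in raw bytes, so mixed or unexpected line ends can be detected, and `normalize` rewrites all of them to one chosen sequence.

```python
from collections import Counter


def count_line_ends(data: bytes) -> Counter:
    """Count CR LF pairs, bare CRs, and bare LFs in raw (binary-mode) bytes."""
    counts = Counter()
    i = 0
    while i < len(data):
        if data[i] == 0x0D:
            if i + 1 < len(data) and data[i + 1] == 0x0A:
                counts["crlf"] += 1
                i += 2
                continue
            counts["cr"] += 1
        elif data[i] == 0x0A:
            counts["lf"] += 1
        i += 1
    return counts


def normalize(data: bytes, line_end: bytes = b"\n") -> bytes:
    """Rewrite every CR LF, bare CR, and bare LF as `line_end`."""
    # CR LF first, so a pair is not turned into two line ends.
    unified = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    return unified.replace(b"\n", line_end)


# Typically `data` would come from open(path, "rb").read().
data = b"a\r\nb\nc\r\nd"
counts = count_line_ends(data)
if len(counts) > 1:
    print("mixed line ends:", dict(counts))
    data = normalize(data, b"\r\n")
```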


> - the default write behavior for pre-existing text files should be to 
> use the auto-detected line endings (which could be as simple as 
> CrLfFileStream's search for the first delimiter).

Yeah.  Note that this will almost always be an append.  In fact, we could
*insist* that it be an append, for predictable results.
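A minimal Python sketch of that default, with hypothetical function names: search the existing contents for the first delimiter, in the spirit of CrLfFileStream, and append new lines using it. A new or delimiter-free file falls back to the platform default, and the optional `line_end` argument plays the role of a per-file preference. The latin-1 encoding is an assumption, and reading the whole file just to find the first delimiter is a simplification.

```python
import os
from typing import List, Optional


def detect_line_end(data: bytes, default: bytes = os.linesep.encode("ascii")) -> bytes:
    """Return the first line end in `data` (CR LF, CR, or LF), or `default`
    when there is none, in the spirit of searching for the first delimiter."""
    for i, b in enumerate(data):
        if b == 0x0D:
            return b"\r\n" if data[i + 1:i + 2] == b"\n" else b"\r"
        if b == 0x0A:
            return b"\n"
    return default


def append_lines(path: str, lines: List[str], line_end: Optional[bytes] = None,
                 encoding: str = "latin-1") -> None:
    """Append `lines` to the file at `path`. Unless a per-file `line_end` is
    given, use the convention detected in the existing contents, falling back
    to the platform default for a new or delimiter-free file."""
    try:
        with open(path, "rb") as f:
            existing = f.read()
    except FileNotFoundError:
        existing = b""
    if line_end is None:
        line_end = detect_line_end(existing)
    with open(path, "ab") as f:
        # Terminate a last line that lacks a delimiter before appending.
        if existing and not existing.endswith((b"\r", b"\n")):
            f.write(line_end)
        for line in lines:
            f.write(line.encode(encoding) + line_end)
```

For example, appending to a file whose first line ends in CR LF adds lines ending in CR LF, whatever the platform default happens to be.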




> - it should be possible to override the defaults and specify:
> 	- read delimiter translation mode (i.e. strict or liberal)
> 	- write delimiter sequences
> 	- behavior on encountering non-default delimiter sequences on read

Okay, but:
	
	- there is no "strict" reading code available; CLFS only does liberal
reading.  Is "strict" really necessary?  Why?

	- making the write delimiter tweakable would be easy

	- changing the behavior of unexpected delimiters depends on having
"strict" mode at all


IMO none of these is important for getting started, and we should
only bother with the second one (a tweakable write delimiter) for now.


Lex



More information about the Squeak-dev mailing list