CrLfFileStream as default?

R. A. Harmon harmonra at webname.com
Sat Oct 31 15:47:34 UTC 1998


At 04:23 PM 10/29/98 -0500, Lex wrote:
>"R. A. Harmon" <harmonra at webname.com> wrote:
>
>> I think with the following proposed set of conventions that PILT could be
>> effectively achieved and the odd "bite" handled on a case by case basis in
>> some standard way.
[snip]
>
>
>This sounds basically like what CrLfFileStream does
[snip]
>So in my opinion, the only things stopping adoption of CrLfFileStream as a
>default are:
>
>	1. a decision on the file positions issue (I vote for "it's illegal",
>          migrating towards "it's expensive and unadvised")
>	2. automatically choosing output line endings based on what the current
>          platform is.  This could be done by putting a
>          "CrLfFileStream > guessLineEndConvention" in
>          SystemDictionary.processStartupList.
[snip]

I don't think "guessLineEndConvention" is needed in the approach I propose.
If it guesses wrong (especially on an already anomalous file), CrLfFileStream
seems to produce anomalies that I don't think are caused by cut-and-paste
alone.

The native platform line termination conventions I know of are as follows:

	DOS/Windows on x86		CrLf
	UNIX				  Lf
	Mac				Cr

So when reading external text, the interline spacing characters (carriage
return, line feed, form feed, and vertical tab) are handled as follows:

  Cr        - add Cr to the internal collection; if followed by Lf, read and
              ignore the Lf.
  Lf        - if preceded by Cr, ignore the Lf; otherwise add Cr to the
              internal collection.
  Ff and VT - ignore.

This does require reading external text a character at a time, but doesn't
seem prohibitively expensive.
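
To make the reading rules above concrete, here is a minimal sketch in Python
rather than Squeak code.  The function name read_normalized is hypothetical,
and the sketch assumes the internal collection is simply a string that uses
Cr as its only line terminator.

```python
import io

CR, LF, FF, VT = "\r", "\n", "\f", "\v"

def read_normalized(stream):
    """Read a text stream one character at a time and return its contents
    with every line ending converted to a single CR (hypothetical helper)."""
    out = []
    prev_was_cr = False
    while True:
        ch = stream.read(1)
        if ch == "":            # end of stream
            break
        if ch == CR:
            out.append(CR)      # Cr: always kept
            prev_was_cr = True
            continue
        if ch == LF:
            if not prev_was_cr:
                out.append(CR)  # lone Lf: stored as Cr
            # Lf directly after Cr: ignored
        elif ch in (FF, VT):
            pass                # Ff and VT: ignored
        else:
            out.append(ch)
        prev_was_cr = False
    return "".join(out)

# One input mixing all three conventions plus Ff and VT.
mixed = io.StringIO("a\r\nb\nc\rd\f\ve")
normalized = read_normalized(mixed)
```

The only state carried between characters is whether the previous one was a
Cr, so no guess about the file's convention is needed.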

CrLfFileStream also doesn't deal with problems such as runs breaking in Text
instances.  It seemed to me that many anomalies of this kind appear when just
about any of the classes start reading from and writing to external
devices -- ports, disks, etc.

I didn't really dig into the code to see for certain, so I could be wrong.
The more I looked, the more things looked a little shaky in a number of classes.

At 03:15 PM 10/30/98 -0800, Michael S. Klein wrote:
>>At 03:15 PM 10/30/98 -0800, Hans-Martin Mosner wrote:
>> I'd support the first alternative with the reasoning that file positions
>> for ASCII files should be treated as opaque 'cookies', that is, you can
>> get the file position and set it to get back to a point where you were
>> before, but you should not do arithmetic with them.
>
>[other less pleasurable options deleted]
>
>You have to do something like this, anyway, to support multi-byte characters,
>so you may as well do lineEndConvention this way, as well.

I've also been thinking about how to handle multi-byte character support.  I
agree with Hans-Martin Mosner and Michael S. Klein.
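
As an illustration of the opaque "cookie" idea, here is a minimal sketch in
Python rather than Squeak code.  The class names Position and LineEndReader
are hypothetical, characters are assumed to be single bytes, and Ff/VT
handling is left out to keep it short.

```python
import io

class Position:
    """Opaque file position: store it and hand it back, nothing else."""
    __slots__ = ("_offset", "_after_cr")

    def __init__(self, offset, after_cr):
        self._offset = offset      # byte offset in the external file
        self._after_cr = after_cr  # True if the last byte read was a Cr

class LineEndReader:
    """Reads a seekable binary stream, presenting every line end as Cr."""

    def __init__(self, raw):
        self._raw = raw
        self._after_cr = False

    def next_char(self):
        """Return the next character, or '' at end of stream."""
        while True:
            b = self._raw.read(1)
            if b == b"":
                return ""
            if b == b"\r":
                self._after_cr = True
                return "\r"
            was_after_cr = self._after_cr
            self._after_cr = False
            if b == b"\n":
                if was_after_cr:
                    continue       # Lf of a CrLf pair: already reported
                return "\r"
            return b.decode("latin-1")  # assumption: one byte per character

    def position(self):
        return Position(self._raw.tell(), self._after_cr)

    def set_position(self, pos):
        self._raw.seek(pos._offset)
        self._after_cr = pos._after_cr

reader = LineEndReader(io.BytesIO(b"ab\r\ncd"))
first = [reader.next_char() for _ in range(3)]   # 'a', 'b', Cr
mark = reader.position()
rest = [reader.next_char() for _ in range(3)]    # 'c', 'd', end of stream
reader.set_position(mark)
again = reader.next_char()
```

The cookie carries reader state as well as a byte offset, which is why
arithmetic on it is meaningless.  A multi-byte encoding would presumably need
the cookie to carry decoder state in the same way.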

--
Richard A. Harmon          "The only good zombie is a dead zombie"
harmonra at webname.com           E. G. McCarthy




