CrLfFileStream as default?

R. A. Harmon harmonra at webname.com
Mon Nov 2 16:04:46 UTC 1998


At 03:32 PM 10/31/98 -0500, Lex wrote:
>"R. A. Harmon" <harmonra at webname.com> wrote:
[snip]
>> I don't think there is any reason for "guessLineEndConvention" in the
>> approach I propose and if it guesses wrong (especially on an already
>> anomalous file), CrLfFileStream, seems to produce anomalies that I don't
>> think are caused just by cut-and-paste.
>> 
>
>The purpose of this method is to pick a convention for *new* files.
> The idea being if you create a text file on Windows, it should have
>CRLF line endings, and if you create a file on Unix, it should have
>LF line endings.  That way you can view Squeak files using other
>applications on your operating system, without having to convert the files
>first.
>If you don't do this method during startup, then CrLfFileStream won't notice
> it is operating on a new platform.  It will continue using whatever convention
>it was using when the image was saved, even if it was saved on a different
>platform.

Yes, I agree a default should be set at start up.  I think all that is
needed is a defaultLineTermString class variable set at start up for the
platform it's running on, and a stream instance lineTermString that is set
to the default and can be reset to something else manually if you want.
This is how Smalltalk Express (SE) does it, if I remember correctly.  I
assume Squeak can determine at start up what platform it's on, or should be
changed to do so, because it's useful in other areas also.
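
A minimal sketch of that arrangement, written in Python rather than Smalltalk so it can be run as-is. The class name LineTermStream and the method set_default_for_platform are hypothetical, and os.linesep merely stands in for whatever platform check Squeak would do at start up:

```python
import os


class LineTermStream:
    """Hypothetical stand-in for a stream with a per-instance line terminator."""

    # Class-wide default, playing the role of defaultLineTermString.
    default_line_term = "\r"

    @classmethod
    def set_default_for_platform(cls, linesep=None):
        """Call once at start up; os.linesep is the assumed platform probe."""
        cls.default_line_term = os.linesep if linesep is None else linesep

    def __init__(self):
        # Each stream starts from the default and may be reset manually later.
        self.line_term = self.default_line_term
```

Resetting line_term on one stream leaves the class default, and every other stream, untouched.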

I think the CrLfFileStream approach to append applies only to files, and not
to streams, which may be ports, sockets, or something equally exotic (means I
don't understand it).


>Now, there's a second "guess" going on in CrLfFileStream, and that's when a
>specific file is opened.  This guess is to ensure that new data written to
>the file will have the same convention as the data that's already in the file.

I think this will produce a mixed convention file if it gets a mixed
convention file.
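
To illustrate the concern, here is a rough Python sketch of a per-file guess. It is not CrLfFileStream's actual code; the majority-count rule and both function names are assumptions made only for the example:

```python
import re
from collections import Counter


def count_line_ends(data):
    """Count each kind of line end; CrLf is matched before lone Cr or Lf."""
    return Counter(re.findall(r"\r\n|\r|\n", data))


def guess_line_end(data, default):
    """Assumed rule: pick the most common line end, or default if none."""
    counts = count_line_ends(data)
    if not counts:
        return default
    return counts.most_common(1)[0][0]
```

Whatever the guess returns, a file that already holds two kinds of line end still holds two kinds after more lines are appended to it.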


>If you have a CRLF-delimited file on Unix, then you should keep writing CRLF
>endings, and not start appending lines with LF endings.  The point is debatable,
>I suppose, but that's the purpose of this one.
[snip]

I can see where this would be quite useful, especially for someone working
in both UNIX and Windows (dual boot).  I propose it be an option that one
can select as the default behavior for appending, for new files, or for both.

I think the convention I propose works in all the following cases:

        - Writing a new file, Crs are appropriately transformed
          to Cr, CrLf, or Lf according to lineTermString.

        - Appending to an existing file, Crs are appropriately transformed
          to Cr, CrLf, or Lf according to the lineTermString.

        - Reading an existing file, Cr, CrLf, Lf, or a mixture of these,
          are appropriately transformed internally to Crs.
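
The three cases above can be sketched in a few lines of Python (used here instead of Smalltalk so the example runs on its own). The function names to_external and to_internal are hypothetical, and the sketch assumes internal text uses Cr only:

```python
import re

CR, LF, CRLF = "\r", "\n", "\r\n"


def to_external(text, line_term):
    """Writing or appending: every internal Cr becomes line_term."""
    return text.replace(CR, line_term)


def to_internal(data):
    """Reading: CrLf, lone Cr, and lone Lf each become a single Cr."""
    # CrLf is listed first so it is consumed as one line end, not two.
    return re.sub(r"\r\n|\r|\n", CR, data)
```

Writing and appending share one transformation, and reading needs no guess about the file's convention, mixed or not.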

On Windows, some applications gracefully handle Cr, CrLf, Lf, or a mixture
of these, while others make it difficult to work with any file that doesn't
conform to their line termination convention.  A mixture of conventions makes
no difference to the latter, so I would prefer a Squeak default that produced
mixed convention on append.  This approach doesn't require exceptional
handling for append.


>So I sent a patch around a few days ago that did just this.

Yes, I saw it.


>Now, I've been using this setup for a week or so now with no troubles.
>  However, I've not messed with any *really* strange files....

I'm not sure why I did.  Somebody else said they were getting strange stuff
too.  Have you tried cut-and-paste operations?  I think these might confuse
CrLfFileStream.  I think they are valid operations that the fix to line
termination should handle.

I appreciate that you did CrLfFileStream.  I would have probably spent a
fair amount of time trying out the same idea, but not doing it as well.


At 03:28 PM 10/31/98 -0800, Michael S. Klein wrote:
>> I don't think there is any reason for "guessLineEndConvention" in the
>> approach I propose and if it guesses wrong (especially on an already
>> anomalous file), CrLfFileStream, seems to produce anomalies that I don't
>> think are caused just by cut-and-paste.
>
>Sometimes you may want to guess, sometimes you may want a rigid line end
>policy.

After some reflection, I agree (see above).


>> The native platform line termination conventions I know of are as follows:
>> 
>> 	DOS/Windows on x86		CrLf
>> 	UNIX				  Lf
>> 	Mac				Cr
>
>Smalltalks use cr.  There is also Unicode which has explicitly different
>line separators and paragraph separators ( U+2028 and U+2029 ).
[snip]

Adding Unicode (same as double-byte characters?) will require overriding
some behavior, I suspect.  The line termination will be one of them.  I don't
know enough to join in that conversation.


>This works sometimes, but there are some of us who actually use ff & vt's
>placed in text by other people.

My question is: what do the Ff and Vt characters in Text instances mean?  If
they mean line termination, I suggest the code that uses them be changed to
reflect the convention if adopted.

If they are used for some other purpose, say optimization because of the
sorting order (simple-minded example), I think that is one of those
case-by-case exceptions.  If the Text instance has an Ff put in, something is
done with the instance, and then it's thrown away, that seems reasonable and
won't break any code.  If the instance is used by other code, then this
additional Ff behavior should probably be in a new subclass of Text
(or new wrapper class, etc.), and conversion methods added.  External
strings terminated with a binary zero are good examples of why one might
want a string-like object that breaks the convention.  SE has conversion
methods to add and remove the trailing zero.

I envision the line termination convention not as coercive, but as
liberating us a little bit.  One doesn't need to code for all the odd cases
everywhere.  I think this is especially important in a cooperative endeavor
like Squeak.  That's why I add the last rule:

        - You run into something that doesn't follow the convention, send in
          a fix or at least point it out.

I think we need to backstop each other like baseball players do.  One player
will get behind another player trying to catch a ball, so if he misses it the
second player is there to help.  Someone might not know of the convention,
or ported code from another platform and didn't get around to changing it to
the convention (it happened to me, anyway).  I'd appreciate any back-stopping
I could get.  I guess I approach the Squeak community as a team rather than a
group of "rugged individuals".


>> This does require reading external text a character at a time, but doesn't
>> seem prohibitively expensive.
>
>First make it work.... then make it fast
[snip]
I heartily agree.


>As far as line end convention goes, I think the important thing to do is 
>to factor out the handling into a Policy object.  Otherwise the streaming 
>code just gets all krufted up with different cases.
>
>If somebody wants a different policy, they add a new class instead of 
>futzing with the convoluted code.

Would this be similar to the External Stream Decorator suggestion I copped
from the preview of the book "The Design Patterns Smalltalk Companion"?

Your idea sounds promising.  Could you explain it a little more concretely
for me?
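
For discussion, here is one guess at what such a Policy object might look like, sketched in Python rather than Smalltalk. Every name in it (LineEndPolicy, FixedPolicy, MatchExistingPolicy, append_text) is hypothetical, and it may well not be what was meant:

```python
import re


class LineEndPolicy:
    """Hypothetical base class: decides which line end gets written out."""

    def external_line_end(self, existing_data):
        raise NotImplementedError


class FixedPolicy(LineEndPolicy):
    """Rigid policy: always use one line end, whatever the file contains."""

    def __init__(self, line_term):
        self.line_term = line_term

    def external_line_end(self, existing_data):
        return self.line_term


class MatchExistingPolicy(LineEndPolicy):
    """Guessing policy: reuse the first line end found, else a fallback."""

    def __init__(self, fallback):
        self.fallback = fallback

    def external_line_end(self, existing_data):
        found = re.search(r"\r\n|\r|\n", existing_data)
        return found.group(0) if found else self.fallback


def append_text(existing_data, text, policy):
    """The streaming code only asks the policy; it has no cases of its own."""
    term = policy.external_line_end(existing_data)
    return existing_data + text.replace("\r", term)
```

A different convention would then be a new subclass, with append_text left alone.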

--
Richard A. Harmon          "The only good zombie is a dead zombie"
harmonra at webname.com           E. G. McCarthy




