[squeak-dev] problems with line separators in Linux

Nicolas Cellier nicolas.cellier.aka.nice at gmail.com
Fri Jun 11 21:32:16 UTC 2010


So, I just committed a few trunk changes toward the goal of making
Squeak immune to line endings.
There might be a few more idioms to fix, but that's certainly not that
difficult.
See also a shorter manifest at
http://code.google.com/p/pharo/issues/detail?id=2538.
That does not prevent us from continuing to improve the conversion strategy.
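
To make the idiom change concrete, here is a minimal workspace sketch of the
difference between the old (myStream upTo: Character cr) and the new
(myStream nextLine) discussed in point 8 of the quoted message below.
The sample text and all variable names are made up for illustration, and it
assumes an image where nextLine already handles CR, LF and CR-LF (Squeak 4.1
or current trunk); I have not checked older images.

```smalltalk
"A sketch only: the sample text and variable names are hypothetical."
| cr lf crlf text stream oldLines newLines |
cr := String with: Character cr.
lf := String with: Character lf.
crlf := String with: Character cr with: Character lf.

"One string mixing all three line-end conventions."
text := 'one', crlf, 'two', lf, 'three', cr, 'four'.

"Old idiom: only CR ends a line, so LF characters stay inside the pieces."
stream := ReadStream on: text.
oldLines := OrderedCollection new.
[stream atEnd] whileFalse: [oldLines add: (stream upTo: Character cr)].

"New idiom: nextLine treats CR, LF and CR-LF alike as one line break."
stream := ReadStream on: text.
newLines := OrderedCollection new.
[stream atEnd] whileFalse: [newLines add: stream nextLine].
```

With the old idiom the second element still carries the embedded LFs
(LF, 'two', LF, 'three'), which is the kind of leakage that breaks the menu
specs. With nextLine each line comes out clean, whatever convention
delimited it.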

I had better stop talking to myself now ;)

Nicolas

2010/6/11 Nicolas Cellier <nicolas.cellier.aka.nice at gmail.com>:
> 2010/6/11 Nicolas Cellier <nicolas.cellier.aka.nice at gmail.com>:
>> Hi Ralph,
>>
>> Here is a little highlight on the CR/LF strategy.
>>
>> 1) CR+LF (windows)  or CR only (mac) or LF only (unix) exists in the
>> external world whether we like it or not.
>>
>> 2) Given 1), Smalltalk main strategy - and in particular Squeak - has
>> always been this one:
>> 2.a) convert every input from external-world to CR,
>> 2.b) convert every output to external world to platform-specific preference.
>>
>> Historically, this was implemented in CrLfFileStream.
>> But it has been superseded by MultiByteFileStream.
>> If you inspect its API, you'll see it provides both an automatically
>> guessed platform-specific lineEndConvention and a programmable one.
>>
>> 3) If all tools used to develop applications
>> (Smalltalk/ruby/python/perl/javascript/.Net/etc...)
>>  provided APIs making these applications insensitive to line-end
>> conventions,
>>  then we would be in a better world and would have to care less about
>> line-end conventions.
>>  This is easier than imposing a uniform line-end convention on the
>> world, because it enables a smooth transition.
>>
>> 4) Until strategy 2) is perfect and absolutely no LF-in-image or
>> CR-out-of-image leakage occurs, Squeak/Pharo are bad citizens.
>> They are sensitive to line-end conventions and break chains made of
>> multiple heterogeneous tools.
>>
>> 5) I observed, you observed, everyone observed recurrent deficiencies
>> in either 2.a) or 2.b) or both...
>>
>> 6) So my logical conclusion is to propose a complementary strategy:
>> 6.a) Let Smalltalk algorithms work pan-line-ending-conventions.
>>
>> Observe how any decent file editor (notepad, vim, etc...) works
>> transparently whatever the line-end convention.
>> IMO, it's a shame that the so-called reference Object-Oriented
>> language cannot deal with mixed line-end conventions.
>>
>> 7) So I started to implement 6.a) in Squeak 4.1 and Pharo 1.1 in order
>> to reach goal 3). This is two-fold:
>> 7-a) display CR-LF, LF or CR as a single line break (changes in
>> CharacterScanner and co)
>> 7-b) let Stream and String handle CR-LF, LF or CR delimited lines.
>>
>> Note that I cared to provide decently optimized implementations (often
>> more optimized than previous CR-only algorithms).
>>
>> 8) Of course, in order to benefit from the new 7-b) facilities, there's a
>> little change of API.
>> We need to replace some old-fashioned idioms (myStream upTo: Character
>> cr) with the modernized pan-line-ending one (myStream nextLine).
>>
>> 9) I did not apply these changes very deeply to Squeak nor Pharo, but
>> at a few places here and there...
>> So there is still a bit of work to reach goal 3)
>> (parsing the menus specs is just an example of it)
>>
>> 10) This 6.a) strategy could eventually replace 2.a), but it does not
>> have to, and we didn't go this way...
>> So both Squeak 4.1 and Pharo 1.1 are not any worse than Squeak always
>> has been in this respect.
>>
>> 11) Strategy 6.a) DOES NOT replace 2.b). If our down-chain
>> applications are line-ending sensitive, then WE must take care to produce
>> the expected convention.
>>
>> Conclusions
>>
>> So my opinion is that 6.a) did not make our life worse.
>> On the contrary, Squeak and Pharo are moving toward what I would call
>> a better behaved I.T. world citizen.
>> They now offer an API to handle line endings transparently inside the image.
>> This comes at the price of not much complexity, and no noticeable slowdown.
>> But now we have to learn new idioms (and I don't see nextLine as more
>> complex than upTo: Character cr)...
>> ... and apply them where due (like parsing menu specs) to obtain a
>> homogeneous behaviour: goal 3)
>>
>> We still have to take care of 2.b), and a bit less of 2.a) once 4.b) will
>> be achieved.
>
> Oops, once 3) will be achieved
>
>> And maybe in the future, we will be able to get rid of 2.b) too, when
>> all applications are line-ending-insensitive.
>> In the meantime, nothing prevents us from improving 2.a) and 2.b) to
>> avoid LF leaking into or CR leaking out of the image.
>> But until strategy 2) is perfect, we just act as one of the bad
>> world citizens perpetuating line-ending problems.
>> IMO reaching goal 3) is easier than reaching goal 2).
>>
>> That's only my personal opinion, but it's based on years of pragmatic
>> experience using apps that behave badly with line endings and trying
>> to program slightly better ones.
>>
>> There are alternative strategies, like in CUIS: display a boxed
>> [LF] explicitly in text editors so as to provide visual control to
>> programmers...
>>
>> Not sure I sold my POV. It's quite the opposite of your proposal.
>> You don't have to adhere, but at least you have some rationale.
>>
>> Cheers
>>
>> Nicolas
>>
>> 2010/6/11 Ralph Boland <rpboland at gmail.com>:
>>> Ever since I started using Squeak with Squeak 3.6
>>>  (I only use Linux, currently Ubuntu 9.10)
>>> I have always had trouble with line separators.
>>> I am checking out  Squeak 4.1 and things have
>>> changed though I am not sure if they are any better.
>>>
>>> If I have  FileStream>>concreteStream return MultiByteFileStream
>>> (the default) then when I fileOut code the .st file consists of a single
>>> line so that I cannot use utilities such as  wc and vi on these files.
>>> If I modify concreteStream to return  CrLfFileStream then the problem
>>> goes away but a host of other problems occur.
>>>
>>> 1) It used to be that if you looked at the versions of a method
>>> each version would be written on a single line  (I believe linefeeds
>>> were used instead of carriage returns).
>>> This no longer happens.  Instead every line of a version is separated
>>> by a blank line.  An improvement I suppose; it is more readable.
>>> (I believe what is happening is that the end-of-line separator contains
>>> a line feed and a carriage return and both are treated as line
>>> separators.)
>>>
>>> 2) It used to be that if you wrote out a file (with concreteStream returning
>>> CrLfFileStream) then when you filed in the file using:
>>>  (FileStream oldFileNamed:  'filename.st')  fileIn
>>> you got carriage returns for line separators.
>>> Now you get linefeeds.
>>> This causes problems with Menu labels as in:
>>>
>>> aMenu labels:
>>> 'find class... (f)
>>> recent classes... (r)
>>> browse all
>>> browse
>>> ...
>>>
>>> because carriage returns are expected.
>>> Consequently your Menu has a single entry. :-(
>>> I expect there are other problems as well.
>>> Now, even if you set  concreteStream  back to the default the same
>>> problem occurs.  This works in 3.10.2 so we have gone backwards here.
>>>
>>> 3) If I cut and paste from a different  4.1 image I lose
>>> my line separators altogether.  This is unchanged from before.
>>>
>>> 4?) Finally,  I used to have problems loading in .mcz files.  So far in 4.1 I
>>> haven't had a problem but I have only loaded 2 .mcz files.
>>> In prior versions of Squeak loading a .mcz file would only sometimes succeed.
>>>
>>> Are other Linux users having similar experiences?
>>>
>>> Frankly, I think only one of  Cr  and  Lf  should be accepted in
>>> Smalltalk code, the other generating a syntax error except inside
>>> strings, and inside strings it should have to be escaped somehow.
>>>
>>> If  Cr is the character chosen for line separators then it should be
>>> impossible to write:
>>>
>>> returnAString
>>>      ^'a two line string where
>>> the line separator is a linefeed'
>>>
>>> The fact that the above code is legal leads to subtle errors such
>>> as those above.  A blatant compiler error is preferred.
>>>
>>>
>>> One final curiosity:  Why is the following method written as it is
>>> (in both 4.1 and 3.10.2)?
>>>
>>> Method  CrLfFileStream>>new
>>>
>>>        ^ (MultiByteFileStream new) wantsLineEndConversion: true; yourself.
>>>
>>>
>>> I presume it is correct but a comment explaining why wouldn't hurt.
>>>
>>> Regards,
>>>
>>> Ralph Boland
>>>
>>>
>>>
>>> --
>>> Quantum Theory cannot save us from the tyranny of a deterministic universe
>>> but it does give God something to do
>>>
>>>
>>
>