[squeak-dev] problems with line separators in Linux

Nicolas Cellier nicolas.cellier.aka.nice at gmail.com
Fri Jun 11 18:49:03 UTC 2010


2010/6/11 Nicolas Cellier <nicolas.cellier.aka.nice at gmail.com>:
> Hi Ralph,
>
> Here is a little highlight on the CR/LF strategy.
>
> 1) CR+LF (windows)  or CR only (mac) or LF only (unix) exists in the
> external world whether we like it or not.
>
> 2) Given 1), Smalltalk main strategy - and in particular Squeak - has
> always been this one:
> 2.a) convert every input from external-world to CR,
> 2.b) convert every output to external world to platform-specific preference.
>
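To make 2.a) concrete, here is a minimal sketch, assuming the text has already been read into a String and that String>>withSqueakLineEndings is available in your image (the variable names are made up for illustration):

```smalltalk
| external internal |
"Text as it might arrive from the external world: CR-LF after alpha, LF after beta."
external := 'alpha',
	(String with: Character cr with: Character lf),
	'beta',
	(String with: Character lf),
	'gamma'.

"2.a) Convert every external line ending to the in-image convention (a single CR)."
internal := external withSqueakLineEndings.

"After the conversion no LF should remain inside the image."
(internal includes: Character lf)
```

This only illustrates the input half of the strategy; the output half (2.b) is normally left to the file stream.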
> Historically, this was implemented in CrLfFileStream.
> But it has been superseded by MultiByteFileStream.
> If you inspect its API, you'll see it provides both an automatically
> platform-guessed and a programmable lineEndConvention.
>
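A minimal sketch of the programmable side of 2.b), assuming that FileStream answers a MultiByteFileStream (the default) and using a hypothetical file name. The exact interplay between wantsLineEndConversion: and lineEndConvention: may differ between versions, so treat this as an illustration rather than a reference:

```smalltalk
| out |
"'example.txt' is a hypothetical file name used only for illustration."
out := FileStream forceNewFileNamed: 'example.txt'.
[
	"Turn conversion on, then choose the convention explicitly.
	 #lf asks for Unix line ends; #cr and #crlf are the other choices."
	out wantsLineEndConversion: true.
	out lineEndConvention: #lf.

	"Inside the image we keep writing CR; the stream converts on the way out."
	out nextPutAll: 'first line'; cr.
	out nextPutAll: 'second line'; cr
] ensure: [out close]
```

With conversion enabled, utilities such as wc and vi should see two LF-terminated lines in the resulting file.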
> 3) If all tools used to develop applications
> (Smalltalk/ruby/python/perl/javascript/.Net/etc...)
>  did provide APIs making these applications insensitive to line end
> conventions,
>  then we would be in a better world and would have to care less about
> line end conventions.
>  This is easier than imposing a uniform line-end convention on the
> world, because it enables a smooth transition.
>
> 4) Until strategy 2) is perfect and absolutely no LF-in-image or
> CR-out-of-image leakage occurs, Squeak/Pharo are bad citizens:
> they are sensitive to line-end conventions and break chains made of
> multiple heterogeneous tools.
>
> 5) I observed, you observed, everyone observed recurrent deficiencies
> in either 2.a) or 2.b) or both...
>
> 6) So my logical conclusion is to propose a complementary strategy:
> 6.a) Let Smalltalk algorithms work pan-line-ending-conventions.
>
> Observe how any decent file editor (notepad, vim, etc...) works
> transparently whatever the line-end convention.
> IMO, it's a shame that the so-called reference Object-Oriented
> language cannot deal with mixed line-end-conventions.
>
> 7) So I started to implement 6.a) in Squeak 4.1 and Pharo 1.1 in order
> to reach goal 3). This is two-fold:
> 7-a) display CR-LF, LF or CR as a single line break (changes in
> CharacterScanner and co)
> 7-b) let Stream and String handle CR-LF or LF or CR delimited lines.
>
> Note that I cared to provide decently optimized implementations (often
> more optimized than previous CR-only algorithms).
>
> 8) Of course, in order to benefit from the new 7-b) facilities, there's a
> little change of API.
> We need to replace some old-fashioned idioms (myStream upTo: Character
> cr) with modernized pan-line-ending-wise ones (myStream nextLine).
>
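To illustrate 8), a minimal workspace sketch, assuming an image where nextLine already accepts CR-LF, LF or CR as described in 7-b). The variable names are made up for illustration:

```smalltalk
| text stream oldStyle newStyle |
"A string with mixed line endings: CR-LF, then LF, then CR."
text := 'one',
	(String with: Character cr with: Character lf),
	'two',
	(String with: Character lf),
	'three',
	(String with: Character cr).

"Old idiom: only CR ends a line, so stray LFs stay inside the pieces."
oldStyle := OrderedCollection new.
stream := ReadStream on: text.
[stream atEnd] whileFalse: [oldStyle add: (stream upTo: Character cr)].

"New idiom: nextLine treats CR-LF, LF or CR as one line break."
newStyle := OrderedCollection new.
stream := ReadStream on: text.
[stream atEnd] whileFalse: [newStyle add: stream nextLine].
```

Inspecting both collections should show the difference: newStyle holds the three lines, while oldStyle holds 'one' followed by a single piece that still contains the LF characters.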
> 9) I did not apply these changes very deeply to Squeak nor Pharo, but
> at a few places here and there...
> So there is still a bit of work to reach goal 3)
> (parsing the menus specs is just an example of it)
>
> 10) This 6.a) strategy could eventually replace 2.a), but it does not
> have to, and we didn't go this way...
> So both Squeak 4.1 and Pharo 1.1 are not any worse than Squeak always
> has been in this respect.
>
> 11) Strategy 6.a) DOES NOT replace 2.b). If our down-chain
> applications are line-ending sensitive, then WE must take care of
> producing the expected convention.
>
> Conclusions
>
> So my opinion is that 6.a) did not make our life worse.
> On the contrary, Squeak and Pharo are moving toward what I would call
> a better behaved I.T. world citizen.
> They now offer an API to handle line-endings transparently inside the image.
> This is at the price of not-so-much complexity, and no noticeable slow down.
> But now we have to learn new idioms (and I don't see nextLine as more
> complex than upTo: Character cr)...
> ... and apply them where due (like parsing menu specs) to obtain a
> homogeneous behaviour - goal 3)
>
> We still have to take care of 2.b), and a bit less of 2.a) once 4.b) is
> achieved.

Oops, once 3) is achieved

> And maybe in the future, we will be able to get rid of 2.b) too when
> all applications are line-ending-insensitive.
> In the meantime, nothing prevents us from improving 2.a) and 2.b) to
> avoid LF leaking into or CR leaking out of the image.
> But until strategy 2) is perfect, we just act as one of the bad
> world citizens perpetuating line-ending problems.
> IMO reaching goal 3) is easier than reaching goal 2).
>
> That's only my personal opinion, but it's based on years of pragmatic
> experience using apps that behave badly with line endings and trying to
> program slightly better ones.
>
> There are alternative possible strategies, like in CUIS: display a boxed
> [LF] explicitly in text editors so as to provide visual control to
> programmers...
>
> Not sure I sold my POV. It's quite opposite to your proposition.
> You don't have to adhere, but at least you have some rationale.
>
> Cheers
>
> Nicolas
>
> 2010/6/11 Ralph Boland <rpboland at gmail.com>:
>> Ever since I started using Squeak with Squeak 3.6
>>  (I only use Linux, currently Ubuntu 9.10)
>> I have always had trouble with line separators.
>> I am checking out  Squeak 4.1 and things have
>> changed though I am not sure if they are any better.
>>
>> If I have  FileStream>>concreteStream return MultiByteFileStream
>> (the default) then when I fileOut code the .st file consists of a single
>> line so that I cannot use utilities such as  wc and vi on these files.
>> If I modify concreteStream to return  CrLfFileStream then the problem
>> goes away but a host of other problems occur.
>>
>> 1) It used to be that if you looked at the versions of a method
>> each version would be written on a single line  (I believe linefeeds
>> were used instead of carriage returns).
>> This no longer happens.  Instead every line of a version is separated
>> by a blank line.  An improvement I suppose; it is more readable.
>> (I believe what is happening is that end line separator contains
>> a line feed and a carriage return and both are treated as line
>> separators.)
>>
>> 2) It used to be that if you wrote out a file (with concreteStream returning
>> CrLfFileStream) then when you filed in the file using:
>>  (FileStream oldFileNamed:  'filename.st')  fileIn
>> you got carriage returns for line separators.
>> Now you get linefeeds.
>> This causes problems with Menu labels as in:
>>
>> aMenu labels:
>> 'find class... (f)
>> recent classes... (r)
>> browse all
>> browse
>> ...
>>
>> because carriage returns are expected.
>> Consequently your Menu has a single entry. :-(
>> I expect there are other problems as well.
>> Now, even if you set  concreteStream  back to the default the same
>> problem occurs.  This works in 3.10.2 so we have gone backwards here.
>>
>> 3) If I cut and paste from a different  4.1 image I lose
>> my line separators altogether.  This is unchanged from before.
>>
>> 4?) Finally,  I used to have problems loading in .mcz files.  So far in 4.1 I
>> haven't had a problem but I have only loaded 2 .mcz files.
>> In prior versions of Squeak loading a .mcz file would sometimes succeed.
>>
>> Are other Linux users having similar experiences?
>>
>> Frankly, I think only one of  Cr  and  Lf  should be accepted in
>> Smalltalk code, the
>> other generating a syntax error except inside strings and inside
>> strings it should
>> have to be escaped somehow.
>>
>> If  Cr is the character chosen for line separators then it should be
>> impossible to
>> write:
>>
>> returnAString
>>      ^'a two line string where
>> the line separator is a linefeed'
>>
>> The fact that the above code is legal leads to subtle errors such
>> as those above.  A blatant compiler error is preferred.
>>
>>
>> One final curiosity:  Why is the following method written as it is
>> (in both 4.1 and 3.10.2)?
>>
>> Method  CrLfFileStream>>new
>>
>>        ^ (MultiByteFileStream new) wantsLineEndConversion: true; yourself.
>>
>>
>> I presume it is correct but a comment explaining why wouldn't hurt.
>>
>> Regards,
>>
>> Ralph Boland
>>
>>
>>
>> --
>> Quantum Theory cannot save us from the tyranny of a deterministic universe
>> but it does give God something to do
>>
>>
>


