[squeak-dev] problems with line separators in Linux

Nicolas Cellier nicolas.cellier.aka.nice at gmail.com
Fri Jun 11 18:42:29 UTC 2010


Hi Ralph,

Here is a short overview of the CR/LF strategy.

1) CR+LF (Windows), CR only (Mac) and LF only (Unix) all exist in the
external world, whether we like it or not.

2) Given 1), the main Smalltalk strategy - and Squeak's in particular - has
always been this one:
2.a) convert every input from the external world to CR,
2.b) convert every output to the external world to the platform-specific convention.

Historically, this was implemented in CrLfFileStream,
but it has been superseded by MultiByteFileStream.
If you inspect its API, you'll see that it provides both an automatically
guessed, platform-specific lineEndConvention and a programmable one.
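
As a rough illustration of the programmable side of 2.b), here is a minimal
workspace sketch. The file name is hypothetical, and I am assuming that #lf is
the symbol lineEndConvention: expects for LF-only output; check the class in
your own image before relying on it.

```smalltalk
"Illustrative sketch only: 'example.txt' is a hypothetical file name, and #lf
 is assumed to be the symbol that selects LF-only output."
| out |
out := MultiByteFileStream forceNewFileNamed: 'example.txt'.
[out lineEndConvention: #lf.
 "Inside the image we keep writing CR; the stream converts on the way out."
 out nextPutAll: 'first line'; cr.
 out nextPutAll: 'second line'; cr]
	ensure: [out close].
```

The point of the sketch is only that the image-side code keeps using cr, while
the convention written to disk is chosen on the stream.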

3) If all tools used to develop applications
(Smalltalk/ruby/python/perl/javascript/.Net/etc...)
  provided APIs making these applications insensitive to line-end
conventions,
  then we would be in a better world and would have to care less about
line-end conventions.
  This is easier than imposing a uniform line-end convention on the
world, because it enables a smooth transition.

4) Until strategy 2) is perfect and absolutely no LF-in-image or
CR-out-of-image leakage occurs, Squeak/Pharo are bad citizens:
they are sensitive to line-end conventions and break chains made of
multiple heterogeneous tools.

5) I observed, you observed, everyone observed recurrent deficiencies
in either 2.a) or 2.b) or both...

6) So my logical conclusion is to propose a complementary strategy:
6.a) Let Smalltalk algorithms work pan-line-ending-conventions.

Observe how any decent file editor (notepad, vim, etc...) works
transparently whatever the line-end convention.
IMO, it's a shame that the so-called reference Object-Oriented
language cannot deal with mixed line-end conventions.

7) So I started to implement 6.a) in Squeak 4.1 and Pharo 1.1 in order
to reach goal 3). This is two-fold:
7-a) display CR-LF, LF or CR as a single line break (changes in
CharacterScanner and co)
7-b) let Stream and String handle lines delimited by CR-LF, LF or CR.

Note that I took care to provide decently optimized implementations (often
more optimized than the previous CR-only algorithms).

8) Of course, in order to benefit from the new 7-b) facilities, there's a
little change of API.
We need to replace some old-fashioned idioms (myStream upTo: Character
cr) with modernized pan-line-ending ones (myStream nextLine).
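
To make the difference between the two idioms concrete, here is a minimal
workspace sketch. It assumes an image in which nextLine accepts CR, LF and
CR-LF as described in 7-b); the sample text and variable names are purely
illustrative.

```smalltalk
"Illustrative sketch only: assumes nextLine handles CR, LF and CR-LF (see 7-b)."
| crlf lf cr text stream oldChunks newLines |
crlf := String with: Character cr with: Character lf.
lf := String with: Character lf.
cr := String with: Character cr.
text := 'first', crlf, 'second', lf, 'third', cr, 'fourth'.

"Old idiom: only CR is treated as a delimiter, so LF stays inside the chunks."
oldChunks := OrderedCollection new.
stream := ReadStream on: text.
[stream atEnd] whileFalse: [oldChunks add: (stream upTo: Character cr)].

"New idiom: each call consumes whichever line ending comes next."
newLines := OrderedCollection new.
stream := ReadStream on: text.
[stream atEnd] whileFalse: [newLines add: stream nextLine].
```

Under that assumption, oldChunks should end up with three elements, the second
one still containing two LF characters, while newLines should hold the four
expected lines.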

9) I did not apply these changes very deeply to Squeak or Pharo, only
in a few places here and there...
So there is still a bit of work to do to reach goal 3)
(parsing the menu specs is just one example of it)

10) This 6.a) strategy could eventually replace 2.a), but it does not
have to, and we didn't go this way...
So both Squeak 4.1 and Pharo 1.1 are not any worse than Squeak always
has been in this respect.

11) Strategy 6.a) DOES NOT replace 2.b). If our down-chain
applications are line-ending sensitive, then WE must take care of producing
the expected convention.

Conclusions

So my opinion is that 6.a) did not make our life worse.
On the contrary, Squeak and Pharo are moving toward what I would call
a better-behaved I.T. world citizen.
They now offer an API to handle line endings transparently inside the image.
This comes at the price of little added complexity, and no noticeable slowdown.
But now we have to learn new idioms (and I don't see nextLine as more
complex than upTo: Character cr)...
... and apply them where due (like parsing menu specs) to obtain a
homogeneous behaviour - goal 3)

We still have to take care of 2.b), and a bit less of 2.a) once 4.b) is
achieved.
And maybe in the future, we will be able to get rid of 2.b) too, when
all applications are line-ending-insensitive.
In the meantime, nothing prevents us from improving 2.a) and 2.b) to
avoid LF leaking into or CR leaking out of the image.
But until strategy 2) is perfect, we just act as one of the bad
world citizens perpetuating line-ending problems.
IMO reaching goal 3) is easier than reaching goal 2).

That's only my personal opinion, but it's based on years of pragmatic
experience using apps that behave badly with line endings and trying to
program slightly better ones.

There are other possible strategies, like in CUIS: display a boxed
[LF] explicitly in text editors so as to provide visual control to
programmers...

Not sure I sold my POV. It's quite the opposite of your proposal.
You don't have to agree, but at least you have some rationale.

Cheers

Nicolas

2010/6/11 Ralph Boland <rpboland at gmail.com>:
> Ever since I started using Squeak with Squeak 3.6
>  (I only use Linux, currently Ubuntu 9.10)
> I have always had trouble with line separators.
> I am checking out  Squeak 4.1 and things have
> changed though I am not sure if they are any better.
>
> If I have  FileStream>>concreteStream return MultiByteFileStream
> (the default) then when I fileOut code the .st file consists of a single
> line so that I cannot use utilities such as  wc and vi on these files.
> If I modify concreteStream to return  CrLfFileStream then the problem
> goes away but a host of other problems occur.
>
> 1) It used to be that if you looked at the versions of a method
> each version would be written on a single line  (I believe linefeeds
> were used instead of carriage returns).
> This no longer happens.  Instead every line of a version is separated
> by a blank line.  An improvement I suppose; it is more readable.
> (I believe what is happening is that end line separator contains
> a line feed and a carriage return and both are treated as line
> separators.)
>
> 2) It used to be that if you wrote out a file (with concreteStream returning
> CrLfFileStream) then when you filed in the file using:
>  (FileStream oldFileNamed:  'filename.st')  fileIn
> you got carriage returns for line separators.
> Now you get linefeeds.
> This causes problems with Menu labels as in:
>
> aMenu labels:
> 'find class... (f)
> recent classes... (r)
> browse all
> browse
> ...
>
> because carriage returns are expected.
> Consequently your Menu has a single entry. :-(
> I expect there are other problems as well.
> Now, even if you set  concreteStream  back to the default the same
> problem occurs.  This works in 3.10.2 so we have gone backwards here.
>
> 3) If I cut and paste from a different  4.1 image I lose
> my line separators altogether.  This is unchanged from before.
>
> 4?) Finally,  I used to have problems loading in .mcz files.  So far in 4.1 I
> haven't had a problem but I have only loaded 2 .mcz files.
> In prior versions of Squeak loading a .mcz file would sometimes succeed.
>
> Are other Linux users having similar experiences?
>
> Frankly, I think only one of  Cr  and  Lf  should be accepted in
> Smalltalk code, the
> other generating a syntax error except inside strings and inside
> strings it should
> have to be escaped somehow.
>
> If  Cr is the character chosen for line separators then it should be
> impossible to
> write:
>
> returnAString
>      ^'a two line string where
> the line separator is a linefeed'
>
> The fact that the above code is legal leads to subtle errors such
> as those above.  A blatant compiler error is preferred.
>
>
> One final curiosity:  Why is the following method written as it is
> (in both 4.1 and 3.10.2)?
>
> Method  CrLfFileStream>>new
>
>        ^ (MultiByteFileStream new) wantsLineEndConversion: true; yourself.
>
>
> I presume it is correct but a comment explaining why wouldn't hurt.
>
> Regards,
>
> Ralph Boland
>
>
>
> --
> Quantum Theory cannot save us from the tyranny of a deterministic universe
> but it does give God something to do
>
>


