[squeak-dev] Re: problems with line separators in Linux

Nicolas Cellier nicolas.cellier.aka.nice at gmail.com
Sat Jun 12 16:23:54 UTC 2010


2010/6/12 Ralph Boland <rpboland at gmail.com>:
> ...
>> >> 10) This 6.a) strategy could eventually replace 2.a), but it does not
>> >> have to, and we didn't go this way...
>> >> So both Squeak 4.1 and Pharo 1.1 are not any worse than Squeak always
>> >> has been with this respect.
>> >
>> > Except that now the conversion of Lf in Linux files to  Cr in Squeak no longer
>> > occurs and this breaks things such as Menu labels.  Thus things that used
>> > to work now don't.
>> >
>
>> I don't see what change could cause this problem...
>
>
> I checked out loading a .st file in both  Squeak 3.10.2 and Squeak 4.1.
> filing in the
> following file:
>
> 'From Squeak4.1 of 17 April 2010 [latest update: #9957] on 7 June 2010
> at 9:30:22 pm'!
> Object subclass: #Junk
>        instanceVariableNames: ''
>        classVariableNames: ''
>        poolDictionaries: ''
>        category: 'Kernel-Objects'!
>
> !Junk methodsFor: 'as yet unclassified' stamp: 'rpb 6/7/2010 21:27'!
> junk
>
>        | a |
>        a := 'abc
>        def
>        ghi'.
>        self halt.
>        a := a.! !
>
> In  Squeak 3.10.2
>  a)  If  concreteStream returns  CrLfFileStream:
>       ClassCategoryReader  eventually calls  scanFrom: aStream where  aStream
>       is a  MultiByteFileStream and the next chunk of text is:
>
> junk
>
>        | a |
>        a := 'abc
>        def
>        ghi'.
>        self halt.
>        a := a.! !
>
> At this point 'aStream nextChunkText' is called.
> which does:  string := self nextChunk.
> nextChunk then does a 'self skipSeparators' and then calls  'self
> next' in a loop.
>
> The 'self next'  reads the next character and does a 'self doConversion' test
> which returns true so if the character read is a  Lf character it is
>  converted into a Cr character.
>
> b)  If  concreteStream returns  MultiByteFileStream:
>     The same thing happens except  doConversion returns false and so
>      Lf characters are NOT converted into Cr characters.
>
> In  Squeak 4.1 everything is the same up to the point  'aStream
> nextChunkText' is called.
> nextChunkText calls:
>
>        '^converter nextChunkTextFromStream: self'
>
> where converter is a MacRomanTextConverter.
>
> Following the trail from here looks completely different than the
> Squeak 3.10.2 code.
> In particular I could not find where an attempt to convert  Lf
> characters to Cr characters
> was supposed to occur let alone why it failed if  concreteStream
> returns CrLfFileStream.
>
>
> Note that in  3.10.2  if   concreteStream returns  MultiByteFileStream:
> then  Lf  characters
> are NOT converted into  Cr characters.   I would have expected  Lf
> characters and
> Cr,Lf  character pairs to be converted to  Cr characters regardless of
> what  concreteStream
> returns.  We do at this point know we are reading  Squeak code so Lfs
> are inappropriate.
> From my point of view there is no need for the 'doConversion' test at
> all except in strings
> where the user may intentionally want  Lf or  Cr,Lf for some odd
> reason and we shouldn't
> break his/her code.  In that case no conversion should be done under
> any circumstances
> so the code is wrong both ways:  it fails to convert when it should
> and converts when
> it shouldn't.
>
> Since I couldn't figure out how 4.1 handles things I can't say if it
> does any better.
>
> Hope this explains a few things.
>
>> The recent commit should solve the menu problem in presence of LF leakage.
>
> How do I install the version with this fix rather than the version of  4.1 found
> on the Squeak page?
>
> Regards,
>
> Ralph Boland
>
>

ABOUT THE IMPLEMENTATION:
------------------------------------------------

The two inst.vars of interest in MultiByteFileStream are
- wantsLineEndConversion
- lineEndConvention

There are two ways to set the line ending convention.
1) myStream wantsLineEndConversion: true.
  if you follow the code, you will discover that:
	lineEndConvention ifNil: [ self detectLineEndConvention ].

  detectLineEndConvention scans an input file to guess the line
ending convention at the first line break
  (this won't work if you have mixed conventions in your file...)

  detectLineEndConvention sets the lineEndConvention to LineEndDefault
for output files (if empty).
  This default is guessed at image startup based on the underlying OS (see
guessDefaultLineEndConvention).

2) myStream lineEndConvention: aSymbol (#cr #lf #crlf or nil).
  handling of line endings does not happen entirely here though...
	self installLineEndConventionInConverter.
  this leads you to:
	TextConverter>>installLineEndConvention:

This is an optimization.
Since TextConverter already converts characters with proper encoding,
it also handles line ending with no extra cost.
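
A minimal workspace sketch of the two ways described above. The file
name 'Junk.st' is only a placeholder, and the sketch assumes that
FileStream hands back a MultiByteFileStream, as it does through
concreteStream in 4.1:

```smalltalk
"Workspace sketch; 'Junk.st' is a hypothetical file name."
| stream contents |

"Way 1: ask for conversion and let the stream detect the convention."
stream := FileStream readOnlyFileNamed: 'Junk.st'.
[ stream wantsLineEndConversion: true.
  contents := stream upToEnd ]
	ensure: [ stream close ].

"Way 2: state the convention of the external file explicitly."
stream := FileStream readOnlyFileNamed: 'Junk.st'.
[ stream lineEndConvention: #lf.
  contents := stream upToEnd ]
	ensure: [ stream close ].
```

Way 1 relies on the guess made at the first line break, so way 2 is
probably the safer choice when you already know how the file was written.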

For output, you can see part of the job is performed in
MultiByteFileStream>>nextPut: too...
This handling assumes the Smalltalk String is made of CR only... (so
not immune to LF !!!).
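
If a String may already contain LF or CR LF, one possible workaround is
to normalize it to CR before writing it out. This is only a sketch using
generic String protocol, not something the stream does for you; the
variable names are made up for illustration:

```smalltalk
"Workspace sketch: bring a String back to CR-only line ends."
| crlf cr text normalized |
crlf := String with: Character cr with: Character lf.
cr := String with: Character cr.
text := 'abc' , crlf , 'def' , (String with: Character lf) , 'ghi'.

"Collapse CR LF pairs first, then turn any remaining lone LF into CR."
normalized := text copyReplaceAll: crlf with: cr.
normalized replaceAll: Character lf with: Character cr.
```

The order matters: replacing lone LF first would turn each CR LF pair
into two CR, that is, an extra empty line.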

Anyway, the MultiByteFileStream is an ugly part of the system, so I
don't encourage you to lose time in its tricky implementation.


ABOUT THE DEFAULT BEHAVIOUR:
-----------------------------------------------------

The first method is invoked when you use CrLfFileStream to create your stream.
The second method is not sent.

So by default, NO conversion is performed, unless you EXPLICITLY use
  - CrLfFileStream
  - or #wantsLineEndConversion: message after stream creation
  - or #lineEndConvention: message
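
A quick way to check this from a workspace is to read a file with the
default stream and look for LF in the result. Again just a sketch, with
'Junk.st' standing in for any file saved on Linux:

```smalltalk
"Workspace sketch; 'Junk.st' is a hypothetical file name."
| stream raw |
stream := FileStream readOnlyFileNamed: 'Junk.st'.
raw := [ stream upToEnd ] ensure: [ stream close ].

"No conversion was requested, so any LF in the file is still there."
(raw includes: Character lf)
	ifTrue: [ Transcript show: 'LF leaked into the image'; cr ].
```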

Not sure it was much different in 3.10.2, no time to inquire...

Hope this helps

Nicolas


