[squeak-dev] Re: problems with line separators in Linux
nicolas.cellier.aka.nice at gmail.com
Sat Jun 12 16:23:54 UTC 2010
2010/6/12 Ralph Boland <rpboland at gmail.com>:
>> >> 10) This 6.a) strategy could eventually replace 2.a), but it does not
>> >> have to, and we didn't go this way...
>> >> So both Squeak 4.1 and Pharo 1.1 are not any worse than Squeak always
>> >> has been in this respect.
>> > Except that now the conversion of Lf in Linux files to Cr in Squeak no longer
>> > occurs and this breaks things such as Menu labels. Thus things that used
>> > to work now don't.
>> I don't see what change could cause this problem...
> I tried loading a .st file in both Squeak 3.10.2 and Squeak 4.1,
> filing in the following file:
> 'From Squeak4.1 of 17 April 2010 [latest update: #9957] on 7 June 2010
> at 9:30:22 pm'!
> Object subclass: #Junk
> instanceVariableNames: ''
> classVariableNames: ''
> poolDictionaries: ''
> category: 'Kernel-Objects'!
> !Junk methodsFor: 'as yet unclassified' stamp: 'rpb 6/7/2010 21:27'!
> | a |
> a := 'abc
> self halt.
> a := a.! !
> In Squeak 3.10.2:
> a) If concreteStream returns CrLfFileStream:
> ClassCategoryReader eventually calls scanFrom: aStream where aStream
> is a MultiByteFileStream and the next chunk of text is:
> | a |
> a := 'abc
> self halt.
> a := a.! !
> At this point 'aStream nextChunkText' is called,
> which does: string := self nextChunk.
> nextChunk then does a 'self skipSeparators' and then calls 'self
> next' in a loop.
> The 'self next' reads the next character and does a 'self doConversion'
> test, which returns true, so if the character read is a Lf character it
> is converted into a Cr character.
> b) If concreteStream returns MultiByteFileStream:
> The same thing happens except doConversion returns false and so
> Lf characters are NOT converted into Cr characters.
> In Squeak 4.1 everything is the same up to the point 'aStream
> nextChunkText' is called.
> nextChunkText calls:
> '^converter nextChunkTextFromStream: self'
> where converter is a MacRomanTextConverter.
> Following the trail from here looks completely different from the
> Squeak 3.10.2 code.
> In particular, I could not find where the conversion of Lf characters
> to Cr characters was supposed to occur, let alone why it failed when
> concreteStream returns CrLfFileStream.
> Note that in 3.10.2, if concreteStream returns MultiByteFileStream,
> Lf characters are NOT converted into Cr characters. I would have
> expected Lf characters and Cr,Lf character pairs to be converted to Cr
> characters regardless of what concreteStream returns. At this point we
> know we are reading Squeak code, so Lfs are inappropriate.
> From my point of view there is no need for the 'doConversion' test at
> all, except inside strings, where the user may intentionally want Lf or
> Cr,Lf for some odd reason and we shouldn't break his/her code. In that
> case no conversion should be done under any circumstances. So the code
> is wrong both ways: it fails to convert when it should and converts
> when it shouldn't.
> Since I couldn't figure out how 4.1 handles things I can't say if it
> does any better.
> Hope this explains a few things.
>> The recent commit should solve the menu problem in presence of LF leakage.
> How do I install the version with this fix rather than the version of 4.1 found
> on the Squeak page?
> Ralph Boland
ABOUT THE IMPLEMENTATION:
The two inst.vars of interest in MultiByteFileStream are
There are two ways to set the line ending convention.
1) myStream wantsLineEndConversion: true.
If you follow the code, you will discover that:
lineEndConvention ifNil: [ self detectLineEndConvention ].
detectLineEndConvention scans an input file to guess the line
ending convention at the first line break
(this won't work if you have mixed conventions in your file...).
For output files (if empty), detectLineEndConvention sets the
lineEndConvention to LineEndDefault.
This default is guessed at image startup based on underlying OS (see
2) myStream lineEndConvention: aSymbol (#cr #lf #crlf or nil).
Handling of line endings does not happen entirely here, though...
This leads you to:
This is an optimization.
Since TextConverter already converts characters with proper encoding,
it also handles line ending with no extra cost.
For output, you can see part of the job is performed in
This handling assumes the Smalltalk String is made of CR only... (so
it is not immune to LF!).
Anyway, the MultiByteFileStream is an ugly part of the system, so I
don't encourage you to lose time in its tricky implementation.
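If LF does leak into a String, it can be normalized by hand before the
string reaches code that assumes CR only. A minimal workspace sketch,
using only generic collection messages; the variable names and the sample
text are made up for illustration, and this is not what the image itself
does internally:

```smalltalk
"Hypothetical workspace snippet: normalize CRLF and LF to CR."
| lf cr crlf source normalized |
lf := String with: Character lf.
cr := String with: Character cr.
crlf := String with: Character cr with: Character lf.
"Sample text with mixed line endings (made up for this example)."
source := 'first', crlf, 'second', lf, 'third'.
"Replace CRLF first, so that its LF is not converted a second time."
normalized := (source copyReplaceAll: crlf with: cr)
	copyReplaceAll: lf with: cr.
"Answers false: no LF is left in the normalized string."
normalized includes: Character lf
```

The CRLF pass has to come first; otherwise each CR,LF pair would end up
as two CRs.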
ABOUT THE DEFAULT BEHAVIOUR:
The first method is invoked when you use CrLfFileStream to create your stream.
This second method is not sent.
So by default, NO conversion is performed, unless you EXPLICITLY use
- or #wantsLineEndConversion: message after stream creation
- or #lineEndConvention: message
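To make the two explicit messages concrete, here is a rough workspace
sketch. The file name 'Junk.st' is only a placeholder, and I have not
checked the result in every image version, so take it as an illustration
rather than a recipe:

```smalltalk
"Hypothetical workspace snippet; 'Junk.st' is a placeholder file name."
| stream contents |
stream := MultiByteFileStream readOnlyFileNamed: 'Junk.st'.
[
	"Either let the stream detect the convention used by the file..."
	stream wantsLineEndConversion: true.
	"...or state the convention yourself (#cr, #lf, #crlf, or nil):"
	"stream lineEndConvention: #lf."
	contents := stream upToEnd
] ensure: [stream close].
"If the conversion is active, this should answer false."
contents includes: Character lf
```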
Not sure it was much different in 3.10.2, no time to inquire...
Hope this helps