[squeak-dev] The Trunk: Collections-nice.933.mcz

Nicolas Cellier nicolas.cellier.aka.nice at gmail.com
Sat Apr 10 20:51:30 UTC 2021


Hi Levente,
feel free to publish those changes, this was just a quick fix.
I only detected this by inspecting inbox submissions
(http://source.squeak.org/treated/Collections-cbc.581.diff) and
started to wonder why we would need basicUpTo: at all.
In the mid-term I would prefer that we start using stream composition
rather than basic* protocol, but that's at the risk of bugs and
regressions

Le sam. 10 avr. 2021 à 22:13, Levente Uzonyi <leves at caesar.elte.hu> a écrit :
>
> Hi Nicolas,
>
> Nice analysis. I'm surprised I didn't notice either me making the
> copy-paste mistake in PositionableStream >> #basicUpTo: and Andreas
> removing the optimized version from MultiByteFileStream.
> As with all basic* methods, PositionableStream >> #basicUpTo:
> should simply be
>
>         ^self upTo: anObject
>
> I restored the removed version of MultiByteFileStream >> #basicUpTo: in my
> image. It makes reading the sources file 3x faster using the following
> "benchmark":
>
> | file |
> [
>         | hash |
>         file := (SourceFiles at: 1) readOnlyCopy.
>         hash := 0.
>         {
>                 [ [ file atEnd ] whileFalse: [
>                         hash := (hash bitXor: file nextChunk hash) hashMultiply ] ] timeToRun.
>                 hash } ]
>         ensure: [ file ifNotNil: [ file close ] ].
>
> "#(1414 265702489) naive"
> "#(469 265702489) optimized"
>
> So, I suggest restoring MultiByteFileStream >> #basicUpTo:, and changing
> PositionableStream >> #basicUpTo: to the version suggested above.
> Also, PositionableStream >> #upTo: could use #streamContents: instead of
> duplicating its behavior.
>
>
> Levente
>
> On Sat, 10 Apr 2021, Nicolas Cellier wrote:
>
> > Hi all,
> > currently, MultiByteFileStream nextChunk is incorrect for utf8 stream:
> > it attempts to decode utf8 twice.
> >
> > This crafted example will raise an error:
> >
> >    (MultiByteFileStream newFileNamed: 'foo.utf8') nextPutAll: 'À'; close.
> >    (MultiByteFileStream oldFileNamed: 'foo.utf8') nextChunk.
> >
> > nextChunk sends basicUpTo: with the intention to get an un-converted
> > string, then sends decodeString: to have fast (bulk) decoding.
> >
> > Unfortunately basicUpTo: sends next instead of basicNext... This makes
> > the utf8 decoded twice, which can falsify the source code in some
> > cases, or even fail with an Error like in the crafted example above.
> >
> > an accelerated version of basicUpTo: was provided by Levente in
> > Multilingual-ul.85.mcz
> > but was removed in Multilingual-ar.119.mcz, and I didn't understand
> > the intention by reading the commit message...
> > basicUpTo: was then broken in Collections-ul.438.mcz, and fixed in
> > Collections-eem.684.mcz with the double decoding problem.
> >
> >
> > Le sam. 10 avr. 2021 à 21:17, <commits at source.squeak.org> a écrit :
> >>
> >> Nicolas Cellier uploaded a new version of Collections to project The Trunk:
> >> http://source.squeak.org/trunk/Collections-nice.933.mcz
> >>
> >> ==================== Summary ====================
> >>
> >> Name: Collections-nice.933
> >> Author: nice
> >> Time: 10 April 2021, 9:16:54.312889 pm
> >> UUID: c066bf52-b7a5-474a-9614-90bbc3212e07
> >> Ancestors: Collections-ul.932
> >>
> >> Quick fix for double utf8->squeak conversion via nextChunk.
> >>
> >>     (MultiByteFileStream newFileNamed: 'foo.utf8') nextPutAll: 'À'; close.
> >>     (MultiByteFileStream oldFileNamed: 'foo.utf8') nextChunk.
> >>
> >> =============== Diff against Collections-ul.932 ===============
> >>
> >> Item was changed:
> >>   ----- Method: PositionableStream>>basicUpTo: (in category 'private basic') -----
> >>   basicUpTo: anObject
> >>         "Answer a subcollection from the current access position to the
> >>         occurrence (if any, but not inclusive) of anObject in the receiver. If
> >>         anObject is not in the collection, answer the entire rest of the receiver."
> >>         | newStream element |
> >>         newStream := WriteStream on: (self collectionSpecies new: 100).
> >> +       [self atEnd or: [(element := self basicNext) = anObject]]
> >> -       [self atEnd or: [(element := self next) = anObject]]
> >>                 whileFalse: [newStream nextPut: element].
> >>         ^newStream contents!
> >>
> >>


More information about the Squeak-dev mailing list