[squeak-dev] The Trunk: Collections-nice.933.mcz

Nicolas Cellier nicolas.cellier.aka.nice at gmail.com
Sat Apr 10 19:22:04 UTC 2021


Hi all,
currently, MultiByteFileStream>>nextChunk is incorrect for UTF-8 streams:
it decodes the UTF-8 twice.

This crafted example will raise an error:

    (MultiByteFileStream newFileNamed: 'foo.utf8') nextPutAll: 'À'; close.
    (MultiByteFileStream oldFileNamed: 'foo.utf8') nextChunk.

nextChunk sends basicUpTo: intending to get an unconverted string,
then sends decodeString: for fast (bulk) decoding.
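A minimal Python model of that intended design, assuming a stream over raw UTF-8 bytes. The names Utf8Stream, basic_next, basic_up_to and next_chunk are hypothetical stand-ins for the Squeak selectors, and real chunk parsing (such as doubled '!' escapes) is omitted. The point is that the delimiter search runs on undecoded bytes and decoding happens exactly once:

```python
class Utf8Stream:
    """Hypothetical model of a UTF-8 file stream (not Squeak's API)."""

    def __init__(self, raw: bytes):
        self.raw = raw
        self.pos = 0

    def at_end(self) -> bool:
        return self.pos >= len(self.raw)

    def basic_next(self) -> int:
        # Answer the next raw, undecoded byte.
        b = self.raw[self.pos]
        self.pos += 1
        return b

    def basic_up_to(self, delimiter: int) -> bytes:
        # Collect raw bytes up to (not including) the delimiter,
        # mirroring the fixed basicUpTo: loop that uses basicNext.
        out = bytearray()
        while not self.at_end():
            b = self.basic_next()
            if b == delimiter:
                break
            out.append(b)
        return bytes(out)

    def next_chunk(self) -> str:
        # Read the raw chunk, then decode it once, in bulk.
        return self.basic_up_to(ord("!")).decode("utf-8")


print(Utf8Stream("À!rest".encode("utf-8")).next_chunk())  # prints À
```

Searching for the delimiter on raw bytes is safe because '!' (0x21) is ASCII, while every byte of a UTF-8 multibyte sequence is 0x80 or above.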

Unfortunately, basicUpTo: sends next instead of basicNext, so the UTF-8
is decoded twice. This can silently falsify the source code in some
cases, or even fail with an Error, as in the crafted example above.
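A small Python model of the bug, assuming that decodeString: treats each character of the already-decoded string as a byte with the same value before decoding again. decode_twice is an illustrative name, not Squeak code. It reproduces both failure modes, silent corruption and an outright error:

```python
def decode_twice(raw: bytes) -> str:
    """Hypothetical model of the bug: basicUpTo: decodes each character
    via next, then decodeString: treats each resulting character as a
    raw byte and decodes the whole string again.
    Assumes every once-decoded code point is below 256."""
    once = raw.decode("utf-8")              # first decode (next)
    as_bytes = bytes(ord(c) for c in once)  # characters reinterpreted as bytes
    return as_bytes.decode("utf-8")         # second decode (decodeString:)


# Silent corruption: "Ã©" is stored as C3 83 C2 A9, decoded once to
# U+00C3 U+00A9, whose values C3 A9 happen to be valid UTF-8 for "é".
print(decode_twice("Ã©".encode("utf-8")))  # prints é

# Outright failure, as in the crafted example: "À" is stored as C3 80,
# decoded once to U+00C0, and a lone 0xC0 byte is never valid UTF-8.
try:
    decode_twice("À".encode("utf-8"))
except UnicodeDecodeError as err:
    print("second decode failed:", err)
```

ASCII-only text survives double decoding unchanged, so only non-ASCII source is affected.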

An accelerated version of basicUpTo: was provided by Levente in
Multilingual-ul.85.mcz, but it was removed in Multilingual-ar.119.mcz,
and I couldn't understand the intention from the commit message.
basicUpTo: was then broken in Collections-ul.438.mcz and fixed in
Collections-eem.684.mcz, but that fix introduced the double-decoding problem.


On Sat, 10 Apr 2021 at 21:17, <commits at source.squeak.org> wrote:
>
> Nicolas Cellier uploaded a new version of Collections to project The Trunk:
> http://source.squeak.org/trunk/Collections-nice.933.mcz
>
> ==================== Summary ====================
>
> Name: Collections-nice.933
> Author: nice
> Time: 10 April 2021, 9:16:54.312889 pm
> UUID: c066bf52-b7a5-474a-9614-90bbc3212e07
> Ancestors: Collections-ul.932
>
> Quick fix for double utf8->squeak conversion via nextChunk.
>
>     (MultiByteFileStream newFileNamed: 'foo.utf8') nextPutAll: 'À'; close.
>     (MultiByteFileStream oldFileNamed: 'foo.utf8') nextChunk.
>
> =============== Diff against Collections-ul.932 ===============
>
> Item was changed:
>   ----- Method: PositionableStream>>basicUpTo: (in category 'private basic') -----
>   basicUpTo: anObject
>         "Answer a subcollection from the current access position to the
>         occurrence (if any, but not inclusive) of anObject in the receiver. If
>         anObject is not in the collection, answer the entire rest of the receiver."
>         | newStream element |
>         newStream := WriteStream on: (self collectionSpecies new: 100).
> +       [self atEnd or: [(element := self basicNext) = anObject]]
> -       [self atEnd or: [(element := self next) = anObject]]
>                 whileFalse: [newStream nextPut: element].
>         ^newStream contents!
>
>

