[squeak-dev] news from the Xtream front

Nicolas Cellier nicolas.cellier.aka.nice at gmail.com
Tue Dec 8 07:24:44 UTC 2009


2009/12/8 Levente Uzonyi <leves at elte.hu>:
> On Tue, 8 Dec 2009, Nicolas Cellier wrote:
>
>> To give a concrete idea of the further improvement we might get beyond
>> Levente's excellent changes, I just tried this in the latest trunk,
>> with the latest Xtream version:
>>
>> {
>> [| tmp |
>>      tmp := (MultiByteFileStream readOnlyFileNamed: (SourceFiles at: 2) name)
>>           ascii; wantsLineEndConversion: false; converter: UTF8TextConverter new.
>>      1 to: 10000 do: [:i | tmp upTo: Character cr].
>>      tmp close] timeToRun.
>> [| tmp |
>>      tmp := ((StandardFileStream readOnlyFileNamed: (SourceFiles at: 2) name)
>>           readXtream ascii buffered
>>           decodeWith: (UTF8TextConverter new installLineEndConvention: nil)) buffered.
>>      1 to: 10000 do: [:i | tmp upTo: Character cr].
>>      tmp close] timeToRun.
>> }
>>
>> #(1395 84)
>>
>
> Really cool. :)
>
>> The first is the recently optimized trunk version. Unfortunately, with
>> MultiByteFileStream at work, you get a looong one-by-one decoding.
>> The second is the Xtream version with carefully crafted #buffered sends.
>> It's hard to believe what you can do with a utf8ToSqueak-like hack and a
>> buffer...
>>
>> Of course, this version is optimized only in case of ASCII source
>> encoded in UTF8 (the easy case, but the most common case concerning
>> source files).
>
> Don't forget that the sources are sometimes read backwards by the current
> code.
>

Oh yes, like this?

| file |
[file := MultiByteFileStream newFileNamed: 'mbfs_skip.tst'.
file ascii; wantsLineEndConversion: false; converter: UTF8TextConverter new.
file nextPutAll: 'Ceci doit changé'.
file skip: -1. "Oops - grammatically incorrect"
file nextPutAll: 'er'.
file close.

file := StandardFileStream oldFileNamed: 'mbfs_skip.tst'.
file ascii.
file contentsOfEntireFile.]
	ensure: [file close.
		FileDirectory default deleteFileNamed: 'mbfs_skip.tst'].
-> 'Ceci doit changÃer' "Oops squeakly incorrect"

Ha ha, MultiByteFileStream lets us see a stream of decoded characters,
but positions over a stream of bytes...
The programmer's only option is to set marks (by asking aMBFS for its
position) and restore the position using these marks...
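To make the byte-versus-character mismatch concrete outside Squeak, here is a small Python sketch. It is not Squeak code: io.BytesIO stands in for the file, and the helper names are hypothetical. It replays the experiment above on raw UTF-8 bytes, then shows two workarounds: restoring a saved position mark, and stepping back one whole character by skipping UTF-8 continuation bytes (bit pattern 10xxxxxx), which is roughly what a backward reader of the sources would need.

```python
import io

TEXT = "Ceci doit changé"

def naive_skip_back():
    """Replay the experiment: skip back one *byte* after writing UTF-8 text."""
    f = io.BytesIO()
    f.write(TEXT.encode("utf-8"))
    f.seek(-1, io.SEEK_CUR)       # one byte back: lands inside 'é' (C3 A9)
    f.write("er".encode("utf-8"))  # overwrites A9, leaving a lone C3 byte
    return f.getvalue()

def skip_back_with_mark():
    """Workaround from the text: remember a position, then restore it."""
    f = io.BytesIO()
    f.write("Ceci doit chang".encode("utf-8"))
    mark = f.tell()                # byte position just before 'é'
    f.write("é".encode("utf-8"))
    f.seek(mark)                   # back to the mark, a character boundary
    f.write("er".encode("utf-8"))  # 'er' is 2 bytes, same size as 'é'
    return f.getvalue()

def previous_char_start(buf, pos):
    """Hypothetical helper: byte index where the character before `pos` starts.

    Assumes `buf` is valid UTF-8 and `pos` sits on a character boundary.
    Continuation bytes match 0b10xxxxxx, so we step back over them.
    """
    pos -= 1
    while pos > 0 and (buf[pos] & 0xC0) == 0x80:
        pos -= 1
    return pos

def skip_back_one_char():
    """Step back one whole character instead of one byte."""
    f = io.BytesIO()
    f.write(TEXT.encode("utf-8"))
    f.seek(previous_char_start(f.getvalue(), f.tell()))
    f.write("er".encode("utf-8"))
    return f.getvalue()
```

Reading the naive result back byte-per-character (`naive_skip_back().decode("latin-1")`) gives the same 'Ceci doit changÃer' as the Squeak session, while both workarounds decode as UTF-8 to 'Ceci doit changer'.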

>> I don't know what happens when encountering a multi-byte UTF-8 char...
>> ... all I know is that performance in this case is likely a disaster
>> (my code is a bit stupid, but it's too late to correct it now)
>>
>
> It can still be much better than the current approach.
>

Yes, it could.
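The general idea behind the fast Xtream variant (decode a whole buffer at once when it is plain ASCII, and fall back to real UTF-8 decoding only otherwise) can be sketched in Python as below. This illustrates the technique under our own assumptions; it is not a port of the Xtream code, and `decode_chunks` and `lines_upto_cr` are hypothetical names. The slow path uses an incremental decoder, so a multi-byte character split across two buffers still decodes correctly instead of wrecking performance or correctness. `bytes.isascii()` needs Python 3.7 or later.

```python
import codecs

def decode_chunks(chunks):
    """Decode an iterable of UTF-8 byte chunks into str pieces.

    Fast path: a chunk of pure ASCII bytes (< 0x80) is decoded wholesale,
    since ASCII bytes mean the same thing in UTF-8.
    Slow path: anything else goes through an incremental decoder, which
    also keeps the tail of a multi-byte character cut at a chunk boundary.
    """
    decoder = codecs.getincrementaldecoder("utf-8")()
    for chunk in chunks:
        pending, _ = decoder.getstate()   # bytes held from a split character
        if not pending and chunk.isascii():
            yield chunk.decode("ascii")
        else:
            yield decoder.decode(chunk)
    tail = decoder.decode(b"", final=True)  # raises if input ended mid-character
    if tail:
        yield tail

def lines_upto_cr(chunks):
    """Rough analogue of repeated `upTo: Character cr` over the decoded text."""
    pending = ""
    for piece in decode_chunks(chunks):
        pending += piece
        *complete, pending = pending.split("\r")
        yield from complete
    if pending:
        yield pending
```

For example, `list(lines_upto_cr([b"ab\rc", b"\xc3", b"\xa9\rd"]))` yields `["ab", "cé", "d"]`: the first chunk takes the ASCII fast path, while the two chunks holding the split 'é' go through the decoder. The fast path only pays off when most buffers are pure ASCII, which is the ASCII-source case discussed above; text with frequent non-ASCII characters takes the slow path on nearly every buffer.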

>> Oh, maybe Levente will just port the idea tomorrow in trunk, so I can
>> have a bit more rest ;)
>>
>
> Well, maybe, I'm working on other hacks, but I'll take a look, I'm starting
> to like the idea. ;)
>

Making something simple out of the current MultiByteFileStream mess is a
challenge I don't even want to take on, but you seem a bit tougher than
me.

Cheers

Nicolas

>
> Levente
>
>> Cheers
>>
>> Nicolas
>>


