[squeak-dev] Squeak XTream + COG

Sun Sep 12 19:45:36 UTC 2010

2010/9/12 Levente Uzonyi <leves at elte.hu>:
> On Sat, 11 Sep 2010, Nicolas Cellier wrote:
>
>> My own Squeak XTream development has got a bit frozen...
>> The arrival of COG is an excellent pretext to wake it up.
>>
>> In short, the essence of this mail is that XTream performs well in COG
>> (as well as Stream or better).
>> Moreover, it shows that some speed-ups are still possible for
>> files despite Levente's Stream optimizations.
>> If interested, read the rest; if not, apologies for the overly long mail.
>>
>> Nicolas
>>
>>
>>
>> What Squeak XTream is not
>> ======================
>>
>> First, I'd like to renew my apologies to VW fellows (Michael Lucas
>> Smith & al) for hijacking the name.
>> Squeak XTream is not a port of VW XTream, though it shares some ideas
>> and inspiration.
>> If someone wants to port VW XTream to Squeak, then I shall give the name
>> back.
>> In the interim, I keep it, it's too nice (any idea welcome for a rename).
>>
>> Squeak XTream is much less extreme than VW: for example, it preserves
>> most message names.
>> As such, it's just a possible replacement/clean-up of Squeak Stream.
>> Squeak XTream is also far less extended (no parser etc...).
>>
>> What Squeak XTream is
>> ===================
>>
>> Did you ever browse Squeak Stream hierarchy? Then you know what Squeak
>> XTream is about!
>> It's about reducing complexity and increasing quality by using simple
>> uniform concepts.
>> Not sure the goal is reached yet, but we must keep it in mind.
>>
>> The 1st idea behind Squeak XTream is to use Wrappers rather than
>> subclasses.
>> Wrappers act somehow as filters with an input and an output (think of
>> a Unix pipe).
>>
>> Note there are other alternatives like composition by Traits - see Nile.
>> Nile implements some Wrappers too though.
>>
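
To make the pipe analogy concrete, here is a minimal sketch in which a
plain block stands in for a wrapper. It is only an illustration over a
standard ReadStream, not how XTream implements wrappers; the variable
names are made up for the example.

```smalltalk
"Sketch only: a block plays the role of a wrapper object."
| source upcased result c |
source := ReadStream on: 'abc'.
"The 'wrapper' pulls one element from its input and transforms it.
 It answers nil once the input is exhausted."
upcased := [source next ifNotNil: [:ch | ch asUppercase]].
"The consumer only talks to the wrapper, never to the source."
result := String streamContents: [:out |
    [(c := upcased value) == nil] whileFalse: [out nextPut: c]].
result  "'ABC'"
```

A real wrapper would be an object answering the stream protocol, so that
wrappers can be chained like the stages of a pipe.
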
>> The 2nd idea is to separate ReadXtream and WriteXtream as much as
>> possible.
>> Well, Squeak XTream is not as extreme as VW in this respect either...
>> It still has a read/write stream subclass (not nice).
>> I think this was the main starting point of Nile.
>>
>> The 3rd idea is to uniformly provide #readInto:startingAt:count: and
>> #next:putAll:startingAt: API.
>> This is essential to increase the throughput when possible (this is
>> the well known buffering).
>>
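
As a small illustration of block-wise reading, the sketch below fills a
preallocated buffer in one send instead of looping over #next. It uses
the standard #next:into:startingAt: of Squeak streams rather than the
XTream message itself, and assumes your image provides it (as far as I
know, Squeak 4.1 does).

```smalltalk
"Sketch only: bulk read into a preallocated buffer, one send for 5 elements."
| in buffer |
in := ReadStream on: 'hello world'.
buffer := String new: 5.
in next: 5 into: buffer startingAt: 1.
buffer  "'hello'"
```

The gain comes from letting the stream move a whole run of elements at
once, which is what makes buffering pay off.
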
>> The 4th idea is to offer parametric endOfStream handling for read
>> streams.
>> It can be a simple ^nil, raising an Exception or evaluating a Block...
>> (anything responding to value).
>> In Squeak, streaming over a collection containing nil can be problematic,
>> because #next also answers nil at the end of the stream.
>>
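
The nil problem is easy to reproduce with a standard ReadStream. In this
small sketch (variable names are mine), a loop that tests for nil stops
at the nil element, while a loop that tests #atEnd sees all three
elements:

```smalltalk
| data s byNil byAtEnd |
data := {1. nil. 3}.

"Loop terminated by nil: stops early at the nil element."
s := ReadStream on: data.
byNil := 0.
[s next == nil] whileFalse: [byNil := byNil + 1].

"Loop terminated by atEnd: visits every element."
s := ReadStream on: data.
byAtEnd := 0.
[s atEnd] whileFalse: [s next. byAtEnd := byAtEnd + 1].

{byNil. byAtEnd}  "1 and 3"
```
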
>> The performances
>> ==============
>>
>> We have justified Squeak XTream by quality, but performance counts too,
>> essentially because Streams are used everywhere deep in the Kernel
>> operations (File read/write, character encoding/decoding,
>> Compiler/Parser, text processing etc...).
>>
>> With the traditional VM, Squeak XTream performs well, except for #next
>> sends, because it does not implement a primitive.
>> Good news, as already indicated by Eliot, the #next #nextPut:
>> primitives are absolutely not necessary with COG, better throw them
>> out!
>
> That's right, but we should keep them, because the SqueakVM is still the
> most widely used VM and that's significantly slower without the primitives.
>
>>
>>   {
>>   [| tmp |
>>       tmp := (String new: 10000) writeStream.
>>       1 to: 10000 do: [:i | tmp nextPut: $0]] bench.
>>   [| tmp |
>>       tmp := (String new: 10000) writeXtream.
>>       1 to: 10000 do: [:i | tmp nextPut: $0]] bench.
>>   }
>>   #('1,200 per second.' '1,310 per second.')
>>   #('1,180 per second.' '1,320 per second.')
>
> XTreams are a bit slower if you add a WideCharacter to the stream, because
> ByteString is swapped to WideString with #becomeForward:, while WriteStream
> #nextPut: has a string specific hack for this.

Ah, yes, sure, becomeForward: is expensive, but it happens once per
Stream at most.
Without a primitive for nextPut:, checking at each put would be too expensive.

>
>>
>>   {
>>   [| tmp |
>>       tmp := (String new: 10000 withAll: $0) readStream.
>>       1 to: 10000 do: [:i | tmp next]] bench.
>>   [| tmp |
>>       tmp := (String new: 10000 withAll: $0) readXtream.
>>       1 to: 10000 do: [:i | tmp next]] bench.
>>   }
>>  #('2,470 per second.' '2,470 per second.')
>>  #('2,490 per second.' '2,480 per second.')
>>
>>
>> This now makes XTream performance similar to Stream for every current
>> message.
>> This includes performance on all kinds of simple loops.
>>   [ tmp := aStream next. tmp==nil] whileFalse: [ tmp doSomething ].
>>   [ aStream atEnd] whileFalse: [ aStream next doSomething ].
>>   aStream do: [:next | next doSomething ].
>> Though XTream adds one more possibility.
>>   aStream endOfStreamAction: [ ^nil].
>>   [ aStream next doSomething. true ] whileTrue. "Don't use repeat in
>> Squeak, it's not inlined"
>>
>>   | str |
>>   str := String new: 1000 withAll: $a.
>>   {
>>       [| tmp | tmp := str readStream. [tmp next==nil] whileFalse] bench.
>>       [| tmp | tmp := str readXtream. [tmp next==nil] whileFalse] bench.
>>   }
>>   #('34,800 per second.' '37,000 per second.')
>>   #('36,000 per second.' '37,000 per second.')
>>
>>   | str |
>>   str := String new: 1000 withAll: $a.
>>   {
>>       [tmp := str readStream. [tmp atEnd] whileFalse: [tmp next]] bench.
>>       [tmp := str readXtream. [tmp atEnd] whileFalse: [tmp next]] bench.
>>   }
>>   #('27,000 per second.' '27,500 per second.')
>>   #('26,600 per second.' '27,100 per second.')
>>
>> This is also the case for major messages such as upTo:, upToAnyOf:,
>> nextPutAll: etc...
>>
>>   | str |
>>   str := String new: 1000 withAll: $a.
>>   {
>>       [str readStream upTo: $b] bench.
>>       [str readXtream upTo: $b] bench.
>>   }
>>   #('294,000 per second.' '297,000 per second.')
>>   #('296,000 per second.' '293,000 per second.')
>>
>> Now, what about Files?
>> Squeak FileStream has recently (4.1) seen a major speed-up thanks to
>> the hard work of Levente, who backported several experiments to the
>> Squeak Stream hierarchy with 100% backward compatibility and a smooth
>> transition. Bravo!
>> This naturally reduces one of the advantages of Squeak XTream,
>> which was buffering optimizations.
>>
>>   {
>>   [| tmp |
>>       tmp := (StandardFileStream readOnlyFileNamed: (SourceFiles at: 2)
>> name).
>>       [tmp next==nil] whileFalse. tmp close] timeToRun.
>>   [| tmp |
>>       tmp := (StandardFileStream readOnlyFileNamed: (SourceFiles at:
>> 2) name) readXtream buffered.
>>       [tmp next==nil] whileFalse. tmp close] timeToRun.
>>   }
>>   #(1497 1164)
>>   #(1426 1132)
>>
>> The speed-up is not zero, but not major either.
>
> If you use #basicNext instead of #next for StandardFileStream and remove the
> primitive from it, the performance will be the same. So getting rid of the
> #basic* methods can give us a bit more speed besides cleaner code.
> I have an idea to implement this, but I'm not sure the transition to the new
> code would be smooth enough. I'm also not sure it's worth putting more
> effort in this, using another stream implementation may be a better idea.
>

Switching is difficult too, unless you provide 100% of the old API (in
which case, you did not clean up that much...).
Maybe evolving the old code is worth a try.

>> Though, some old Stream messages still deserve optimization as
>> demonstrated here:
>>
>>   {
>>   [| tmp |
>>       tmp := (StandardFileStream readOnlyFileNamed: (SourceFiles at:
>> 2) name) ascii.
>>       [tmp upTo: Character cr. tmp atEnd] whileFalse. tmp close]
>> timeToRun.
>>   [| tmp |
>>       tmp := (StandardFileStream readOnlyFileNamed:
>>       (SourceFiles at: 2) name) readXtream ascii buffered.
>>       [tmp upTo: Character cr. tmp atEnd] whileFalse. tmp close]
>> timeToRun.
>>   }
>>   #(8854 716)
>>   #(8859 716)
>>
>>   {
>>   [| tmp |
>>       tmp := (StandardFileStream readOnlyFileNamed: (SourceFiles at:
>> 2) name) ascii.
>>       [tmp upToAnyOf: (CharacterSet crlf). tmp atEnd] whileFalse.
>>       tmp close] timeToRun.
>>   [| tmp |
>>       tmp := (StandardFileStream readOnlyFileNamed: (SourceFiles at:
>> 2) name) readXtream ascii buffered.
>>       [tmp upToAnyOf: (CharacterSet crlf). tmp atEnd] whileFalse.
>>       tmp close] timeToRun.
>>   }
>>   #(9138 857)
>>   #(9157 837)
>
> So far I couldn't find the real cause of this difference, maybe the vm
> profiler will tell more. MessageTally doesn't seem to be useful.
>
>>
>> One more important subject is the MultiByteFileStream bottleneck.
>> Internationalisation is an essential feature, many thanks to Yoshiki
>> for bringing it alive.
>> But the performance price is high.
>> Also, this is a place where Squeak dramatically requires clean-ups (you
>> know all the basicNext and the like are just hackish).
>> Now, once again, since Levente's buffering, the difference is not that high:
>>
>>   {
>>   [| tmp |
>>       tmp := (MultiByteFileStream readOnlyFileNamed: (SourceFiles
>> at: 2) name) ascii;
>>               wantsLineEndConversion: false; converter: UTF8TextConverter
>> new.
>>       1 to: 20000 do: [:i | tmp upTo: Character cr].
>>       tmp close] timeToRun.
>>   [| tmp |
>>       tmp := (StandardFileStream readOnlyFileNamed: (SourceFiles at: 2)
>> name)
>>               readXtream binary buffered ~> UTF8Decoder.
>>       1 to: 20000 do: [:i | tmp upTo: Character cr].
>>       tmp close] timeToRun.
>>   }
>>   #(152 120 )
>>   #(150 119 )
>>
>> But wait, the file was buffered (bytes are fetched from file by packets),
>> but the decoder was not! All decoding is performed char by char.
>> That's bad, because when only a few bytes require decoding and the
>> majority can be translated unchanged to String, there is potentially a
>> major speed-up by simply using a sub-array copy primitive. We have known
>> this since #squeakToUTF8, many thanks to Andreas.
>> To benefit from buffering for the decoder too, just use a message to
>> wrap it up:
>>
>>   {
>>   [| tmp |
>>       tmp := (MultiByteFileStream readOnlyFileNamed: (SourceFiles
>> at: 2) name) ascii;
>>               wantsLineEndConversion: false; converter: UTF8TextConverter
>> new.
>>       1 to: 20000 do: [:i | tmp upTo: Character cr].
>>       tmp close] timeToRun.
>>   [| tmp |
>>       tmp := ((StandardFileStream readOnlyFileNamed: (SourceFiles at: 2)
>> name)
>>               readXtream binary buffered ~> UTF8Decoder) buffered.
>>       1 to: 20000 do: [:i | tmp upTo: Character cr].
>>       tmp close] timeToRun.
>>   }
>>  #(152 18)
>>  #(152 19)
>>
>> Bingo, now the speed-up is there too! 7.5x is not a bad score after all.
>> That's no surprise: the change log is essentially made of ASCII and
>> rarely requires any UTF8 translation at all.
>> Of course, if you handle files full of Chinese code points, don't
>> expect a speed-up at all!
>> But for a decent proportion of Latin character users, the potential
>> speed-up is there, right under our Streams.
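
The fast path can be pictured with a small sketch: find the leading run
of pure ASCII bytes in a buffer and convert it in one bulk copy, leaving
only the remainder for char-by-char decoding. This is just an
illustration with standard collection messages, not the actual decoder
code; a real implementation would use a primitive to scan the bytes.

```smalltalk
"Sketch only: bulk-copy the leading ASCII run of a UTF-8 byte buffer."
| bytes firstNonAscii lastAscii prefix |
"'Hello' followed by the two UTF-8 bytes of e-acute."
bytes := #[72 101 108 108 111 195 169].
"Index of the first byte that needs real decoding, 0 if there is none."
firstNonAscii := bytes findFirst: [:b | b >= 128].
lastAscii := firstNonAscii = 0
    ifTrue: [bytes size]
    ifFalse: [firstNonAscii - 1].
"Bytes below 128 map to the same code points, so a plain copy is enough."
prefix := (bytes copyFrom: 1 to: lastAscii) asString.
prefix  "'Hello'"
```

The more ASCII the input contains, the longer these runs are, which is
why the change log benefits so much and a file full of multi-byte
characters would not.
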
>
> This could also be handled with my MultiByteStream idea (mentioned above
> without the name).
>
> So what's the conclusion? Should we consider adding XTream to Squeak and
> evolving the system to use it instead of Streams?
>

I don't think it is possible yet. Stream has so many messages I'm not
sure I should reproduce in Xtream...
I just tried this,
WriteStream class>>on: aCollection
    self == WriteStream ifTrue: [^WriteXtream on: aCollection].
    ^super on: aCollection

My image did not survive :(

On the other hand, changing the whole hierarchy smoothly is a challenge too...

Nicolas

>
> Levente
>
>>
>> Nicolas (again)
>>
>>
>
>