[squeak-dev] Squeak XTream + COG

Levente Uzonyi leves at elte.hu
Sun Sep 12 03:22:22 UTC 2010


On Sat, 11 Sep 2010, Nicolas Cellier wrote:

> My own Squeak XTream development has got a bit frozen...
> The arrival of COG is an excellent pretext to wake it up.
>
> In short, the essence of this mail is that XTream performs well in COG
> (as well as Stream or better).
> Moreover, it shows that some speed-ups are still possible for
> files despite Levente's Stream optimizations.
> If interested, read the rest; if not, apologies for the overly long mail.
>
> Nicolas
>
>
>
> What Squeak XTream is not
> ======================
>
> First, I'd like to renew my apologies to VW fellows (Michael Lucas
> Smith & al) for hijacking the name.
> Squeak XTream is not a port of VW XTream, though it shares some ideas
> and inspiration.
> If someone wants to port VW XTream to Squeak, then I shall give the name back.
> In the interim, I keep it, it's too nice (any idea welcome for a rename).
>
> Squeak XTream is much less extreme than VW: for example, it preserves
> most message names.
> As such, it's just a possible replacement/clean-up of Squeak Stream.
> Squeak XTream is also far less extended (no parser etc...).
>
> What Squeak XTream is
> ===================
>
> Did you ever browse Squeak Stream hierarchy? Then you know what Squeak
> XTream is about!
> It's about reducing complexity and increasing quality by using simple
> uniform concepts.
> Not sure the goal is reached yet, but we must keep it in mind.
>
> The 1st idea behind Squeak XTream is to use Wrappers rather than subclasses.
> Wrappers act somehow as filters with an input and an output (think of
> a Unix pipe).
>
> Note there are other alternatives like composition by Traits - see Nile.
> Nile implements some Wrappers too though.
>
> The 2nd idea is to separate ReadXtream and WriteXtream as much as possible.
> Well, Squeak XTream is not as extreme as VW in this respect either...
> It still has a read/write stream subclass (not nice).
> I think this was the main start point of Nile.
>
> The 3rd idea is to uniformly provide #readInto:startingAt:count: and
> #next:putAll:startingAt: API.
> This is essential to increase the throughput when possible (this is
> the well known buffering).
>
> The 4th idea is to offer a parametric endOfStream handling for read stream.
> It can be a simple ^nil, raising an Exception or evaluating a Block...
> (anything responding to value).
> In Squeak, streaming on a collection containing nil could be problematic.
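
A minimal workspace sketch of the nil problem mentioned above. It uses only the standard ReadStream (not XTream), and the variable names are just for illustration:

```smalltalk
"Standard ReadStream only; shows why nil is an ambiguous end-of-stream marker."
| data s viaNil viaAtEnd |
data := {1. nil. 3}.

"Loop that treats nil as 'end of stream': it stops at the stored nil."
viaNil := 0.
s := data readStream.
[s next == nil] whileFalse: [viaNil := viaNil + 1].

"Loop that asks the stream explicitly: it visits every element."
viaAtEnd := 0.
s := data readStream.
[s atEnd] whileFalse: [s next. viaAtEnd := viaAtEnd + 1].

{viaNil. viaAtEnd}
```

The first loop only counts the element before the stored nil, while the second counts all three. A configurable end-of-stream action avoids having to pick one of these loop styles up front.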
>
> The performances
> ==============
>
> We have justified Squeak XTream by quality, but performance counts too.
> Essentially because Streams are used everywhere deep in the Kernel
> operations (File read/write, character encoding/decoding,
> Compiler/Parser, text processing etc...).
>
> With the traditional VM, Squeak XTream performs well, except for #next sends,
> because it does not implement a primitive.
> Good news, as already indicated by Eliot, the #next #nextPut:
> primitives are absolutely not necessary with COG, better throw them
> out!

That's right, but we should keep them, because the SqueakVM is still 
the most widely used VM and it is significantly slower without the 
primitives.

>
>    {
>    [| tmp |
>        tmp := (String new: 10000) writeStream.
>        1 to: 10000 do: [:i | tmp nextPut: $0]] bench.
>    [| tmp |
>        tmp := (String new: 10000) writeXtream.
>        1 to: 10000 do: [:i | tmp nextPut: $0]] bench.
>    }
>    #('1,200 per second.' '1,310 per second.')
>    #('1,180 per second.' '1,320 per second.')

XTreams are a bit slower if you add a WideCharacter to the stream, because 
ByteString is swapped to WideString with #becomeForward:, while 
WriteStream >> #nextPut: has a string specific hack for this.
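
A small workspace sketch to observe the switch on the standard WriteStream (no XTream involved). The code point is an arbitrary example of a character that does not fit in one byte:

```smalltalk
"Standard WriteStream only; inspect the class of the contents before and after."
| ws before after |
ws := (String new: 10) writeStream.
ws nextPut: $a.
before := ws contents class.
"Add a character outside the single-byte range."
ws nextPut: (Character value: 16r3042).
after := ws contents class.
{before. after}
```

With the string-specific handling in WriteStream >> #nextPut:, the second class is expected to be WideString.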

>
>    {
>    [| tmp |
>        tmp := (String new: 10000 withAll: $0) readStream.
>        1 to: 10000 do: [:i | tmp next]] bench.
>    [| tmp |
>        tmp := (String new: 10000 withAll: $0) readXtream.
>        1 to: 10000 do: [:i | tmp next]] bench.
>    }
>   #('2,470 per second.' '2,470 per second.')
>   #('2,490 per second.' '2,480 per second.')
>
>
> This now makes XTream performance similar to Stream for every common message.
> This includes performance on all kinds of simple loops.
>    [ tmp := aStream next. tmp==nil] whileFalse: [ tmp doSomething ].
>    [ aStream atEnd] whileFalse: [ aStream next doSomething ].
>    aStream do: [:next | next doSomething ].
> Though XTream adds one more possibility.
>    aStream endOfStreamAction: [ ^nil].
>    "Don't use repeat in Squeak, it's not inlined"
>    [ aStream next doSomething. true ] whileTrue.
>
>    | str |
>    str := String new: 1000 withAll: $a.
>    {
>        [| tmp | tmp := str readStream. [tmp next==nil] whileFalse] bench.
>        [| tmp | tmp := str readXtream. [tmp next==nil] whileFalse] bench.
>    }
>    #('34,800 per second.' '37,000 per second.')
>    #('36,000 per second.' '37,000 per second.')
>
>    | str |
>    str := String new: 1000 withAll: $a.
>    {
>        [tmp := str readStream. [tmp atEnd] whileFalse: [tmp next]] bench.
>        [tmp := str readXtream. [tmp atEnd] whileFalse: [tmp next]] bench.
>    }
>    #('27,000 per second.' '27,500 per second.')
>    #('26,600 per second.' '27,100 per second.')
>
> This is also the case for major messages such as upTo:, upToAnyOf:, nextPutAll:, etc.
>
>    | str |
>    str := String new: 1000 withAll: $a.
>    {
>        [str readStream upTo: $b] bench.
>        [str readXtream upTo: $b] bench.
>    }
>    #('294,000 per second.' '297,000 per second.')
>    #('296,000 per second.' '293,000 per second.')
>
> Now, what about Files ?
> Squeak FileStream has recently (4.1) seen a major speed-up thanks to
> the hard work of Levente, who backported several experiments to
> Squeak Stream hierarchy with 100% backward compatibility and smooth
> transition. Bravo!
> This naturally reduces one of the advantages of Squeak XTream,
> which was buffering optimizations.
>
>    {
>    [| tmp |
>        tmp := (StandardFileStream readOnlyFileNamed: (SourceFiles at: 2) name).
>        [tmp next==nil] whileFalse. tmp close] timeToRun.
>    [| tmp |
>        tmp := (StandardFileStream readOnlyFileNamed: (SourceFiles at: 2) name)
>                readXtream buffered.
>        [tmp next==nil] whileFalse. tmp close] timeToRun.
>    }
>    #(1497 1164)
>    #(1426 1132)
>
> The speed-up is not null, but not major either.

If you use #basicNext instead of #next for StandardFileStream and remove 
the primitive from it, the performance will be the same. So getting rid of 
the #basic* methods can give us a bit more speed besides cleaner code.
I have an idea to implement this, but I'm not sure the transition to the 
new code would be smooth enough. I'm also not sure it's worth putting more 
effort in this, using another stream implementation may be a better idea.

> Though, some old Stream messages still deserve optimization as
> demonstrated here:
>
>    {
>    [| tmp |
>        tmp := (StandardFileStream readOnlyFileNamed: (SourceFiles at: 2) name)
>                ascii.
>        [tmp upTo: Character cr. tmp atEnd] whileFalse. tmp close] timeToRun.
>    [| tmp |
>        tmp := (StandardFileStream readOnlyFileNamed:
>        (SourceFiles at: 2) name) readXtream ascii buffered.
>        [tmp upTo: Character cr. tmp atEnd] whileFalse. tmp close] timeToRun.
>    }
>    #(8854 716)
>    #(8859 716)
>
>    {
>    [| tmp |
>        tmp := (StandardFileStream readOnlyFileNamed: (SourceFiles at: 2) name)
>                ascii.
>        [tmp upToAnyOf: (CharacterSet crlf). tmp atEnd] whileFalse.
>        tmp close] timeToRun.
>    [| tmp |
>        tmp := (StandardFileStream readOnlyFileNamed: (SourceFiles at: 2) name)
>                readXtream ascii buffered.
>        [tmp upToAnyOf: (CharacterSet crlf). tmp atEnd] whileFalse.
>        tmp close] timeToRun.
>    }
>    #(9138 857)
>    #(9157 837)

So far I couldn't find the real cause of this difference, maybe the vm 
profiler will tell more. MessageTally doesn't seem to be useful.

>
> One more important subject is the MultiByteFileStream bottleneck.
> Internationalisation is an essential feature, many thanks to Yoshiki
> for bringing it alive.
> But the performance price is high.
> Also, this is a place where Squeak dramatically requires clean-ups (you
> know all the basicNext and the like are just hackish).
> Now, once again, since Levente's buffering, the difference is not that high:
>
>    {
>    [| tmp |
>        tmp := (MultiByteFileStream readOnlyFileNamed: (SourceFiles at: 2) name)
>                ascii; wantsLineEndConversion: false; converter: UTF8TextConverter new.
>        1 to: 20000 do: [:i | tmp upTo: Character cr].
>        tmp close] timeToRun.
>    [| tmp |
>        tmp := (StandardFileStream readOnlyFileNamed: (SourceFiles at: 2) name)
>                readXtream binary buffered ~> UTF8Decoder.
>        1 to: 20000 do: [:i | tmp upTo: Character cr].
>        tmp close] timeToRun.
>    }
>    #(152 120 )
>    #(150 119 )
>
> But wait, the file was buffered (bytes are fetched from file by packets),
> but the decoder was not! All decoding is performed char by char.
> That's bad, because when only a few bytes require decoding and
> the majority can be translated unchanged to String, there is potentially a
> major speed-up by simply using a sub-array copy primitive. We know
> this since #squeakToUTF8, many thanks to Andreas.
> To benefit from buffering for the decoder too, just use a message to wrap it up:
>
>    {
>    [| tmp |
>        tmp := (MultiByteFileStream readOnlyFileNamed: (SourceFiles at: 2) name)
>                ascii; wantsLineEndConversion: false; converter: UTF8TextConverter new.
>        1 to: 20000 do: [:i | tmp upTo: Character cr].
>        tmp close] timeToRun.
>    [| tmp |
>        tmp := ((StandardFileStream readOnlyFileNamed: (SourceFiles at: 2) name)
>                readXtream binary buffered ~> UTF8Decoder) buffered.
>        1 to: 20000 do: [:i | tmp upTo: Character cr].
>        tmp close] timeToRun.
>    }
>   #(152 18)
>   #(152 19)
>
> Bingo, now the speed-up is there too! 7.5x is not a bad score after all.
> That's not surprising: the change log is essentially made of ASCII and
> rarely requires any UTF8 translation at all.
> Of course, if you handle files full of Chinese code points, don't
> expect a speed-up at all!
> But for a decent proportion of Latin character users, the potential
> speed-up is there, right under our Streams.
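
To illustrate the bulk-copy idea behind the buffered decoder, here is a rough workspace sketch. It is not the XTream or UTF8Decoder code, only an assumption about the general technique: find the leading run of ASCII bytes in a buffer and copy it into a String with a single replaceFrom:to:with:startingAt: call instead of decoding char by char. The byte values are a made-up sample ('hi', then a two-byte UTF-8 sequence, then '!').

```smalltalk
"Sketch only: copy the leading ASCII run of a byte buffer in one call."
| bytes firstNonAscii asciiCount prefix |
bytes := #[104 105 195 169 33].

"Index of the first byte that needs real UTF-8 decoding;
 findFirst: answers 0 when there is no such byte."
firstNonAscii := bytes findFirst: [:b | b > 127].
asciiCount := firstNonAscii = 0
	ifTrue: [bytes size]
	ifFalse: [firstNonAscii - 1].

"One bulk copy for the whole ASCII run."
prefix := String new: asciiCount.
prefix replaceFrom: 1 to: asciiCount with: bytes startingAt: 1.
prefix
```

Only the bytes after the ASCII run would then go through the slower per-character decoding. The findFirst: scan here runs in plain Smalltalk, so a real decoder would likely want a faster way to locate the first non-ASCII byte.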

This could also be handled with my MultiByteStream idea (mentioned above 
without the name).

So what's the conclusion? Should we consider adding XTream to Squeak and 
evolving the system to use it instead of Streams?


Levente

>
> Nicolas (again)
>
>


