[squeak-dev] Squeak XTream + COG

Fri Sep 10 22:41:13 UTC 2010

My own Squeak XTream development has got a bit frozen...
The arrival of COG is an excellent pretext to wake it up.

In short, the essence of this mail is that XTream performs well in COG
(as well as Stream or better).
Moreover, it shows that some speed-ups are still possible for
files despite Levente's Stream optimizations.
If you are interested, read the rest; if not, apologies for the overly
long mail.

Nicolas

What Squeak XTream is not
======================

First, I'd like to renew my apologies to the VW fellows (Michael Lucas
Smith et al.) for hijacking the name.
Squeak XTream is not a port of VW XTream, though it shares some ideas
and inspiration.
If someone wants to port VW XTream to Squeak, then I shall give the name back.
In the interim I am keeping it, it's too nice (any ideas for a rename
are welcome).

Squeak XTream is much less extreme than VW: for example, it preserves
most message names.
As such, it's just a possible replacement/clean-up of Squeak Stream.
Squeak XTream is also far less extensive (no parser etc...).

What Squeak XTream is
===================

Did you ever browse the Squeak Stream hierarchy? Then you know what
Squeak XTream is about!
It's about reducing complexity and increasing quality by using simple
uniform concepts.
Not sure the goal is reached yet, but we must keep it in mind.

The 1st idea behind Squeak XTream is to use Wrappers rather than subclasses.
Wrappers act somewhat like filters with an input and an output (think
of a Unix pipe).
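
To make the pipe analogy concrete, here is a minimal sketch that uses
only plain Squeak Stream and a block, not the actual XTream classes.
The names source, upper and out are made up for the illustration: the
block upper wraps the source and filters each element on its way through.

    | source upper out c |
    source := 'hello world' readStream.
    "The 'wrapper': pulls one element from its input and filters it."
    upper := [source next ifNotNil: [:char | char asUppercase]].
    out := (String new: 16) writeStream.
    "Pull elements through the wrapper until the source answers nil."
    [(c := upper value) == nil] whileFalse: [out nextPut: c].
    out contents    "'HELLO WORLD'"

A real wrapper would be an object answering the stream protocol, so
that wrappers can be stacked on each other; the block only shows the
input/filter/output idea.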

Note there are other alternatives like composition by Traits - see Nile.
Nile implements some Wrappers too though.

The 2nd idea is to separate ReadXtream and WriteXtream as much as possible.
Well, Squeak XTream is not as extreme as VW in this respect either...
It still has a read/write stream subclass (not nice).
I think this was the main starting point of Nile.

The 3rd idea is to uniformly provide the #readInto:startingAt:count:
and #next:putAll:startingAt: API.
This is essential to increase throughput when possible (this is
the well-known buffering technique).
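
As a rough illustration of why bulk messages matter, the sketch below
copies a collection chunk by chunk instead of element by element. It is
written against the plain Squeak Stream messages #next:into:startingAt:
and #next:putAll:startingAt: so that it runs without XTream loaded; I
assume #readInto:startingAt:count: plays the same role on the read side.
The buffer size of 1024 is arbitrary.

    | in out buffer n |
    in := (String new: 10000 withAll: $a) readStream.
    out := (String new: 10000) writeStream.
    buffer := String new: 1024.
    [in atEnd] whileFalse: [
        "Never ask for more than what is left in the source."
        n := buffer size min: in size - in position.
        in next: n into: buffer startingAt: 1.
        out next: n putAll: buffer startingAt: 1].
    out contents size    "10000"

Ten bulk transfers replace 10000 #next / #nextPut: pairs, and each bulk
transfer can be backed by a copy primitive.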

The 4th idea is to offer a parametric endOfStream handling for read streams.
It can be a simple ^nil, raising an Exception or evaluating a Block...
(anything responding to value).
In Squeak, streaming on a collection containing nil could be problematic.
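
A small example of the problem with plain Squeak Stream: a loop that
tests for nil stops at the first nil element, not at the real end of
the stream.

    | s count |
    s := #(1 nil 3) readStream.
    count := 0.
    "Intended to count all the elements, but nil is also an element here."
    [s next == nil] whileFalse: [count := count + 1].
    count.      "1, not 3"
    s atEnd     "false, the 3 was never read"
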

The performances
==============

We have justified Squeak XTream by quality, but performance counts too.
Essentially because Streams are used everywhere deep in the Kernel
operations (File read/write, character encoding/decoding,
Compiler/Parser, text processing etc...).

With the traditional VM, Squeak XTream performs well, except for #next
sends, because it does not implement a primitive.
Good news: as already indicated by Eliot, the #next and #nextPut:
primitives are absolutely not necessary with COG, better throw them
out!

    {
    [| tmp |
        tmp := (String new: 10000) writeStream.
        1 to: 10000 do: [:i | tmp nextPut: $0]] bench.
    [| tmp |
        tmp := (String new: 10000) writeXtream.
        1 to: 10000 do: [:i | tmp nextPut: $0]] bench.
    }
    #('1,200 per second.' '1,310 per second.')
    #('1,180 per second.' '1,320 per second.')

    {
    [| tmp |
        tmp := (String new: 10000 withAll: $0) readStream.
        1 to: 10000 do: [:i | tmp next]] bench.
    [| tmp |
        tmp := (String new: 10000 withAll: $0) readXtream.
        1 to: 10000 do: [:i | tmp next]] bench.
    }
    #('2,470 per second.' '2,470 per second.')
    #('2,490 per second.' '2,480 per second.')

This now makes XTream performance similar to Stream for every common message.
This includes performance on all kinds of simple loops.
    [ tmp := aStream next. tmp==nil] whileFalse: [ tmp doSomething ].
    [ aStream atEnd] whileFalse: [ aStream next doSomething ].
    aStream do: [:next | next doSomething ].
Though XTream adds one more possibility.
    aStream endOfStreamAction: [ ^nil].
    "Don't use repeat in Squeak, it's not inlined"
    [ aStream next doSomething. true ] whileTrue.

    | str |
    str := String new: 1000 withAll: $a.
    {
        [| tmp | tmp := str readStream. [tmp next==nil] whileFalse] bench.
        [| tmp | tmp := str readXtream. [tmp next==nil] whileFalse] bench.
    }
    #('34,800 per second.' '37,000 per second.')
    #('36,000 per second.' '37,000 per second.')

    | str |
    str := String new: 1000 withAll: $a.
    {
        [| tmp | tmp := str readStream. [tmp atEnd] whileFalse: [tmp next]] bench.
        [| tmp | tmp := str readXtream. [tmp atEnd] whileFalse: [tmp next]] bench.
    }
    #('27,000 per second.' '27,500 per second.')
    #('26,600 per second.' '27,100 per second.')

This is also the case for the major messages upTo:, upToAnyOf:,
nextPutAll:, etc...

    | str |
    str := String new: 1000 withAll: $a.
    {
        [str readStream upTo: $b] bench.
        [str readXtream upTo: $b] bench.
    }
    #('294,000 per second.' '297,000 per second.')
    #('296,000 per second.' '293,000 per second.')

Now, what about Files?
Squeak FileStream has recently (4.1) seen a major speed-up thanks to
the hard work of Levente, who backported several experiments to the
Squeak Stream hierarchy with 100% backward compatibility and a smooth
transition. Bravo!
This naturally reduces one of the advantages of Squeak XTream,
which was buffering optimizations.

    {
    [| tmp |
        tmp := (StandardFileStream readOnlyFileNamed: (SourceFiles at: 2) name).
        [tmp next==nil] whileFalse. tmp close] timeToRun.
    [| tmp |
        tmp := (StandardFileStream readOnlyFileNamed: (SourceFiles at: 2) name)
                readXtream buffered.
        [tmp next==nil] whileFalse. tmp close] timeToRun.
    }
    #(1497 1164)
    #(1426 1132)

The speed-up is not zero, but not major either.
However, some old Stream messages still deserve optimization, as
demonstrated here:

    {
    [| tmp |
        tmp := (StandardFileStream readOnlyFileNamed: (SourceFiles at: 2) name)
                ascii.
        [tmp upTo: Character cr. tmp atEnd] whileFalse. tmp close] timeToRun.
    [| tmp |
        tmp := (StandardFileStream readOnlyFileNamed:
        (SourceFiles at: 2) name) readXtream ascii buffered.
        [tmp upTo: Character cr. tmp atEnd] whileFalse. tmp close] timeToRun.
    }
    #(8854 716)
    #(8859 716)

    {
    [| tmp |
        tmp := (StandardFileStream readOnlyFileNamed: (SourceFiles at: 2) name)
                ascii.
        [tmp upToAnyOf: (CharacterSet crlf). tmp atEnd] whileFalse.
        tmp close] timeToRun.
    [| tmp |
        tmp := (StandardFileStream readOnlyFileNamed: (SourceFiles at: 2) name)
                readXtream ascii buffered.
        [tmp upToAnyOf: (CharacterSet crlf). tmp atEnd] whileFalse.
        tmp close] timeToRun.
    }
    #(9138 857)
    #(9157 837)

One more important subject is the MultiByteFileStream bottleneck.
Internationalisation is an essential feature, many thanks to Yoshiki
for bringing it alive.
But the performance price is high.
Also, this is a place where Squeak badly needs clean-ups (you
know, all the basicNext and the like are just hacks).
Now, once again, since Levente's buffering, the difference is not that high:

    {
    [| tmp |
        tmp := (MultiByteFileStream readOnlyFileNamed: (SourceFiles at: 2) name)
                ascii;
                wantsLineEndConversion: false; converter: UTF8TextConverter new.
        1 to: 20000 do: [:i | tmp upTo: Character cr].
        tmp close] timeToRun.
    [| tmp |
        tmp := (StandardFileStream readOnlyFileNamed: (SourceFiles at: 2) name)
                readXtream binary buffered ~> UTF8Decoder.
        1 to: 20000 do: [:i | tmp upTo: Character cr].
        tmp close] timeToRun.
    }
    #(152 120 )
    #(150 119 )

But wait, the file was buffered (bytes are fetched from the file in
packets), but the decoder was not! All decoding is performed char by char.
That's bad, because when only a few bytes require decoding and the
majority can be translated unchanged to String, there is potentially a
major speed-up by simply using a sub-array copy primitive. We have known
this since #squeakToUTF8, many thanks to Andreas.
To benefit from buffering in the decoder too, just use a message to wrap it up:

    {
    [| tmp |
        tmp := (MultiByteFileStream readOnlyFileNamed: (SourceFiles at: 2) name)
                ascii;
                wantsLineEndConversion: false; converter: UTF8TextConverter new.
        1 to: 20000 do: [:i | tmp upTo: Character cr].
        tmp close] timeToRun.
    [| tmp |
        tmp := ((StandardFileStream readOnlyFileNamed: (SourceFiles at: 2) name)
                readXtream binary buffered ~> UTF8Decoder) buffered.
        1 to: 20000 do: [:i | tmp upTo: Character cr].
        tmp close] timeToRun.
    }
    #(152 18)
    #(152 19)

Bingo, now the speed-up is there too! Roughly 8x (152 ms down to
18-19 ms) is not a bad score after all.
That's not surprising: the change log is essentially made of ASCII and
rarely requires any UTF8 translation at all.
Of course, if you handle files full of Chinese code points, don't
expect a speed-up at all!
But for a decent proportion of Latin-character users, the potential
speed-up is there, right under our Streams.
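
For the curious, the idea behind the fast path can be sketched with
plain collection messages. This is only an illustration, not the actual
UTF8Decoder code: find the first byte above 127, and copy everything
before it in one bulk operation; only the rest needs real decoding.
The sample bytes are 'Hello ' followed by a two-byte UTF8 sequence.

    | bytes firstNonAscii asciiCount asciiPrefix |
    bytes := #[72 101 108 108 111 32 195 169].
    "findFirst: answers 0 when no byte is above 127 (pure ASCII)."
    firstNonAscii := bytes findFirst: [:each | each > 127].
    asciiCount := firstNonAscii = 0
        ifTrue: [bytes size]
        ifFalse: [firstNonAscii - 1].
    "One bulk copy for the whole ASCII run, no per-character work."
    asciiPrefix := (bytes copyFrom: 1 to: asciiCount) asString.
    asciiPrefix    "'Hello '"
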

Nicolas (again)