Timers, Stepping, Co-routines, and Performance in Squeak.....
(was Re: Timers & Stepping)
Henrik.Gedenryd at lucs.lu.se
Wed Mar 22 19:34:20 UTC 2000
>> On my computer, 73% of the CPU time is taken up by GrafPort>>fillRect:.
>> Thus for the task of redrawing a typical display, less than 27% of the
>> time is spent executing bytecodes. (And it's probably not even that
>> much). From this quick test, the single biggest thing that could speed
>> up Morphic redisplays is an even faster BitBlt.
> Faster Blts are always good, but there's still plenty of room for cleverness.
> For instance, sparked by recent comments, I just made dragging windows go
> about twice as fast in morphic by being more careful about whether the dragee
> has holes or translucency (if not, you can use the considerably faster store
> mode of BitBlt). Coming soon to an update server near you.
I was just going to mention the value of really understanding BitBlt for
speeding things up--as graphics is often what makes things (seem) slow, and
this always comes down to BitBlt. I have a recent example of my own.
When I did the sub-pixel rendering algorithm, I began by coding it as
described to me, applying the image processing on a per-pixel basis and
looping over the pixels. Then I changed this into a sequence of image
processing operations, each performed on all pixels before going on to the
next one.
The upshot of this is that each such operation is a BitBlt. In other words, I
used Bijan's strategy and got Dan to translate my inner loop to C for me.
> Get someone who knows what they're doing to make it work.
> Let that someone get a bunch of other folks make it work right and fast.
For example, instead of adding to each pixel a fraction of the value of the
pixel to its left, you blend the whole form with white and then copy it one
pixel to the right, and so on. (The code is in the antialiasing change set on
the Swiki.)
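To make the difference concrete, here is a minimal one-row model of that kind of filter. It is not the code from the change set. It assumes pixels are "ink coverage" values from 0 (white) to 255 and uses a left-neighbour weight of 1/3 chosen only for illustration. The helpers scale, shift_right, and add_saturating are hypothetical stand-ins for whole-form blts: scaling plays the role of blending with white, and the clamped add plays the role of an additive combination rule.

```python
# Hypothetical one-row model: values are "ink coverage", 0 = white, 255 = full ink.
MAX = 255


def filter_per_pixel(row, num=1, den=3):
    """Per-pixel version: add num/den of the left neighbour to every pixel."""
    out = []
    for i, v in enumerate(row):
        left = row[i - 1] if i > 0 else 0  # nothing to the left of pixel 0
        out.append(min(MAX, v + left * num // den))
    return out


def scale(row, num, den):
    """Whole-row op: fade every pixel toward white (stands in for blending with white)."""
    return [v * num // den for v in row]


def shift_right(row):
    """Whole-row op: copy the row one pixel to the right, filling the gap with white."""
    return [0] + row[:-1] if row else []


def add_saturating(a, b):
    """Whole-row op: clamped per-pixel add (stands in for an additive combination rule)."""
    return [min(MAX, x + y) for x, y in zip(a, b)]


def filter_whole_row(row, num=1, den=3):
    """Same filter as a sequence of whole-row operations, each one a candidate blt."""
    return add_saturating(row, shift_right(scale(row, num, den)))


row = [0, 255, 90, 0, 30]
print(filter_per_pixel(row))  # [0, 255, 175, 30, 30]
print(filter_whole_row(row))  # [0, 255, 175, 30, 30]
```

Both versions compute the same result. The second one never loops over individual pixels at the high level, so each step can be handed to a single blt. In Squeak, the loops would then run inside BitBlt's C code instead of as bytecodes.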
The result was eventually a speed increase of 700x (i.e., 70,000% of the
original speed). This was with zero lines of non-Smalltalk code. Then I
changed one blt mode to also handle an odd condition correctly, and the
speedup dropped to 600-something. Was the final Smalltalk code difficult to
read? Not really, and it is about 25 lines long.
The last speed drop also taught me that the different BitBlt modes differ
greatly in speed (as Dan noted regarding dragging windows). In general, the
higher-numbered modes are slower.
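As a rough illustration of why the choice of mode matters, the sketch below contrasts a per-pixel alpha blend with a plain store. It picks the store whenever the source has no holes or translucency, which is the check Dan describes for dragged windows. This is a model, not Squeak's BitBlt. Pixels are hypothetical (r, g, b, a) tuples, a hole is modelled as alpha 0, and the destination is assumed to be opaque.

```python
# Hypothetical model: pixels are (r, g, b, a) tuples with 0..255 channels; a == 0 is a hole.


def blend_over(src, dst):
    """Slow path: alpha-blend each source pixel over an (assumed opaque) destination pixel."""
    out = []
    for (sr, sg, sb, sa), (dr, dg, db, _) in zip(src, dst):
        inv = 255 - sa
        out.append(((sr * sa + dr * inv) // 255,
                    (sg * sa + dg * inv) // 255,
                    (sb * sa + db * inv) // 255,
                    255))  # result is opaque because the destination is assumed opaque
    return out


def store(src, dst):
    """Fast path: every source pixel simply replaces the destination pixel (dst unused)."""
    return list(src)


def is_opaque(src):
    """True when the source has no holes and no translucency."""
    return all(a == 255 for _, _, _, a in src)


def draw(src, dst):
    """Use the cheap store rule whenever it gives the same result as blending."""
    return store(src, dst) if is_opaque(src) else blend_over(src, dst)


window = [(255, 0, 0, 255), (0, 255, 0, 255)]
desktop = [(0, 0, 255, 255), (9, 9, 9, 255)]
print(draw(window, desktop) == blend_over(window, desktop))  # True
```

For a fully opaque source, the blend reduces to a copy. The store therefore produces identical pixels while skipping all the per-channel multiplies.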
In this solution I also included the trick suggested by Andreas: merging the
R, G, and B values of every three pixels into one pixel, simply by applying
WarpBlt (plus a trick) to shrink the width to a third. The trick was just
that every 32-bit pixel is made up of four 8-bit items. I still can't believe
it could be made to work, since there are alpha bytes inserted between the
values. An optimized C version of this step would save four message sends or
so in total.
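The sketch below shows only the layout this trick produces, not how WarpBlt gets there. It assumes a row rendered at three times the horizontal resolution as 8-bit samples, RGB subpixel order from left to right, and 32-bit ARGB pixels with the alpha byte on top. The helper names pack_subpixels and bytes_of are hypothetical.

```python
# Hypothetical layout: a row rendered at 3x horizontal resolution as 8-bit samples,
# merged into 32-bit ARGB pixels (alpha in the top byte), RGB order left to right.


def pack_subpixels(samples):
    """Merge every three 8-bit samples into one 32-bit pixel: left->R, middle->G, right->B."""
    if len(samples) % 3:
        raise ValueError("sample count must be a multiple of 3")
    pixels = []
    for i in range(0, len(samples), 3):
        r, g, b = samples[i:i + 3]
        pixels.append((0xFF << 24) | (r << 16) | (g << 8) | b)
    return pixels


def bytes_of(pixel):
    """View a 32-bit pixel as its four 8-bit items, most significant (alpha) first."""
    return [(pixel >> shift) & 0xFF for shift in (24, 16, 8, 0)]


packed = pack_subpixels([10, 20, 30, 255, 0, 128])
print([hex(p) for p in packed])                  # ['0xff0a141e', '0xffff0080']
print([b for p in packed for b in bytes_of(p)])  # [255, 10, 20, 30, 255, 255, 0, 128]
```

Viewed as a stream of 8-bit items, every fourth byte is alpha. This is why it looks unlikely at first that a blt that shrinks 8-bit data by a factor of three could land the samples in the right places.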
At this point I estimated that, given the operations that would go away, an
optimal (and quite difficult) C rewrite would yield something in the ballpark
of twice the speed.
The last trick shows that Blts can be used for more than pure graphics when
it comes to manipulating large amounts of simple data. It's like a poor man's
SIMD processor. I think this mode could produce a very fast occurrence count
on a really long string:
> 33 tallyIntoMap: destinationWord. Tallies pixValues into a colorMap
OK, that's a very weird example, but setting that same huge string to all
ASCII 32s (spaces) is a more realistic one.
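Here is a minimal Python model of those two string operations. It assumes the string is treated as an 8-bit-deep form, so each character is one pixel. The function fill_words stores whole 4-byte words (0x20202020 for ASCII 32) and handles only the tail byte by byte. The function tally builds the kind of value histogram that rule 33 accumulates into a colorMap. Both names are hypothetical, and the real rule's map layout is not modelled. In Python these loops are of course slow; the speed in Squeak would come from the blt doing the equivalent work in C, a whole word at a time.

```python
# Hypothetical model: the string is treated as an 8-bit-deep form, one character per pixel.


def fill_words(buf, byte_value):
    """Fill a bytearray with one byte value, storing whole 4-byte words where possible."""
    word = bytes([byte_value]) * 4  # 0x20202020 for ASCII 32 (space)
    whole = len(buf) // 4 * 4       # bytes covered by complete words
    for i in range(0, whole, 4):
        buf[i:i + 4] = word         # one "word store" per iteration
    for i in range(whole, len(buf)):
        buf[i] = byte_value         # leftover tail, byte by byte
    return buf


def tally(data):
    """Count how often each byte value occurs (a histogram over 256 possible values)."""
    counts = [0] * 256
    for b in data:
        counts[b] += 1
    return counts


text = bytearray(b"Squeak, squeak!")
print(fill_words(text, 32) == bytearray(b" " * 15))  # True
print(tally(b"squeak squeak")[ord("k")])             # 2
```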
In the end, I found that I wouldn't need to run this every time a character
is drawn to the screen, so all that speed wasn't strictly needed. This
illustrates the pitfall of premature optimization, I'd say. But I learned a
lot about BitBlt and graphics. And it was fun.
If anything could be done to make Squeak faster, it would be to optimize the
BitBlt code further, but knowing Dan, I doubt there's much room left for
that. Still, it might be interesting to see this code compiled to use the
special instructions of the PPC G4, MMX, etc., if feasible.
Jim Benson wrote:
> I have a third [monitor] for IRC and Mail so that I can read what Andreas or
> Dan says as soon as I get it :-)
Jim, you should try this thing called "overlapping windows".