Thue-Morse and performance: Squeak vs. Strongtalk vs. VisualWorks

David Griswold david.griswold.256 at gmail.com
Sun Dec 17 23:08:08 UTC 2006


Klaus,

There are three issues here:

1) You did *not* run it enough under Strongtalk to compile the benchmark, so
you are measuring interpreted performance.  You need to run it until the
performance speeds up and stabilizes.  When it is compiled, on my machine
(Sonoma Pentium M, 1.7 GHz), Squeak 3.1 runs the benchmark in 60453 ms and
Strongtalk runs it in 22139 ms.  That's not the latest Squeak, but I doubt it
has changed much.  I don't have a recent VisualWorks installed, but from my
knowledge of how the various systems work, I would expect VisualWorks to be
a bit faster than Strongtalk at this (very poor) microbenchmark, for reasons
explained below.
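
To make the warm-up point concrete, here is a minimal sketch (in Python, not
Smalltalk) of a harness that keeps rerunning a benchmark until the timings
stop changing.  The name `run_until_stable`, the `window`, `tolerance` and
`max_runs` parameters, and their default values are assumptions made for
illustration; nothing like this ships with Strongtalk.

```python
def run_until_stable(measure, window=3, tolerance=0.05, max_runs=50):
    """Call measure() repeatedly until the last `window` timings agree.

    measure -- zero-argument callable returning one timing (e.g. milliseconds)

    Returns (stable_timing, all_timings).  stable_timing is the mean of the
    last `window` timings, or None if they never settled within max_runs.
    """
    timings = []
    for _ in range(max_runs):
        timings.append(measure())
        recent = timings[-window:]
        if len(recent) == window:
            lowest, highest = min(recent), max(recent)
            # "Stable" here means the spread is within tolerance of the
            # slowest run in the window (an arbitrary, hypothetical rule).
            if highest - lowest <= tolerance * highest:
                return sum(recent) / window, timings
    return None, timings
```

With a real benchmark, `measure` would wrap one timed run of the loop; only
the value reported after the timings settle reflects compiled code.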

2) Andreas Raab was right in his comments.  The performance you are
measuring is *not* general Smalltalk performance, it is specifically the
performance of megamorphic sends, which are one of the few cases where
Strongtalk's type-feedback doesn't help at all.

Here is how sends work in Strongtalk:

Monomorphic and slightly polymorphic sends (1 or 2 receiver classes at the
send site) can be inlined, which is the common case (over 90% of sends fall
in this category), and that is where Strongtalk can give you big speedups.

Sends that have between 2 and 4 receiver classes are usually handled with a
polymorphic inline cache (PIC), which is still a real dispatch and call, and
is only slightly faster (if at all) than in other Smalltalks, since that is
the most highly optimized piece of code in any normal Smalltalk
implementation.  PICs are not primarily for optimization; their real role is
to gather type information for the inlining compiler.  Note that VisualWorks
now has PICs, so it uses the same technology for non-inlined sends as
Strongtalk.

Sends that have more than 4 receiver classes, such as the one in your
micro-benchmark, can't use PICs or any other kind of inline cache, so in
Strongtalk they become full megamorphic sends.  These are implemented as an
actual hashed lookup, which is the slowest case of all.  You might say that is
what Smalltalk is all about, but in reality megamorphic sends are relatively
rare as a percentage of sends.  Compilers aren't magic: no one can eliminate
the fundamental computation that a truly megamorphic send has to do.  It *has*
to do some kind of real lookup, and a call, so the performance will naturally
be similar across all Smalltalks.

Every Smalltalk has that overhead.  What Strongtalk does is eliminate that
overhead when you don't really need it, when a send doesn't actually have
many receiver classes.  That is what other Smalltalks can't do: they make
you pay the cost of a dispatch and call all the time, even if you don't need
it, which is the common case.

So your 'picBench' isn't even measuring PIC performance.
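
The behaviour of a single send site described in point 2 can be modelled
roughly as follows.  This is a toy sketch in Python, not Strongtalk's actual
implementation: the class `SendSite`, the helper `thue_morse_like`, and the
use of a hierarchy walk as a stand-in for the hashed lookup are all
assumptions.  The model does no inlining at all; it only counts how often the
slow lookup runs once a site has seen more than 4 receiver classes.

```python
PIC_LIMIT = 4  # per the text: more than 4 receiver classes means megamorphic


class SendSite:
    """Toy model of one send site (not Strongtalk's real data structures)."""

    def __init__(self, selector):
        self.selector = selector
        self.cache = {}          # receiver class -> method, at most PIC_LIMIT
        self.megamorphic = False
        self.full_lookups = 0    # how often the slow path ran

    def _full_lookup(self, cls):
        """Slow path: search the class hierarchy for the method."""
        self.full_lookups += 1
        for klass in cls.__mro__:
            if self.selector in vars(klass):
                return vars(klass)[self.selector]
        raise AttributeError(self.selector)

    def send(self, receiver):
        cls = type(receiver)
        method = None if self.megamorphic else self.cache.get(cls)
        if method is None:
            method = self._full_lookup(cls)
            if not self.megamorphic:
                if len(self.cache) < PIC_LIMIT:
                    self.cache[cls] = method
                else:
                    # A fifth receiver class: stop caching at this site.
                    self.cache.clear()
                    self.megamorphic = True
        return method(receiver)

    def state(self):
        if self.megamorphic:
            return "megamorphic"
        names = {0: "unlinked", 1: "monomorphic"}
        return names.get(len(self.cache), "polymorphic")


class Base:
    def yourself(self):
        return self


def thue_morse_like(base):
    """Mirror the doubling-and-reversing construction of the quoted benchmark."""
    instances = [base[0]]
    for item in base[1:]:
        instances = instances + instances[::-1]
        instances.append(item)
    return instances + instances[::-1]


def run_site(receivers):
    """Send `yourself` to every receiver through one shared send site."""
    site = SendSite("yourself")
    for receiver in receivers:
        assert site.send(receiver) is receiver
    return site
```

Feeding `run_site` a sequence built from eight distinct classes, as the
benchmark does, leaves the site megamorphic, so nearly every send takes the
slow path; with only two classes the slow path runs once per class and never
again.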

3) I would expect VisualWorks to be about the same speed or a bit faster
than Strongtalk on this atypical benchmark because of several factors.  We
have established that type-feedback doesn't help this benchmark, so from the
point of view of sends, VisualWorks and Strongtalk would be doing basically
the same kind of things.  VisualWorks would probably be a bit faster on this
benchmark because it probably does array bounds-check elimination and maybe
even loop unrolling, which aren't yet implemented in Strongtalk, and which I'm
sure aren't implemented in Squeak.  We did those in the Java VM, but hadn't
yet gotten to them for Strongtalk; Strongtalk hasn't even really been tuned,
and VisualWorks has been tuned for many years.  Your
benchmark consists of a tight inner loop that does only two things: a
megamorphic send, and an array lookup.  So the array bounds check and loop
overhead are a significant factor, and if VisualWorks can optimize those, it
would make a real difference.

But once again, this is not even remotely typical Smalltalk code.  Array
bounds-check elimination and loop unrolling are optimizations that rarely
apply.  They generally only help when you have a very tight inner loop that
does almost nothing, where the loop itself is a literal SmallInteger>>to:do:
send, you are accessing an array, and the array access is literally embedded
in the loop, not in a called method.  How much of your code really looks like
that?
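
As a rough illustration of the loop shape in question, the sketch below
(Python, with a 1-based `at` helper imitating Smalltalk's at:) shows the same
loop with a bounds check on every access and with the check dropped.  The
names `at`, `sum_checked` and `sum_unchecked` are hypothetical, and the second
function simply writes out by hand what a bounds-check-eliminating compiler
would be expected to do; it says nothing about how VisualWorks actually
compiles such a loop.

```python
def at(array, index):
    """1-based access with an explicit bounds check, like Smalltalk's at:."""
    if not 1 <= index <= len(array):
        raise IndexError(index)
    return array[index - 1]


def sum_checked(array):
    """One explicit bounds check per iteration, as in an unoptimized loop."""
    total = 0
    for i in range(1, len(array) + 1):
        total += at(array, i)
    return total


def sum_unchecked(array):
    """The same loop with the explicit check removed.

    The index runs from 1 to len(array) and the access sits directly in the
    loop body, so every index is in range by construction.
    """
    total = 0
    n = len(array)
    for i in range(1, n + 1):
        total += array[i - 1]  # no explicit range test: 1 <= i <= n
    return total
```

If the access were hidden inside a called method, or the bounds were not
literally 1 to the array size, the in-range argument would no longer be
visible from the loop alone.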

-Dave

On 12/17/06, Klaus D. Witzel <klaus.witzel at cobss.com> wrote:
>
> Folks,
>
> I'm sorry to tell that Strongtalk is NOT that fast. I followed the
> instructions and *compiled* the following benchmark in Strongtalk,
> evaluated the same expression in Squeak and in VW and got these
> results on my 1.73GHz 1.0GB WinXP notebook:
>
> - VisualWorks:  16799 (N.C. 7.4.1)
> - Strongtalk:   47517 (1.1.2)
> - Squeak:       56726 (3.9#7056)
>
> Below is the Squeak/VW source code, attached is the Strongtalk source
> code. The test is simple: a long loop around a single polymorphic call
> site "(instances at: i) yourself", straightforwardly inlineable and with
> intentionally unpredictable type information at the call site (modeled
> after the Thue-Morse sequence).
>
> I'm disappointed, Strongtalk was always advertised as being the fastest
> Smalltalk available "...executes Smalltalk much faster than any other
> Smalltalk implementation...", and now it shows to be in almost the same
> class as Squeak is :) :(
>
> Can somebody reproduce the figures, any other results? Have I done
> something wrong?
>
> BTW: congrats to the implementors of Squeak and, of course, to Cincom!
> (uhm, and also to the Strongtalk team!)
>
> /Klaus
>
> --------------
>   | instances base |
>   base := (Array
>         with: OrderedCollection basicNew
>         with: SequenceableCollection basicNew
>         with: Collection basicNew
>         with: Object basicNew) ,
>         (Array
>         with: Character space
>         with: Date basicNew
>         with: Time basicNew
>         with: Magnitude basicNew).
>   instances := OrderedCollection with: (base at: 1).
>   2 to: base size do: [:i |
>    instances := instances , instances reverse.
>    instances addLast: (base at: i)].
>   instances := (instances , instances reverse) asArray.
>   ^ Time millisecondsToRun: [
>         1234567 timesRepeat: [
>                 1 to: instances size do: [:i |
>                         (instances at: i) yourself]]]
> --------------
>

