Thue-Morse and performance: Squeak v.s. Strongtalk v.s. VisualWorks

bryce at kampjes.demon.co.uk
Mon Dec 18 23:55:56 UTC 2006


David Griswold writes:
 > Sends that have more than 4 receiver types, such as your micro-benchmark,
 > can't even use PICs or any kind of inline cache, so these are a full
 > megamorphic send in Strongtalk, which is implemented as an actual hashed
 > lookup, which is the slowest case of all.  You might say that is what
 > Smalltalk is all about, but in reality megamorphic sends are relatively rare
 > as a percentage of sends.  Compilers aren't magic- no one can eliminate the
 > fundamental computation that a truly megamorphic send has to do- it *has* to
 > do some kind of real lookup, and a call, so the performance will naturally
 > be similar across all Smalltalks.

I'm fairly sure that VisualWorks has a hash PIC that it uses for
megamorphic sends. Eliot talked about this at Smalltalk Solutions.
I also doubt that VW does any advanced optimizations such as global
code motion (moving type-checks out of loops) or loop unrolling. If it
did it would be faster than Exupery for the bytecode benchmark.
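
To make the terms concrete, here is a rough sketch in Python of a send
site that uses a small PIC and falls back to a hashed lookup once it has
seen too many receiver classes. This is only an illustration of the
general technique, not how Strongtalk or VisualWorks implement it. The
names SendSite, full_lookup and PIC_LIMIT are made up for the example,
and the limit of 4 is taken from David's description above.

```python
# Illustrative sketch only: hypothetical names, not real VM code.

PIC_LIMIT = 4  # assumption: a PIC holds at most 4 receiver classes


def full_lookup(cls, selector):
    """Slow path: walk the class hierarchy to find the method."""
    for klass in cls.__mro__:
        if selector in vars(klass):
            return vars(klass)[selector]
    raise AttributeError(selector)


class SendSite:
    """One call site with its own inline cache."""

    def __init__(self, selector):
        self.selector = selector
        self.pic = []            # (class, method) pairs, checked in order
        self.megamorphic = False
        self.hashed = {}         # stands in for the hashed lookup table
        self.stats = {"pic_hit": 0, "miss": 0, "hashed": 0}

    def send(self, receiver, *args):
        cls = type(receiver)
        if not self.megamorphic:
            # Fast path: a handful of class comparisons.
            for cached_cls, method in self.pic:
                if cached_cls is cls:
                    self.stats["pic_hit"] += 1
                    return method(receiver, *args)
            # Miss: do the real lookup, then extend or abandon the PIC.
            self.stats["miss"] += 1
            method = full_lookup(cls, self.selector)
            if len(self.pic) < PIC_LIMIT:
                self.pic.append((cls, method))
            else:
                self.megamorphic = True
                self.pic = []
            return method(receiver, *args)
        # Megamorphic path: every send pays for a hashed lookup and a call.
        self.stats["hashed"] += 1
        method = self.hashed.get(cls)
        if method is None:
            method = full_lookup(cls, self.selector)
            self.hashed[cls] = method
        return method(receiver, *args)
```

A micro-benchmark that pushes five or more receiver classes through one
send site ends up on the last path for every send, which is the case
David describes.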

However, in this case if you're actually compiling your benchmark in
Strongtalk it's possible that the performance difference between VW and
Strongtalk is due to the method specialization done by Strongtalk.

Strongtalk, AFAIK, compiles a separate version of a method for each
receiver class. This is an optimization because it allows more precise
type information to be gathered, as it's not polluted by other classes'
use of an inherited method. Specializing methods by receiver should also
allow faster inlining of self sends, as they can be fully resolved at
compile time. (1)
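
As a rough sketch of the idea in Python (not Strongtalk's or Exupery's
actual machinery), the code below keeps one "compiled" version per
(receiver class, selector) pair. Self sends inside a version are
resolved once for that receiver class and then reused, which stands in
for binding them at compile time. The names specialise, versions and
self_send, and the Animal classes, are invented for the example.

```python
# Illustrative sketch only: hypothetical names, not real compiler code.

versions = {}  # (receiver class, selector) -> specialised version


def lookup(cls, selector):
    """Ordinary method lookup up the class hierarchy."""
    for klass in cls.__mro__:
        if selector in vars(klass):
            return vars(klass)[selector]
    raise AttributeError(selector)


def specialise(cls, selector):
    """Return the version of `selector` specialised for receiver class `cls`."""
    key = (cls, selector)
    if key in versions:
        return versions[key]
    method = lookup(cls, selector)
    resolved = {}  # self-send targets, fixed for this receiver class

    def version(receiver):
        def self_send(sel):
            target = resolved.get(sel)
            if target is None:
                # Resolved once against cls, never against other classes.
                target = resolved[sel] = specialise(cls, sel)
            return target(receiver)
        return method(receiver, self_send)

    versions[key] = version
    return version


class Animal:
    # Methods take a self_send callback in place of real self sends.
    def speak(self, self_send):
        return "The " + self_send("name") + " says " + self_send("noise")

    def name(self, self_send):
        return "animal"

    def noise(self, self_send):
        return "..."


class Dog(Animal):
    def name(self, self_send):
        return "dog"

    def noise(self, self_send):
        return "woof"


class Cat(Animal):
    def name(self, self_send):
        return "cat"

    def noise(self, self_send):
        return "meow"
```

Here the inherited #speak ends up with one version for Dog and another
for Cat, so each version only ever sees one receiver class. The cost is
that the single inherited method now exists as two pieces of compiled
code.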

Having a separate compiled method for every receiver may be doing bad
things to your CPU's instruction cache. That could be where
Strongtalk's lack of performance here is coming from. First level
instruction caches are small; the largest on a desktop CPU is only
64 KB. If you want to find out, it is possible to measure cache
misses. Unfortunately, I only know how to do this under Linux.

Microbenchmarks are getting less reliable as compilers and hardware
become smarter.

Bryce

(1) Exupery also compiles a version of each method for each
receiver. It does this to allow it to compile specialised versions of
the #at: and #new primitives. Specialising is often the right thing to
do, especially if you plan to inline methods. 

A fully tuned compiler might specialise methods only when it helps,
though it might not. In general it may cost more to figure out when
it helps to specialise than it costs to always specialise. Without
extensive macro benchmarking it is dangerous to guess.


