Thue-Morse and performance: Squeak vs. Strongtalk vs. VisualWorks

Klaus D. Witzel klaus.witzel at cobss.com
Sun Dec 17 23:46:15 UTC 2006


Thank you David for answering my question

>> Can somebody reproduce the figures, any other results? Have I done
>> something wrong?

and thank you also for the explanations. I understand that PICs in  
Strongtalk are [in the current incarnation] limited to 4 entries, that's  
good to know.
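
A quick way to check that the benchmark's call site exceeds that 4-entry
limit is to count the receiver classes in the array. The following
workspace sketch repeats the benchmark's setup (it assumes Squeak-style
selectors such as #reverse and #asSet; other dialects may need small
changes) and answers the array size and the number of distinct classes:

```smalltalk
| base instances classes |
"Same eight receivers as in the benchmark, built the same way."
base := (Array
    with: OrderedCollection basicNew
    with: SequenceableCollection basicNew
    with: Collection basicNew
    with: Object basicNew) ,
    (Array
    with: Character space
    with: Date basicNew
    with: Time basicNew
    with: Magnitude basicNew).
instances := OrderedCollection with: (base at: 1).
2 to: base size do: [:i |
    instances := instances , instances reverse.
    instances addLast: (base at: i)].
instances := (instances , instances reverse) asArray.

"Collect the class of every element and keep the distinct ones."
classes := (instances collect: [:each | each class]) asSet.
Array with: instances size with: classes size
```

By construction this should answer 510 elements spread over 8 classes,
twice the PIC limit, so each pass over the array makes 510 sends from a
single megamorphic call site.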

Just a minor adjustment: the #at: on the array was never in doubt, and the
integer loop was intentional because (I think) on all three systems it's
already compiled away at the bytecode level, and the #at: is expected to be
subsumed at the primitive level. I've seen walkbacks in Strongtalk in
which the source code #to:do: was inlined with #whileTrue sans block, like
in Squeak.
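
For readers unfamiliar with that inlining, here is a rough source-level
picture of what the compiler does with the benchmark's inner loop. It is a
sketch only: the temporaries i and limit and the three-element sample
array are made up for illustration, and the actual bytecodes differ
between Squeak, VisualWorks and Strongtalk.

```smalltalk
| instances i limit |
instances := Array with: 42 with: 'text' with: #symbol.

"Source form, as written in the benchmark:"
1 to: instances size do: [:k | (instances at: k) yourself].

"Roughly what the compiler generates instead of sending #to:do: with a
 real block: the limit is evaluated once, then compare, body, increment."
i := 1.
limit := instances size.
[i <= limit] whileTrue: [
    (instances at: i) yourself.
    i := i + 1]
```

The hand-written #whileTrue: form with literal blocks is itself normally
inlined, so neither version should create a block object at run time.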

As to your figures, I will retry with a "warmer" image :)

And I have nothing against people calling my test a poor benchmark. I
wanted to compare the performance at this particular level, and according
to your report even there Strongtalk [which is unoptimized at this level]
is close to VW. And no, I would never say that megamorphic sends are all
that Smalltalk is about.

Let me comment on this one:

> ...  How much of your code really looks like that?

Well, at that level almost all users of collection #do: look like that. I  
just made the level below an O(1) constant, otherwise the polymorphic  
nature of "(array at: i) doSomethingPolymorphically" would perhaps have  
gone unnoticed.
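
To make that concrete, here is a small workspace sketch showing that a
typical #do: user and the indexed loop perform the same sends on the same
receivers. The sample array and the choice of #printString as the
polymorphic message are assumptions for illustration, not part of the
benchmark.

```smalltalk
| items viaDo viaIndex |
items := Array with: 42 with: 'text' with: #symbol with: 3/4.

"A typical user of #do: with a polymorphic send in the block."
viaDo := OrderedCollection new.
items do: [:each | viaDo add: each printString].

"The same work with the iteration written as an indexed loop."
viaIndex := OrderedCollection new.
1 to: items size do: [:i | viaIndex add: (items at: i) printString].

viaDo = viaIndex
```

The last expression should answer true: the only difference is where the
#at: happens, inside #do: or directly in the loop body.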

Thanks again, very insightful.

/Klaus

On Mon, 18 Dec 2006 00:08:08 +0100, David Griswold  
<david.griswold.256 at gmail.com> wrote:

> Klaus,
>
> There are three issues here:
>
> 1) You did *not* run it enough under Strongtalk to compile the benchmark,
> so you are measuring interpreted performance.  You need to run it until
> the performance speeds up and stabilizes.  When it is compiled, on my
> machine (Sonoma Pentium M 1.7GHz), Squeak 3.1 runs the benchmark in 60453,
> and Strongtalk runs it in 22139.  That's not the latest Squeak but I doubt
> it has changed much.  I don't have a recent VisualWorks installed, but
> from my knowledge of how the various systems work, I would expect
> VisualWorks to be a bit faster than Strongtalk at this (very poor)
> microbenchmark, for reasons explained below.
>
> 2) Andreas Raab was right in his comments.  The performance you are
> measuring is *not* general Smalltalk performance, it is specifically the
> performance of megamorphic sends, which are one of the few cases where
> Strongtalk's type-feedback doesn't help at all.
>
> Here is how sends work in Strongtalk:
>
> Monomorphic and slightly polymorphic sends (1 or 2 receiver classes at the
> send site) can be inlined, which is the common case (over 90% of sends
> fall in this category), and that is where Strongtalk can give you big
> speedups.
>
> Sends that have between 2 and 4 receiver classes are usually handled with
> a polymorphic inline cache (PIC), which is still a real dispatch and call,
> and is only slightly faster (if at all) than in other Smalltalks, since
> that is the most highly optimized piece of code in any normal Smalltalk
> implementation.  PICs are not primarily for optimization; their real role
> is to gather type information for the inlining compiler.  Note that
> VisualWorks now has PICs, so it uses the same technology for non-inlined
> sends as Strongtalk.
>
> Sends that have more than 4 receiver types, such as your micro-benchmark,
> can't even use PICs or any kind of inline cache, so these are a full
> megamorphic send in Strongtalk, which is implemented as an actual hashed
> lookup, which is the slowest case of all.  You might say that is what
> Smalltalk is all about, but in reality megamorphic sends are relatively
> rare as a percentage of sends.  Compilers aren't magic: no one can
> eliminate the fundamental computation that a truly megamorphic send has to
> do.  It *has* to do some kind of real lookup, and a call, so the
> performance will naturally be similar across all Smalltalks.
>
> Every Smalltalk has that overhead.  What Strongtalk does is eliminate that
> overhead when you don't really need it, when a send doesn't actually have
> many receiver classes.  That is what other Smalltalks can't do: they make
> you pay the cost of a dispatch and call all the time, even if you don't
> need it, which is the common case.
>
> So your 'picBench' isn't even measuring PIC performance.
>
> 3) I would expect VisualWorks to be about the same speed or a bit faster
> than Strongtalk on this atypical benchmark because of several factors.  We
> have established that type-feedback doesn't help this benchmark, so from
> the point of view of sends, VisualWorks and Strongtalk would be doing
> basically the same kind of things.  The reason VisualWorks would probably
> be a bit faster on this benchmark is because it probably does array
> bounds-check elimination and maybe even loop unrolling, which aren't yet
> implemented in Strongtalk, and I'm sure aren't implemented in Squeak.  We
> did those in the Java VM, but hadn't yet gotten to that for Strongtalk;
> Strongtalk hasn't even really been tuned, and VisualWorks has been tuned
> for many years.  Your benchmark consists of a tight inner loop that does
> only two things: a megamorphic send, and an array lookup.  So the array
> bounds check and loop overhead are a significant factor, and if
> VisualWorks can optimize those, it would make a real difference.
>
> But once again, this is not even remotely typical Smalltalk code.  Array
> bounds-check elimination and loop unrolling are rarely useful
> optimizations that generally only help when you have a very tight inner
> loop that does almost nothing and where the loop itself is a literal
> SmallInteger>>to:do: send, you are accessing an array, and the array
> access is literally embedded in the loop, not in a called method.  How
> much of your code really looks like that?
>
> -Dave
>
> On 12/17/06, Klaus D. Witzel <klaus.witzel at cobss.com> wrote:
>>
>> Folks,
>>
>> I'm sorry to say that Strongtalk is NOT that fast. I followed the
>> instructions and *compiled* the following benchmark in Strongtalk,
>> evaluated the same expression in Squeak and in VW and got these
>> results on my 1.73GHz 1.0GB WinXP notebook:
>>
>> - VisualWorks:  16799 (N.C. 7.4.1)
>> - Strongtalk:   47517 (1.1.2)
>> - Squeak:       56726 (3.9#7056)
>>
>> Below is the Squeak/VW source code, attached is the Strongtalk source
>> code. The test is simple: a long loop around a single polymorphic call
>> site "(instances at: i) yourself", straightforwardly inlinable and with
>> intentionally unpredictable type information at the call site (modeled
>> after the Thue-Morse sequence).
>>
>> I'm disappointed, Strongtalk was always advertised as being the fastest
>> Smalltalk available "...executes Smalltalk much faster than any other
>> Smalltalk implementation...", and now it turns out to be in almost the
>> same class as Squeak :) :(
>>
>> Can somebody reproduce the figures, any other results? Have I done
>> something wrong?
>>
>> BTW: congrats to the implementors of Squeak and, of course, to Cincom!
>> (uhm, and also to the Strongtalk team!)
>>
>> /Klaus
>>
>> --------------
>>   | instances base |
>>   base := (Array
>>         with: OrderedCollection basicNew
>>         with: SequenceableCollection basicNew
>>         with: Collection basicNew
>>         with: Object basicNew) ,
>>         (Array
>>         with: Character space
>>         with: Date basicNew
>>         with: Time basicNew
>>         with: Magnitude basicNew).
>>   instances := OrderedCollection with: (base at: 1).
>>   2 to: base size do: [:i |
>>    instances := instances , instances reverse.
>>    instances addLast: (base at: i)].
>>   instances := (instances , instances reverse) asArray.
>>   ^ Time millisecondsToRun: [
>>         1234567 timesRepeat: [
>>                 1 to: instances size do: [:i |
>>                         (instances at: i) yourself]]]
>> --------------
>>




