Thue-Morse and performance: Squeak vs. Strongtalk vs. VisualWorks

David Griswold david.griswold.256 at gmail.com
Mon Dec 18 00:48:30 UTC 2006


Hi Klaus,

On 12/17/06, Klaus D. Witzel <klaus.witzel at cobss.com> wrote:
>
> Thank you David for answering my question
>
> >> Can somebody reproduce the figures, any other results? Have I done
> >> something wrong?
>
> and thank you also for the explanations. I understand that PICs in
> Strongtalk are [in the current incarnation] limited to 4 entries, that's
> good to know.
>
> Just a minor adjustment: the #at: on the array was never in doubt, and the
> integer loop was intentional because (I think) on all three systems it's
> compiled away already at the bytecode level and the #at: is expected to be
> subsumed at the primitive level. I've seen walkbacks in Strongtalk in
> which the source code #to:do: was inlined with #whileTrue sans block, like
> in Squeak.


Yes, #to:do: is treated specially by the bytecode compiler, although it
doesn't really have to be, since type-feedback would be able to inline and
eliminate the block.  It is treated specially only so that it still runs
reasonably fast in the interpreter, before methods are compiled, because it
is so important in inner loops.  #at:, on the other hand, is not treated
specially in Strongtalk, unlike most other Smalltalks.
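
For readers less familiar with this, here is a rough analogy in Python (not Smalltalk, and not Strongtalk's actual bytecodes; both function names are made up) of what the special treatment buys. The generic form calls a block once per iteration, while the compiler-inlined form has no block at all, just the body inside a while loop, which is roughly what the #whileTrue expansion in the walkbacks amounts to.

```python
def to_do_generic(start, stop, block):
    """Generic form of `start to: stop do: block`: the loop body is a
    first-class block (closure), invoked once per iteration."""
    i = start
    while i <= stop:  # Smalltalk's to:do: includes both bounds
        block(i)
        i += 1


def sum_generic(n):
    total = 0

    def body(i):  # the block; it captures `total`, so it is a real closure
        nonlocal total
        total += i

    to_do_generic(1, n, body)
    return total


def sum_inlined(n):
    """Inlined form: the block has disappeared and its body sits directly
    inside a while loop, with no per-iteration call."""
    total = 0
    i = 1
    while i <= n:
        total += i
        i += 1
    return total
```

Both compute the same thing; the difference is only the per-iteration call, which matters mostly before a method gets compiled.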

> As to your figures, will retry with a "warmer" image :)
>
> And I have nothing against people calling my test a poor benchmark. I
> wanted to compare the performance at this particular level, and according
> to your report, even there Strongtalk (not yet optimized at this level) is
> close to VW. And no, I would never say that megamorphic sends are all that
> Smalltalk is about.
>
> Let me comment on this one:
>
> > ...  How much of your code really looks like that?
>
> Well, at that level almost all users of collection #do: look like that. I
> just made the level below an O(1) constant, otherwise the polymorphic
> nature of "(array at: i) doSomethingPolymorphically" would perhaps have
> gone unnoticed.


#do: loops are significantly different, because 1) they are not treated
specially by the bytecode compiler, so there is a real block and usually a
closure in most Smalltalks, 2) the implementation of #do:, which is where
the inner loop might be, does not literally contain the body of the loop, so
loop unrolling can't be applied by a non-inlining Smalltalk.  Array
bounds-check elimination might apply, but when the loop contains more than a
few sends (including the additional Block>>value: send), the benefits
rapidly become minor.
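
A rough Python model of points 1) and 2), with made-up names: the loop lives inside array_do, separate from the caller's block, so each element costs a bounds-checked at: plus a call to the block, and a compiler that doesn't inline sees only an opaque call in the loop body. CallCounter merely stands in for the Block>>value: sends so they can be counted.

```python
class CallCounter:
    """Wraps a block and counts its invocations, standing in for the
    per-element Block>>value: send."""

    def __init__(self, fn):
        self.fn = fn
        self.calls = 0

    def __call__(self, arg):
        self.calls += 1
        return self.fn(arg)


def checked_at(array, index):
    """Model of Array>>at: with its bounds check (1-based, as in Smalltalk)."""
    if not 1 <= index <= len(array):
        raise IndexError(f"index {index} out of bounds")
    return array[index - 1]


def array_do(array, block):
    """Model of a non-inlined Array>>do:: the loop is here, the body is not."""
    i = 1
    while i <= len(array):
        block(checked_at(array, i))  # one checked at: and one block call
        i += 1


def collect_doubled(array):
    results = []
    # The block captures `results`, as real #do: blocks capture outer
    # variables, so a closure object has to exist.
    block = CallCounter(lambda each: results.append(each * 2))
    array_do(array, block)
    return results, block.calls
```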

So in fact, a #do: benchmark (with a block that needs a closure, since all
real #do: sends need a closure) would be a much better benchmark, because
it's the way people actually write code, and sure enough Strongtalk can both
inline the #do: implementation, and inline the block into the loop, so it
would show much bigger advantages compared to other Smalltalks.  And even
that would understate the potential Strongtalk advantage: if the compiler
were tuned, it could do bounds-check elimination and loop unrolling even
for #do:, because it can inline the block, whereas VisualWorks would never
be able to.
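
Sketching that in Python (hypothetical function names; the transformation is the point, not the language): once Array>>do: and the block are both inlined into the caller, the index provably stays within the array, so the per-element bounds check can go, and the body can be unrolled, here by a factor of 4 chosen arbitrarily, with a remainder loop for the leftover elements.

```python
def sum_doubled_reference(array):
    """What the source means: `array do: [:each | total := total + (each * 2)]`."""
    total = 0
    for each in array:
        total += each * 2
    return total


def sum_doubled_inlined_unrolled(array):
    """Shape of the loop after inlining #do: and the block, eliminating the
    bounds check, and unrolling by 4. (Python itself still checks indexing
    internally; only the structure of the loop is being modeled.)"""
    n = len(array)
    total = 0
    i = 0
    # Main unrolled loop: 4 elements per trip, no explicit bounds check,
    # because i + 3 < n is guaranteed by the loop condition.
    while i + 4 <= n:
        total += array[i] * 2
        total += array[i + 1] * 2
        total += array[i + 2] * 2
        total += array[i + 3] * 2
        i += 4
    # Remainder loop for the last n % 4 elements.
    while i < n:
        total += array[i] * 2
        i += 1
    return total
```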

Cheers,
Dave

> Thanks again, very insightful.
>
> /Klaus
>
> On Mon, 18 Dec 2006 00:08:08 +0100, David Griswold
> <david.griswold.256 at gmail.com> wrote:
>
> > Klaus,
> >
> > There are three issues here:
> >
> > 1) You did *not* run it enough under Strongtalk to compile the benchmark,
> > so you are measuring interpreted performance.  You need to run it until
> > the performance speeds up and stabilizes.  When it is compiled, on my
> > machine (Sonoma Pentium M 1.7GHz), Squeak 3.1 runs the benchmark in
> > 60453 ms, and Strongtalk runs it in 22139 ms.  That's not the latest
> > Squeak, but I doubt it has changed much.  I don't have a recent
> > VisualWorks installed, but from my knowledge of how the various systems
> > work, I would expect VisualWorks to be a bit faster than Strongtalk at
> > this (very poor) microbenchmark, for reasons explained below.
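
A rough sketch of that advice as a harness, in Python rather than Smalltalk: rerun the benchmark until several consecutive timings agree within a tolerance, and only report a number then. The function name, the 5% tolerance, and the run counts are arbitrary assumptions; the harness knows nothing about how or when Strongtalk actually decides to compile a method, it just waits for the timings to settle.

```python
import time


def time_until_stable(benchmark, tolerance=0.05, required_streak=3, max_runs=50):
    """Run `benchmark` repeatedly until `required_streak` consecutive runs
    are each within `tolerance` (relative) of the run before, or until
    `max_runs` runs (must be >= 1). Returns (last_ms, all_timings_ms)."""
    timings = []
    streak = 0
    for _ in range(max_runs):
        start = time.perf_counter()
        benchmark()
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        if timings and timings[-1] > 0:
            change = abs(elapsed_ms - timings[-1]) / timings[-1]
            streak = streak + 1 if change <= tolerance else 0
        timings.append(elapsed_ms)
        if streak >= required_streak:
            break  # timings have settled; report the warm number
    return timings[-1], timings
```

The early, slower entries in the returned list are the cold runs that should not be compared across systems.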
> >
> > 2) Andreas Raab was right in his comments.  The performance you are
> > measuring is *not* general Smalltalk performance, it is specifically the
> > performance of megamorphic sends, which are one of the few cases where
> > Strongtalk's type-feedback doesn't help at all.
> >
> > Here is how sends work in Strongtalk:
> >
> > Monomorphic and slightly polymorphic sends (1 or 2 receiver classes at the
> > send site) can be inlined, which is the common case (over 90% of sends
> > fall in this category), and that is where Strongtalk can give you big
> > speedups.
> >
> > Sends that have between 2 and 4 receiver classes are usually handled with
> > a polymorphic inline cache (PIC), which is still a real dispatch and call,
> > and is only slightly faster (if at all) than in other Smalltalks, since
> > that is the most highly optimized piece of code in any normal Smalltalk
> > implementation.  PICs are not primarily for optimization; their real role
> > is to gather type information for the inlining compiler.  Note that
> > VisualWorks now has PICs, so it uses the same technology for non-inlined
> > sends as Strongtalk.
> >
> > Sends that have more than 4 receiver types, such as your micro-benchmark,
> > can't even use PICs or any kind of inline cache, so these are a full
> > megamorphic send in Strongtalk, implemented as an actual hashed lookup,
> > which is the slowest case of all.  You might say that is what Smalltalk
> > is all about, but in reality megamorphic sends are relatively rare as a
> > percentage of sends.  Compilers aren't magic: no one can eliminate the
> > fundamental computation that a truly megamorphic send has to do.  It
> > *has* to do some kind of real lookup, and a call, so the performance will
> > naturally be similar across all Smalltalks.
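
As a toy illustration of those tiers (Python, hypothetical names, and only the dispatch side: it does not model Strongtalk's inlining, and a real VM patches machine code rather than keeping a list), a send site can be pictured as a cache that holds up to 4 receiver classes and gives up once it sees a fifth, after which every send pays a full hashed lookup. The 4-entry limit is taken from this thread.

```python
PIC_LIMIT = 4  # maximum receiver classes cached per send site (per this thread)


class SendSite:
    """Toy send site: empty -> monomorphic -> polymorphic (PIC) -> megamorphic."""

    def __init__(self, selector, method_table):
        self.selector = selector
        self.method_table = method_table  # {(class_name, selector): method}
        self.cache = []                   # [(class_name, method)], at most PIC_LIMIT
        self.megamorphic = False
        self.global_lookups = 0           # number of full hashed lookups done

    def state(self):
        if self.megamorphic:
            return "megamorphic"
        return {0: "empty", 1: "monomorphic"}.get(len(self.cache), "polymorphic")

    def send(self, receiver_class):
        if not self.megamorphic:
            # Inline-cache path: compare against the few cached classes.
            for cls, method in self.cache:
                if cls == receiver_class:
                    return method()
        # Cache miss, or the site is megamorphic: full hashed lookup.
        self.global_lookups += 1
        method = self.method_table[(receiver_class, self.selector)]
        if not self.megamorphic:
            if len(self.cache) < PIC_LIMIT:
                self.cache.append((receiver_class, method))
            else:
                # A fifth receiver class: stop caching at this site for good.
                self.megamorphic = True
                self.cache = []
        return method()
```

Feeding it the 8 receiver classes of the benchmark quoted below turns the site megamorphic at the fifth class, after which every send goes through the dictionary lookup, which is the case the benchmark ends up measuring.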
> >
> > Every Smalltalk has that overhead.  What Strongtalk does is eliminate that
> > overhead when you don't really need it, when a send doesn't actually have
> > many receiver classes.  That is what other Smalltalks can't do: they make
> > you pay the cost of a dispatch and call all the time, even when you don't
> > need it, which is the common case.
> >
> > So your 'picBench' isn't even measuring PIC performance.
> >
> > 3) I would expect VisualWorks to be about the same speed or a bit faster
> > than Strongtalk on this atypical benchmark because of several factors.
> > We have established that type-feedback doesn't help this benchmark, so
> > from the point of view of sends, VisualWorks and Strongtalk would be doing
> > basically the same kind of things.  The reason VisualWorks would probably
> > be a bit faster on this benchmark is that it probably does array
> > bounds-check elimination and maybe even loop unrolling, which aren't yet
> > implemented in Strongtalk, and I'm sure aren't implemented in Squeak.  We
> > did those in the Java VM, but hadn't yet gotten to that for Strongtalk;
> > Strongtalk hasn't even really been tuned, and VisualWorks has been tuned
> > for many years.  Your benchmark consists of a tight inner loop that does
> > only two things: a megamorphic send and an array lookup.  So the array
> > bounds check and loop overhead are a significant factor, and if
> > VisualWorks can optimize those, it would make a real difference.
> >
> > But once again, this is not even remotely typical Smalltalk code.  Array
> > bounds-check elimination and loop unrolling are rarely applicable
> > optimizations that generally only help when you have a very tight inner
> > loop that does almost nothing, where the loop itself is a literal
> > SmallInteger>>to:do: send, you are accessing an array, and the array
> > access is literally embedded in the loop, not in a called method.  How
> > much of your code really looks like that?
> >
> > -Dave
> >
> > On 12/17/06, Klaus D. Witzel <klaus.witzel at cobss.com> wrote:
> >>
> >> Folks,
> >>
> >> I'm sorry to say that Strongtalk is NOT that fast. I followed the
> >> instructions and *compiled* the following benchmark in Strongtalk,
> >> evaluated the same expression in Squeak and in VW, and got these
> >> results on my 1.73GHz 1.0GB WinXP notebook:
> >>
> >> - VisualWorks:  16799 ms (N.C. 7.4.1)
> >> - Strongtalk:   47517 ms (1.1.2)
> >> - Squeak:       56726 ms (3.9#7056)
> >>
> >> Below is the Squeak/VW source code; attached is the Strongtalk source
> >> code. The test is simple: a long loop around a single polymorphic call
> >> site "(instances at: i) yourself", straightforwardly inlineable and with
> >> intentionally unpredictable type information at the call site (modeled
> >> after the Thue-Morse sequence).
> >>
> >> I'm disappointed. Strongtalk was always advertised as being the fastest
> >> Smalltalk available ("...executes Smalltalk much faster than any other
> >> Smalltalk implementation..."), and now it turns out to be in almost the
> >> same class as Squeak :) :(
> >>
> >> Can somebody reproduce the figures, any other results? Have I done
> >> something wrong?
> >>
> >> BTW: congrats to the implementors of Squeak and, of course, to Cincom!
> >> (uhm, and also to the Strongtalk team!)
> >>
> >> /Klaus
> >>
> >> --------------
> >>   | instances base |
> >>   base := (Array
> >>         with: OrderedCollection basicNew
> >>         with: SequenceableCollection basicNew
> >>         with: Collection basicNew
> >>         with: Object basicNew) ,
> >>         (Array
> >>         with: Character space
> >>         with: Date basicNew
> >>         with: Time basicNew
> >>         with: Magnitude basicNew).
> >>   instances := OrderedCollection with: (base at: 1).
> >>   2 to: base size do: [:i |
> >>    instances := instances , instances reverse.
> >>    instances addLast: (base at: i)].
> >>   instances := (instances , instances reverse) asArray.
> >>   ^ Time millisecondsToRun: [
> >>         1234567 timesRepeat: [
> >>                 1 to: instances size do: [:i |
> >>                         (instances at: i) yourself]]]
> >> --------------
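
For anyone wanting to check the shape of this benchmark without a Smalltalk image, here is a Python re-creation of how the snippet builds `instances`, with class names standing in for the receiver objects. It mirrors the Smalltalk line by line but is only an illustration, not the benchmark itself.

```python
base = ["OrderedCollection", "SequenceableCollection", "Collection", "Object",
        "Character", "Date", "Time", "Magnitude"]


def build_instances(base):
    instances = [base[0]]                        # OrderedCollection with: (base at: 1)
    for item in base[1:]:                        # 2 to: base size do: [:i | ...]
        instances = instances + instances[::-1]  # instances , instances reverse
        instances.append(item)                   # instances addLast: (base at: i)
    return instances + instances[::-1]           # (instances , instances reverse) asArray


instances = build_instances(base)
REPEATS = 1234567                                # 1234567 timesRepeat: [...]
total_sends = REPEATS * len(instances)           # one #yourself send per element per pass
```

With eight base classes, `instances` has 510 elements, so each run performs 1234567 × 510 = 629,629,170 #yourself sends, all from one call site that sees 8 receiver classes, twice the 4-entry PIC limit discussed in this thread.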


More information about the Squeak-dev mailing list