Morphic, Dynapad, and Squeak performance

Tim Rowledge tim at sumeru.stanford.edu
Fri May 24 21:23:51 UTC 2002


Welcome to the debate....

Tommy Thorn <tt1729 at yahoo.com> is claimed by the authorities to have written:

> First off, I'm not sure that the ARM/StrongARM
> VM has really been carefully hand-tuned.  I know
> there are some gcc issues that affect performance.
> The first and easy step is attacking this with
> profiling tools and careful hand-tweaking.  This
> will most likely affect the portability of the VM
> (but not the image), but that's the first price
> to accept.
Virtually no hand-tweaking goes on with any of the VMs - we'd generally
prefer to produce good portable code. There are a couple of minor fudges
that can be applied but most improvements have come about through
portable mechanisms.
> 
> Secondly, it's really not that hard to make
> the VM do dynamic native code generation for
> frequently executed methods (I know Squeak does
> have a Jitter, but AFAIK it doesn't generate
> native code).
Huh? I don't think it would be properly called a jitter if it didn't.
But yes, there is a jitter for Squeak that has some good performance
benefits but it doesn't seem to get a great deal of use in practice. I
haven't yet made an ARM version, for example.
> Even simple-minded template
> based instantiation can improve performance
> by a factor of four and more can be had with
> varying degrees of runtime optimization.
I think you're oversimplifying a bit there; there are a huge number of
other areas that need improvements to get fourfold speedups. As another
(even more) simplistic statement, it is generally considered that
Smalltalk spends around half its time in primitives and thus even an
infinitely fast jit engine that produced infinitely fast code would only
double aggregate performance.
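
The arithmetic behind that "only double" figure is Amdahl's law. A
minimal sketch, assuming the rough 50% primitives share quoted above
(a rule of thumb, not a measurement); the function name is made up for
illustration:

```python
def overall_speedup(fraction_accelerated, factor):
    """Amdahl's law: overall speedup when only part of the run time gets faster.

    fraction_accelerated: share of run time the jit can speed up (0..1, below 1)
    factor: how many times faster that share becomes
    """
    untouched = 1.0 - fraction_accelerated       # e.g. time spent in primitives
    return 1.0 / (untouched + fraction_accelerated / factor)


# Assumption: half the time is in primitives, so the jit only helps the other half.
print(overall_speedup(0.5, 4))              # a 4x jit on half the work
print(overall_speedup(0.5, float("inf")))   # an infinitely fast jit
```

With half the time untouched, a 4x jit yields 1.6x overall and the
limit for an infinitely fast one is 2x.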

> 
> Thirdly, the current implementation of message 
> sends is not exactly state of the art (although
> it has a lot of nice properties).
It's perfectly sensible in an interpreter world, which is what we're
living in now. You can't really do inline caching in a practical sense.
> If you admit
> to native JITing, then one option is to make 
> message sends implement their own cache in
> compiled form (that is, a cache miss could
> cause the recompilation of the send).  There
> are other less dramatic options: attaching
> the compiled cache to the class instead,
> or applying a global enumeration scheme
> to speed up method lookup for cache misses [1].
We've been doing this in the Smalltalk world for, ooh, twenty years or
so. I think the earliest paper I can think of would be the
Deutsch/Schiffman paper from '84 POPL.
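
For anyone who hasn't met the idea, a per-send-site cache can be
sketched roughly as below. This is a toy model in Python, not the
Squeak VM's code and not the scheme from the paper: a real jit patches
the cached class and method into the generated machine code at the send
site. The names SendSite, lookup, Shape and Square are made up for
illustration.

```python
def lookup(cls, selector):
    """Full (slow) method lookup: walk the superclass chain."""
    for klass in cls.__mro__:
        if selector in vars(klass):
            return vars(klass)[selector]
    raise AttributeError(selector)


class SendSite:
    """One message-send site with a one-entry inline cache."""

    def __init__(self, selector):
        self.selector = selector
        self.cached_class = None    # receiver class seen on the last lookup
        self.cached_method = None   # method found for that class
        self.misses = 0

    def send(self, receiver, *args):
        cls = type(receiver)
        if cls is not self.cached_class:
            # Cache miss: do the full lookup and rebind the site.
            self.misses += 1
            self.cached_method = lookup(cls, self.selector)
            self.cached_class = cls
        # Cache hit path: one class comparison, then a direct call.
        return self.cached_method(receiver, *args)


class Shape:
    def area(self):
        return 0


class Square(Shape):
    def __init__(self, side):
        self.side = side

    def area(self):
        return self.side * self.side


site = SendSite("area")
site.send(Square(2))
site.send(Square(3))   # same receiver class: served from the cache
site.send(Shape())     # different class: miss, site is rebound
print(site.misses)
```

The point of the design is that most send sites see one receiver class
nearly all the time, so the common case costs a single comparison.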

> I'm convinced that it's technically feasible
> to make Squeak on StrongArm fast enough to make
> Morphic usable.  I'm just not sure if we have
> enough collective spare cycles to make it happen.
I'm fairly sure we could potentially get maybe four to five times faster
from a StrongARM than we do right now. It just so happens that I have a
few years experience doing this sort of thing on ARMs (all the way back
to helping in the original cpu design) as well as an ARM VisualWorks
implementation. That implementation is around five times faster than
Squeak.

The chief obstacle to speed for Smalltalk on ARM is the tiny cache size.
This severely restricts effective bandwidth and, as everybody knows I
keep saying, you need three things for Smalltalk performance:
memory bandwidth,
memory bandwidth,
and err, oh, yes, memory bandwidth :-)

tim

-- 
Tim Rowledge, tim at sumeru.stanford.edu, http://sumeru.stanford.edu/tim
When a program is being tested, it is too late to make design changes.



