More fun with VMs
Dan Ingalls
Dan at SqueakLand.org
Wed Mar 22 14:11:05 UTC 2006
Hi -
I've got my little Squeak in Java running (hope to send out a link
soon), and I've been pondering how to make it run faster. In the
process, I've thought of two techniques, one of which is new (to me)
and the other occurred to me years ago, but I never tried it out.
Since neither would really be all that hard to do in Squeak, I
thought I'd mention them here for those folks who delight in such
things, and with the further hope that someone might actually try
them out.
Lazy Activation
This was the next thing I was going to do for Apple Smalltalk when
I got drafted into the hotel business in 1987. The essence
of the idea is that the purpose of activating a context is to save
the execution state in case you have to do a send and, conversely,
you don't really need an activation if you never need to do a real
send.
I had a lot of fun instrumenting the VM to figure out just how many
activations could be avoided in this way, and my recollection is that
it was roughly 50%. I believe the statistics were better dynamically
than statically, because there are a lot of methods that, in general,
need to be activated, but they may begin with a test such as
position > limit ifTrue: [^ false]
and for every time that this test succeeds, you can get away without
ever needing an activation.
But, you say, you still need a pointer to the method bytes and a
stack frame, and this is true, but you don't need to allocate and
initialize a full context, nor to transfer the arguments. The idea
is that, when you hit the send, you do the lookup, find the method,
and then jump to a *separate copy* of the interpreter that has a
different set of bytecode service routines. For instance, 'loadTemp'
will, depending on the argument count, load from the stack of the
calling method (which is still the "active" context). 'Push', since
there is no allocated stack, pushes into a static array and, eg,
'plus' does the same old add, but it gets its args from the static
array, and puts its result back there. And if anything fancy, such
as a real send, does occur, then a special routine is called to do a
real activation, copy this static state into it appropriately, and
retry the bytecode in the normal interpreter.
It's probably worth confirming the results that I remember, but I
wouldn't be surprised if one could almost double the speed of Squeak
in this manner.
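
To make the scheme above concrete, here is a minimal sketch in Java. Everything in it is an assumption made for illustration: the six toy bytecodes, the int-only stack, the class and method names, and the SEND stand-in that simply pushes its operand instead of doing a lookup. It is not how any real Squeak VM is laid out. It only shows the shape of the idea: a second interpreter loop that reads arguments from the caller's stack, pushes into a static array, and falls back to a real activation when it meets a bytecode it cannot handle.

```java
import java.util.Arrays;

final class LazyActivationSketch {
    // Hypothetical toy bytecodes for illustration; not Squeak's real set.
    static final int LOAD_ARG = 0, PUSH_CONST = 1, GREATER = 2,
                     JUMP_IF_FALSE = 3, RETURN_TOP = 4, SEND = 5;

    // Roughly "position > limit ifTrue: [^ false]" followed by a real send.
    static final int[] CHECK_LIMIT = {
        LOAD_ARG, 0, LOAD_ARG, 1, GREATER,
        JUMP_IF_FALSE, 10,
        PUSH_CONST, 0, RETURN_TOP,   // early exit: returns 0 for "false"
        SEND, 42, RETURN_TOP         // index 10: needs a real activation
    };

    static int fullActivations = 0;            // instrumentation counter
    static final int[] scratch = new int[16];  // the static "stack"

    // A full context: allocating it and copying the args is the cost
    // the lazy path tries to avoid.
    static final class Context {
        final int[] code, args;
        final int[] stack = new int[16];
        int pc, sp;
        Context(int[] code, int[] callerStack, int argBase, int argCount) {
            this.code = code;
            this.args = Arrays.copyOfRange(callerStack, argBase, argBase + argCount);
            fullActivations++;
        }
    }

    // Lazy interpreter: arguments stay on the caller's stack, temporaries
    // go to the static scratch array, and no Context is allocated.
    static int send(int[] code, int[] callerStack, int argBase, int argCount) {
        int pc = 0, sp = 0;
        while (true) {
            switch (code[pc]) {
                case LOAD_ARG:
                    scratch[sp++] = callerStack[argBase + code[pc + 1]]; pc += 2; break;
                case PUSH_CONST:
                    scratch[sp++] = code[pc + 1]; pc += 2; break;
                case GREATER: {
                    int b = scratch[--sp], a = scratch[--sp];
                    scratch[sp++] = a > b ? 1 : 0; pc++; break;
                }
                case JUMP_IF_FALSE:
                    pc = (scratch[--sp] == 0) ? code[pc + 1] : pc + 2; break;
                case RETURN_TOP:
                    return scratch[sp - 1];
                default: {
                    // Anything fancy (here: SEND). Do a real activation,
                    // copy the static state into it, and retry the same
                    // bytecode in the normal interpreter.
                    Context ctx = new Context(code, callerStack, argBase, argCount);
                    System.arraycopy(scratch, 0, ctx.stack, 0, sp);
                    ctx.pc = pc;
                    ctx.sp = sp;
                    return runActivated(ctx);
                }
            }
        }
    }

    // Normal interpreter: works entirely inside the activated Context.
    static int runActivated(Context c) {
        while (true) {
            int op = c.code[c.pc];
            switch (op) {
                case LOAD_ARG:
                    c.stack[c.sp++] = c.args[c.code[c.pc + 1]]; c.pc += 2; break;
                case PUSH_CONST:
                    c.stack[c.sp++] = c.code[c.pc + 1]; c.pc += 2; break;
                case GREATER: {
                    int b = c.stack[--c.sp], a = c.stack[--c.sp];
                    c.stack[c.sp++] = a > b ? 1 : 0; c.pc++; break;
                }
                case JUMP_IF_FALSE:
                    c.pc = (c.stack[--c.sp] == 0) ? c.code[c.pc + 1] : c.pc + 2; break;
                case RETURN_TOP:
                    return c.stack[c.sp - 1];
                case SEND:
                    // Stand-in for a nested send: push the operand as its result.
                    c.stack[c.sp++] = c.code[c.pc + 1]; c.pc += 2; break;
                default:
                    throw new IllegalStateException("unknown bytecode " + op);
            }
        }
    }
}
```

With the caller's stack holding position and limit, a call such as send(CHECK_LIMIT, callerStack, 0, 2) never allocates a Context when position > limit; fullActivations only goes up when the test fails and the SEND is reached. A counter like that is also the sort of instrumentation one would need to re-check the 50% figure.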
Cloned Activation
This one I just thought of, but I can't believe someone hasn't
already tried it, either in Squeak or some similar system. The idea
here is to provide a field in the method cache for an extra copy of a
properly initialized context for the method (ie, correct frame size,
method installed, pc and stack pointer set, etc). Then, when a send
occurs, all you have to do is an array copy into blank storage,
followed by a short copy of receiver and args from the calling stack.
There's a space penalty for carrying all the extra context templates,
of course, but I think it's not unreasonable. Also, one could avoid
it for all one-time code by only allocating the extra clone on the
second call (ie, first call gets it into the method cache; second
call allocates clone for the cache).
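
A minimal Java sketch of the cloned-activation idea follows. The slot layout (METHOD, PC, SP, RECEIVER, FIRST_ARG), the Method and CacheEntry classes, and a cache keyed only by selector are all assumptions made for illustration; a real method cache would also key on the receiver's class, and a real context has more fields. The point is only the two copies: one array copy of a pre-initialized template, then a short copy of receiver and args from the calling stack, with the template allocated on the second send.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

final class ClonedActivationSketch {
    // Hypothetical context layout: a flat array of slots.
    static final int METHOD = 0, PC = 1, SP = 2, RECEIVER = 3, FIRST_ARG = 4;

    static final class Method {
        final int argCount, frameSize, initialPc;
        Method(int argCount, int frameSize, int initialPc) {
            this.argCount = argCount;
            this.frameSize = frameSize;   // assumed to be >= argCount
            this.initialPc = initialPc;
        }
    }

    static final class CacheEntry {
        final Method method;
        Object[] template;                // stays null until the second send
        CacheEntry(Method method) { this.method = method; }
    }

    // Simplification: keyed by selector only.
    static final Map<String, CacheEntry> methodCache = new HashMap<>();

    // The slow path: allocate a context and initialize every field.
    static Object[] buildContext(Method m) {
        Object[] ctx = new Object[FIRST_ARG + m.frameSize];
        ctx[METHOD] = m;
        ctx[PC] = m.initialPc;
        ctx[SP] = FIRST_ARG + m.argCount - 1;
        return ctx;
    }

    // callerStack holds the receiver at receiverIndex, then the arguments.
    static Object[] activate(String selector, Method m,
                             Object[] callerStack, int receiverIndex) {
        CacheEntry entry = methodCache.get(selector);
        Method target;
        Object[] ctx;
        if (entry == null) {
            // First call: enter the method in the cache, no template yet,
            // so one-time code never pays for a clone.
            methodCache.put(selector, new CacheEntry(m));
            target = m;
            ctx = buildContext(target);
        } else {
            // Second and later calls: build the template once, then copy it.
            target = entry.method;
            if (entry.template == null) {
                entry.template = buildContext(target);
            }
            ctx = Arrays.copyOf(entry.template, entry.template.length);
        }
        // Short copy of receiver and args from the calling stack.
        System.arraycopy(callerStack, receiverIndex, ctx, RECEIVER, 1 + target.argCount);
        return ctx;
    }
}
```

Whether the array copy actually beats field-by-field initialization depends on how expensive the existing activation code is, so this is something to measure rather than assume.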
I have little sense of how much this might help these days -- I
haven't looked in detail at the activation code for quite a while.
Obviously the worse it is right now, the more this technique might
help.
Mainly I just like to think about this stuff, and it occurred to me
that, if someone were looking for a fun experiment or two, it might
turn out to have some practical value. I haven't looked at Exupery
to know whether these things are already being done, or whether they
might fit well with the other techniques there, but I'm sure Bryce
could say right off the bat.
- Dan