More fun with VMs

Wed Mar 22 14:11:05 UTC 2006

Hi -

I've got my little Squeak in Java running (hope to send out a link 
soon), and I've been pondering how to make it run faster.  In the 
process, I've thought of two techniques, one of which is new (to me) 
and the other occurred to me years ago, but I never tried it out.

Since neither would really be all that hard to do in Squeak, I 
thought I'd mention them here for those folks who delight in such 
things, and with the further hope that someone might actually try 
them out.

Lazy Activation
This was the next thing I was going to do for Apple Smalltalk back 
when I got drafted to the hotel business back in 1987.  The essence 
of the idea is that the purpose of activating a context is to save 
the execution state in case you have to do a send and, conversely, 
you don't really need an activation if you never need to do a real 
send.

I had a lot of fun instrumenting the VM to figure out just how many 
activations could be avoided in this way, and my recollection is that 
it was roughly 50%.  I believe the statistics were better dynamically 
than statically, because there are a lot of methods that, in general 
need to be activated, but they may begin with a test such as
	position > limit ifTrue: [^ false]
and for every time that this test succeeds, you can get away without 
ever needing an activation.

But, you say, you still need a pointer to the method bytes and a 
stack frame, and this is true, but you don't need to allocate and 
initialize a full context, nor to transfer the arguments.  The idea 
is that, when you hit the send, you do the lookup, find the method, 
and then jump to a *separate copy* of the interpreter that has  a 
different set of bytecode service routines.  For instance, 'loadTemp' 
will, depending on the argument count, load from the stack of the 
calling method (which is still the "active" context).  'Push', since 
there is no allocated stack, pushes into a static array and, eg, 
'plus' does the same old add, but it gets its args from the static 
array, and puts its result back there.  And if anything fancy, such 
as a real send, does occur, then a special routine is called to do a 
real activation, copy this static state into it appropriately, and 
retry the bytecode in the normal interpreter.

It's probably worth confirming the results that I remember, but I 
wouldn't be surprised if one could almost double the speed of Squeak 
in this manner.

Cloned Activation
This one I just thought of, but I can't believe someone hasn't 
already tried it, either in squeak or some similar system.  The idea 
here is to provide a field in the method cache for an extra copy of a 
properly initialized context for the method (ie, correct frame size, 
method installed, pc and stack pointer set, etc).  Then, when a send 
occurs, all you have to do is an array copy into blank storage, 
followed by a short copy of receiver and args from the calling stack.

There's a space penalty for carrying all the extra context templates, 
of course, but I think it's not unreasonable.  Also, one could avoid 
it for all one-time code by only allocating the extra clone on the 
second call (ie, first call gets it into the method cache; second 
call allocates clone for the cache).

I have little sense of how much this might help these days -- I 
haven't looked in detail at the activation code for quite a while. 
Obviously the worse it si right now, the more this technique might 
help.

Mainly I just like to think about this stuff, and it occurred to me 
that, if someone were looking for a fun experiment or two, it might 
turn out to have some practical value.  I haven't looked at Exupery 
to know whether these things are already being done, or whether they 
might fit well with the other techniques there, but I'm sure Bryce 
could say right off the bat.

	- Dan