Context Stack Speedup
ajh18 at cornell.edu
Wed Apr 16 05:23:14 UTC 2003
Ok, you guys have convinced me that it will be too painful to wait for
Jitter while taking on a 15% slowdown. So I propose we add the stack
enhancements only, which should give us a small net speedup. This will
allow us to consider bytecodes and other image format changes separately,
without the urgency of counteracting the closure slowdown. Besides,
separating the stack project from the bytecode project makes things more
modular and easier to manage.
Below is the stack design I propose. I hope it triggers other
proposals and discussions that eventually lead to a final design so we
can implement it.
First, let me say that this design adds a new optional kernel class to
the image. Images that use it are therefore incompatible with older VMs,
but the new VM will still be able to run older images and load old
projects (image segments). So it is basically a backward-compatible image
format change. I'm sacrificing forward compatibility to make the design
simpler and more object-oriented, in keeping with my view that there
should be no distinction between the image and the VM.
New Context Class
A ContextStack is a sequence of method contexts embedded in its
indexable fields (its stack). ContextStacks, MethodContexts, and
BlockContexts are chained together forming a full execution stack. When
the top context within a ContextStack is accessed, it is popped out into
its own MethodContext and kept in the chain at its original position.
For example, suppose context A is a suspended context with B, C and D as
its senders in that order, and B, C and D all reside within the same
ContextStack Z. Sending #sender to A will cause B to be popped out into
its own context and Z would only have C and D remaining in it. If we
continue sending #sender until we reach the end, all will be in their
own context and no ContextStack will remain.
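A small Python model of the chain and of #sender may make the popping rule concrete. It replays the A/B/C/D example. This is a sketch, not VM code. Only the class names MethodContext and ContextStack come from the proposal. The helper sender_of, the frames list (top frame last), and the use of letters as frames are hypothetical modeling choices. The real design keeps frames in the ContextStack's indexable fields.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ContextStack:
    """Several frames packed into one object; frames[-1] is the top frame."""
    frames: list[str] = field(default_factory=list)
    sender: MethodContext | ContextStack | None = None


@dataclass
class MethodContext:
    """A frame that has been given its own context object."""
    name: str
    sender: MethodContext | ContextStack | None = None


def sender_of(ctx: MethodContext) -> MethodContext | None:
    """Hypothetical #sender: if the sender is a ContextStack, pop its top
    frame into its own MethodContext, keeping the frame's place in the chain."""
    s = ctx.sender
    if not isinstance(s, ContextStack):
        return s
    top = s.frames.pop()
    # The rest of the stack (or whatever followed it, once it is empty)
    # becomes the sender of the newly separated context.
    popped = MethodContext(top, s if s.frames else s.sender)
    ctx.sender = popped
    return popped


# A's senders are B, C and D, all inside one ContextStack Z.
z = ContextStack(frames=["D", "C", "B"])
a = MethodContext("A", z)

walk = []
ctx = a
while ctx is not None:
    walk.append(ctx.name)
    ctx = sender_of(ctx)

print(walk)      # ['A', 'B', 'C', 'D']
print(z.frames)  # []
```

After the walk, every link in the chain is a MethodContext and the emptied Z is no longer referenced. This matches the point that nothing outside the VM ever holds a ContextStack.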
So any time we need an object reference to a context, it is separated out
into its own context. The debugger, etc. will therefore never see
ContextStacks, since their frames are converted to contexts as they are
accessed (through #sender).
The VM will keep contexts inside a single ContextStack unless it
executes pushThisContext. In that case it separates the current context
out into its own context and starts a new ContextStack after it. Block
contexts, and the home contexts of blocks that do a method return (^),
also execute pushThisContext, so they will have their own contexts as well.
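The placement policy can be sketched the same way. This is a hypothetical Python model, not the actual interpreter. The Interpreter class, the activate method, and its needs_context flag (standing for "this is a block context, or the home context of a block that does ^") are illustrative assumptions. push_this_context models the pushThisContext bytecode as the proposal describes it.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ContextStack:
    frames: list[str] = field(default_factory=list)  # frames[-1] is the top
    sender: MethodContext | ContextStack | None = None


@dataclass
class MethodContext:
    name: str
    sender: MethodContext | ContextStack | None = None


class Interpreter:
    """Hypothetical model of where the VM places new activations."""

    def __init__(self) -> None:
        self.stack = ContextStack()  # the ContextStack currently being filled

    def activate(self, name: str, needs_context: bool = False) -> MethodContext | None:
        # Ordinary activations share the current ContextStack.
        self.stack.frames.append(name)
        # Block contexts, and home contexts whose blocks do a ^ return,
        # execute pushThisContext right away.
        return self.push_this_context() if needs_context else None

    def push_this_context(self) -> MethodContext:
        # Separate the active frame into its own context...
        old = self.stack
        ctx = MethodContext(old.frames.pop(), old if old.frames else old.sender)
        # ...and start a new ContextStack after (above) it.
        self.stack = ContextStack(sender=ctx)
        return ctx


vm = Interpreter()
vm.activate("main")
vm.activate("helper")
home = vm.activate("methodWithBlockReturn", needs_context=True)
vm.activate("callee1")
vm.activate("callee2")

print(vm.stack.frames)       # ['callee1', 'callee2']
print(vm.stack.sender.name)  # methodWithBlockReturn
print(home.sender.frames)    # ['main', 'helper']
```

The resulting chain is ContextStack, then MethodContext, then ContextStack. Each separate context splits what would otherwise be one long ContextStack.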
That's basically it. I like this design much better than converting
back and forth between the image representation and the VM
representation. I know this does not let us take advantage of the
C stack or of raw data in the stack, but I think simplicity and
all-objects-all-the-time are more important than the extra speedup those
tricks would give us. Just the fact that we no longer have to move the
receiver and args into a new context on every send, and flush that context
afterwards for future reuse, will speed things up enough. That's how VI4
worked, and it made sends 100% faster.
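To illustrate the saving, here is a hypothetical Python sketch. It is not VI4 or Squeak VM code. The callee's frame simply starts at the receiver the caller already pushed, so a send only moves a frame pointer. It does not copy receiver and args into a fresh context and clear it afterwards. The Stack class, its methods, and the saved_fps list (standing in for saved frame headers) are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Stack:
    """Hypothetical model of one ContextStack's slots, shared by caller and callee."""
    slots: list = field(default_factory=list)
    fp: int = 0                                     # index of the active frame's receiver
    saved_fps: list = field(default_factory=list)   # stands in for saved frame headers

    def push(self, value) -> None:
        self.slots.append(value)

    def send(self, num_args: int) -> None:
        # The caller already pushed receiver and args; the callee's frame
        # starts at the receiver, so nothing is copied.
        self.saved_fps.append(self.fp)
        self.fp = len(self.slots) - (num_args + 1)

    def receiver(self):
        return self.slots[self.fp]

    def arg(self, i: int):
        return self.slots[self.fp + 1 + i]

    def return_value(self, value) -> None:
        # Drop the callee's frame and leave the result where the receiver was.
        del self.slots[self.fp:]
        self.fp = self.saved_fps.pop()
        self.slots.append(value)


s = Stack()
s.push("callerSlot")
s.push(3)              # receiver
s.push(4)              # argument
s.send(num_args=1)
s.return_value(s.receiver() + s.arg(0))
print(s.slots)         # ['callerSlot', 7]
```

The receiver and argument are read in place through fp. Returning only truncates the slot list, so there is no per-send context to allocate, fill, or flush.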
VI4 also included bytecodes that accessed temps at an offset from sp
instead of from fp. These were faster because sp was held in a register
but fp was not. Maybe we can keep fp in a register too, so the standard
bytecodes can be just as fast. If not, I still expect the stack design
presented above to improve sends by at least 50-60%, which should
translate into at least a 15% macro speedup and thus offset the closure
slowdown.
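As a rough check of that estimate, Amdahl's law relates the two numbers. If a fraction f of run time is spent in sends and sends become s times faster, the whole program speeds up by 1 / ((1 - f) + f / s). The sketch below assumes that "improve sends by 50-60%" means s = 1.5 to 1.6. The post does not say what fraction of time goes to sends, so the code solves for the fraction needed to reach the 15% target rather than assuming one. The function names are hypothetical.

```python
def overall_speedup(send_fraction: float, send_speedup: float) -> float:
    """Amdahl's law: whole-program speedup when only sends get faster.
    send_speedup = 1.5 means sends take 1/1.5 of their old time."""
    new_time = (1 - send_fraction) + send_fraction / send_speedup
    return 1 / new_time


def required_send_fraction(target: float, send_speedup: float) -> float:
    """Fraction of run time spent in sends needed to reach `target` overall.
    Solves 1/target = 1 - f * (1 - 1/send_speedup) for f."""
    return (1 - 1 / target) / (1 - 1 / send_speedup)


for s in (1.5, 1.6):
    print(s, round(required_send_fraction(1.15, s), 3))
# 1.5 0.391
# 1.6 0.348
```

Under this reading, sends would need to take roughly 35-39% of execution time for a 50-60% send improvement to yield the 15% macro speedup.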