[Newcompiler] A few thoughts on a bytecode set.

bryce at kampjes.demon.co.uk
Tue May 1 21:04:53 UTC 2007


Here's the bytecode set that I was thinking about. It's just enough of
a design to let me formulate ideas around. I haven't looked at how
easy it would be to change the new compiler to generate it. This is
just me jotting down my thinking over the last few days.

The bytecodes are:
  createClosureEnvironment
  createBlockContext
  loadEnvironmentVariable
  storeEnvironmentVariable

loadEnvironmentVariable and storeEnvironmentVariable should
be placed in the two unused bytecodes to minimize decoding
effort. Both would take two arguments: the first specifying
which environment to get the variable from, the second specifying
the variable's position in the context.

In the interpreter, the code for these bytecodes should be a tiny
loop. The reason to put these in the two free bytecodes is to minimize
the number of jumps required to decode. The interpreter spends most of
its time waiting for branch mispredictions to recover, so it seems
sensible to put the fast and common operations where they're fastest
to decode.
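
A rough sketch in C of what that loop could look like; the Environment
layout and the fetchByte/push names here are placeholders for
illustration, not the actual interpreter's code:

  typedef struct Object Object;          /* opaque object reference       */
  typedef struct Environment {
      struct Environment *outer;         /* enclosing environment, or 0   */
      Object *slots[];                   /* the captured variables        */
  } Environment;

  extern int  fetchByte(void);           /* next operand byte             */
  extern void push(Object *value);       /* push onto the active context  */

  void interpretLoadEnvironmentVariable(Environment *env)
  {
      int hops = fetchByte();            /* first argument: which environment */
      int slot = fetchByte();            /* second argument: variable's slot  */
      while (hops-- > 0)                 /* the tiny loop                     */
          env = env->outer;
      push(env->slots[slot]);
  }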

It takes the interpreter about 10 clocks to execute a bytecode and 200
clocks for a send. (These numbers are old and may have changed
slightly due to both hardware and VM changes.) After sorting out
context creation, variable access may become significant for some
methods; the current closure compiler does a reasonable job of using
receiver variable access where possible.

It's worthwhile thinking about how true (self) receiver variable
access would work in the worst cases where there's also environment
variable access. With environment access bytecodes it may be possible
to always use the receiver itself for receiver variables. If possible,
normal variable access should be done with unchecked fast bytecodes,
leaving the compiler responsible for checking that the variables
exist.
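
For contrast, a normal unchecked receiver variable access is a single
indexed load with no run-time test at all (reusing the placeholder
Object type from the sketch above):

  /* The compiler has already checked that the variable exists, so
     there's nothing left to test at run time. */
  Object *loadReceiverVariable(Object **receiverSlots, int index)
  {
      return receiverSlots[index];
  }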


The rest of the bytecodes would be retrofitted into the
doubleExtendedDoAnythingBytecode. Decode time doesn't really matter as
they're all performing slow operations. The double do-anything
bytecode has a single byte argument available, which should be enough
for createClosureEnvironment and createBlockContext given that large
contexts only have 63 slots for arguments, temporaries, and the stack.
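
Roughly, emitting one of these could look like the sketch below; the
opcode value, the sub-operation number, and emitByte() are all
assumptions for illustration, not the compiler's real API:

  #include <assert.h>

  extern void emitByte(int b);             /* append one byte of bytecode */

  enum { DoubleExtendedDoAnything   = 132, /* assumed opcode value        */
         CreateClosureEnvironmentOp = 1 }; /* invented sub-operation      */

  void emitCreateClosureEnvironment(int slotCount)
  {
      assert(slotCount <= 63);             /* fits the one argument byte  */
      emitByte(DoubleExtendedDoAnything);
      emitByte(CreateClosureEnvironmentOp);
      emitByte(slotCount);
  }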

createClosureEnvironment would create the environment object and store
it in the activeContext. The decision to make this a single operation
was to minimize code size and make context creation as explicit as
possible. Making the VM as responsible as possible for contexts is
debatable. It's flexibility vs. compactness, explicitness, and speed.
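
As a sketch of the intended effect (the helper names here are
placeholders, not real interpreter functions):

  typedef struct Object Object;

  extern int     fetchByte(void);
  extern Object *instantiateEnvironment(int slotCount);
  extern void    storeContextEnvironment(Object *context, Object *env);

  void interpretCreateClosureEnvironment(Object *activeContext)
  {
      int slots = fetchByte();             /* the single argument byte */
      Object *environment = instantiateEnvironment(slots);
      storeContextEnvironment(activeContext, environment);
  }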

createBlockContext does all the work required to create a context.

The two create* bytecodes are loosely based on bytecodePrimBlockCopy,
which is sent to thisContext.

Speed is the weakest argument for combining all context creation into
a single bytecode. I think sends are going to be several times faster
than object creation given the speed gains from avoiding thisContext.
(I've broken that optimization myself during Exupery development;
that's how I know how important it is.)

Compactness may matter, it depends on how much we care about space.
Moderate compactness is good for performance as cache sizes are
limited.

Explicitness is helpful for bytecode readers. Even a send of
"ClosureEnvironment new" can't be trusted 100%, as
ClosureEnvironment>>new can be changed in the image.


When compiling, the main thing that matters is being able to create
fast variable accesses. Given the above design, and many others, it
should be easy to create variable accesses that take only one
instruction per context or environment object accessed. That is an
optimal design; faster access is possible, at the cost of slower
environment or context creation.
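
Reusing the Environment sketch from earlier, an access to a variable
two environments out would compile to one load per object on the path,
with no loop and no checks:

  /* Three dependent loads: outer, outer, then the slot itself. */
  Object *compiledOuterAccess(Environment *env, int slot)
  {
      return env->outer->outer->slots[slot];
  }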

When interpreting, keeping the actual work required down is also
important, but it's easy for fast operations to spend most of their
time decoding the bytecodes.

Bryce

