Interpreter>>pushReceiverVariableBytecode
Tommy Thorn
thorn at meko.dk
Sat Sep 7 01:07:17 UTC 2002
Ian Piumarta wrote:
[lots of details on manually scheduling interpret to fetch the bytecode
ahead of time deleted]
All true for most (if not all?) modern processors, but it's really only
true because, and for as long as, the compilers aren't clever enough.
On some architectures, I-cache pressure (i.e., code size) might matter
more, in which case this is a loss.
>Because the speedup is measurable, _significantly_ so when using gcc in
>which case the final "break" in each bytecode is converted (manually, by
>an awk script run on the interp.c file) into an explicit dispatch directly
>to the next bytecode's case label
>
>    void *bytecodeDispatchTable[256] = { &&label0, ..., &&label255 };
>    ...
>    case N: labelN:
>      bytecode= fetchNextBytecode();
>      doTheWork();
>      goto *bytecodeDispatchTable[currentBytecode]; /* break */
>
>which entirely eliminates the interpreter's dispatch loop.
>
This is a standard GCC interpreter trick, but I _really_ think this is a
loss for the ARM, because the switch is actually implemented very
efficiently on ARM with just one instruction using PC-relative
addressing(*). Once you use computed gotos you need to hold
bytecodeDispatchTable in a register or (worse still) load the constant
each time. Does saving one (unconditional) jump back to the
interpreter loop pay off the added register pressure?
For the ARM, fetching the bytecode ahead of time barely hurts code size
and is probably a win or a wash.
Alas, I haven't had time to perform any detailed measurements to answer
these questions.
/Tommy
(*): GCC also generates a useless bytecode < 256 comparison that should
be eliminated.
More information about the Squeak-dev
mailing list