VM tuning ideas for you VM hackers

Raab, Andreas Andreas.Raab at disney.com
Wed Nov 17 21:10:39 UTC 1999


Jan,

> I noticed all the bytecodes are "unrolled" into unique chunks
> of code. It seems possible that things would be faster if they were combined.
> What I mean is that things like the 16 push temp bytecodes get rolled back into
> one chunk of code that extracts the offset from the opcode. The reason I
> think this may be faster is that if you have two consecutive push temp
> bytecodes, the second one should get a branch prediction hit.

I don't know about the push bytecodes, but for the common selector sends I
found that unrolling the case statement resulted in a 10% performance
IMPROVEMENT over the non-unrolled version. It seems that the overhead
associated with extracting the actual number of arguments and the selector
index is way more expensive than the branch misprediction penalty. It could
be different for the push bytecodes (since there is less work to do).
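
To make the trade-off concrete, here is a minimal C sketch of the two styles
for a send bytecode. It is an illustration only, not the actual interpreter
source: the opcode layout (0xD0-0xFF, low four bits = selector index, bits 4-5
= argument count plus one) and the names Send, decode_combined and
decode_unrolled are assumptions made for this example.

```c
/* Result of decoding a send bytecode (hypothetical type for this sketch). */
typedef struct {
    int selectorIndex;  /* index of the selector in the literal frame */
    int argCount;       /* number of arguments the send takes */
} Send;

/* Combined style: a single handler serves every send opcode and pays
 * for a shift, two masks and a subtract on each dispatch.
 * Assumed layout: 0xD0-0xDF = 0 args, 0xE0-0xEF = 1 arg, 0xF0-0xFF = 2 args. */
static Send decode_combined(unsigned char opcode)
{
    Send s;
    s.selectorIndex = opcode & 0x0F;
    s.argCount = ((opcode >> 4) & 3) - 1;
    return s;
}

/* Unrolled style: every opcode gets its own case with the selector index
 * and argument count baked in as constants, so nothing is decoded at run
 * time. Only a few of the 48 cases are spelled out here. */
static Send decode_unrolled(unsigned char opcode)
{
    switch (opcode) {
    case 0xD0: return (Send){ 0, 0 };
    case 0xD1: return (Send){ 1, 0 };
    case 0xE0: return (Send){ 0, 1 };
    case 0xE1: return (Send){ 1, 1 };
    case 0xF0: return (Send){ 0, 2 };
    case 0xF1: return (Send){ 1, 2 };
    default:   return (Send){ -1, -1 };  /* not covered by this sketch */
    }
}
```

Both functions return the same values for the opcodes listed; the difference
is only where the work happens. The unrolled version costs more code space and
gives each opcode its own branch target, but does no decoding per dispatch.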

  Andreas





More information about the Squeak-dev mailing list