On the effect of branch mispredictions in the Squeak VM

Andreas Raab andreas.raab at gmx.de
Mon Jul 7 12:59:49 UTC 2003


Hi Tim,

> I suspect there will be an improvement, but not as much as in
> Andreas' results (was that a Pentium-4?) 

Originally, this was on a P2, but I found the improvement to be consistent
on every Pentium processor I've tried (though I have to admit I haven't
tried it explicitly on a P3 or P4).

> The PowerPC 750 (G3) was designed with
> this kind of table-driven dispatch code in mind, since that was the
> basis for the 68K emulator which Apple used.  The pipeline is only 5
> stages, so branch misprediction penalties are quite small compared to
> the 20 cycles of the P4.

Very interesting - do you have a URL handy for this? I'd like to read up on
it.

> The 750 does have a 64-entry BTIC (Branch Target Instruction 
> Cache), so it will see some benefit from the "goto indirect label"
> feature of GCC.

Which makes me wonder whether the BTIC is actually large enough to cover
the entire dispatch of my benchmark (only 20 bytecodes are involved) ...
perhaps we need to increase the number of bytecodes to actually see an
effect (it would certainly explain why John got the same measurements).
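
For anyone following along who hasn't used the GCC feature Tim mentions,
here is a minimal sketch of the two dispatch styles being compared. It is
not the Squeak interpreter: the four-opcode instruction set, the names
run_switch, run_threaded and demo_program, and the 16-slot stack are all
made up for illustration. The point is only that the switch version
funnels every bytecode through one shared indirect branch, while the
"labels as values" version gives each bytecode handler its own indirect
branch, which is what lets a per-branch target cache do something useful.
It needs GCC (or a compatible compiler) because of the &&label and
goto * extensions.

```c
#include <assert.h>

/* Hypothetical toy bytecode set, invented for this sketch. */
enum { OP_PUSH1, OP_DUP, OP_ADD, OP_HALT, OP_COUNT };

#define STACK_SIZE 16

/* Sample program: push 1, double it twice, halt. */
const unsigned char demo_program[] = {
    OP_PUSH1, OP_DUP, OP_ADD, OP_DUP, OP_ADD, OP_HALT
};

/* Switch dispatch: every bytecode goes through the same indirect jump. */
int run_switch(const unsigned char *code)
{
    int stack[STACK_SIZE];
    int sp = 0;

    for (;;) {
        switch (*code++) {
        case OP_PUSH1:
            assert(sp < STACK_SIZE);
            stack[sp++] = 1;
            break;
        case OP_DUP:
            assert(sp >= 1 && sp < STACK_SIZE);
            stack[sp] = stack[sp - 1];
            sp++;
            break;
        case OP_ADD:
            assert(sp >= 2);
            sp--;
            stack[sp - 1] += stack[sp];
            break;
        case OP_HALT:
            assert(sp >= 1);
            return stack[sp - 1];
        default:
            return -1; /* unknown bytecode */
        }
    }
}

/* Threaded dispatch using GCC's labels-as-values extension:
   each handler ends with its own indirect jump. */
int run_threaded(const unsigned char *code)
{
    static void *table[OP_COUNT] = {
        [OP_PUSH1] = &&do_push1,
        [OP_DUP]   = &&do_dup,
        [OP_ADD]   = &&do_add,
        [OP_HALT]  = &&do_halt
    };
    int stack[STACK_SIZE];
    int sp = 0;

/* Fetch the next bytecode and jump straight to its handler. */
#define DISPATCH() \
    do { assert(*code < OP_COUNT); goto *table[*code++]; } while (0)

    DISPATCH();

do_push1:
    assert(sp < STACK_SIZE);
    stack[sp++] = 1;
    DISPATCH();
do_dup:
    assert(sp >= 1 && sp < STACK_SIZE);
    stack[sp] = stack[sp - 1];
    sp++;
    DISPATCH();
do_add:
    assert(sp >= 2);
    sp--;
    stack[sp - 1] += stack[sp];
    DISPATCH();
do_halt:
    assert(sp >= 1);
    return stack[sp - 1];

#undef DISPATCH
}
```

Both functions should return the same value for demo_program. Whether the
threaded version is measurably faster depends on the processor's branch
prediction hardware, which is exactly the open question here.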

John, one question about your results: How do they compare with the
results coming from #tinyBenchmarks? I don't think there are very many
bytecodes involved there either, so (in theory) the results obtained
should be exactly in line.

Cheers,
  - Andreas


