Optimizing Squeak

Carl Watts Carl at AppliedThought.com
Tue Feb 23 16:42:26 UTC 1999


>The best alternative I came up with was what I'll call a message send
>bytecode set.
[snip]

Very interesting!

>The issue is the processor pipeline
>uses the address of the branch instruction (usually an unconditional jump
>to a table indexed address) to "predict" what address a branch will go to.

Oh, I didn't know that. Interesting... Does it do this even for a "branch to address given in a register" instruction (which, I guess, is what it would be on a RISC processor)?

I had been under the mistaken impression that "branch prediction" meant "predicting whether or not a conditional branch will be taken", not "predicting which is the next instruction to execute after ANY branch so we can keep the pipeline filled".

Thanks for explaining that, Jan.

Then I read Tim Olson's further details:
>There are at least 2 kinds of Branch Target Caches:
>Branch Target Address Cache [BTAC] (P6, PowerPC 604e)
>     this cache is indexed by the address of the branch instruction, and
>     returns the address of the predicted target
>Branch Target Instruction Cache [BTIC] (AMD 29K, PowerPC 750)
>     this cache is indexed by the address of the branch target, and
>     returns the first few instructions at that target

So (let me summarize my understanding here), on current Intel chips (P6), which use a BTAC, a byte-code dispatch loop that uses the bytecode to index into a 256-element table of addresses WOULD have the bad branch misprediction behavior Jan talks about.
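
A toy sketch of that situation, in Python for brevity. It is not the Squeak VM's code and not a model of a real P6: the three bytecodes, the handler names, and the DISPATCH_BRANCH address are all made up for illustration, and the "BTAC" is just a dictionary that remembers the last target seen for each branch address.

```python
# Toy model (all names and addresses are hypothetical).
# One shared indirect jump dispatches every bytecode through a 256-entry table.
# The "BTAC" maps the address of a branch to the last target it jumped to.

DISPATCH_BRANCH = 0x1000  # assumed address of the single dispatch jump

PUSH1, DUP, ADD = 0, 1, 2  # made-up bytecodes for this sketch


def op_push1(stack):
    stack.append(1)


def op_dup(stack):
    stack.append(stack[-1])


def op_add(stack):
    b = stack.pop()
    a = stack.pop()
    stack.append(a + b)


def op_bad(stack):
    raise ValueError("unimplemented bytecode")


# 256-element table of handler "addresses", indexed by the bytecode.
TABLE = [op_bad] * 256
TABLE[PUSH1] = op_push1
TABLE[DUP] = op_dup
TABLE[ADD] = op_add


def run(bytecodes):
    """Interpret bytecodes; return (final stack, predicted-target misses)."""
    stack = []
    btac = {}  # branch address -> last target taken from that branch
    mispredictions = 0
    for bc in bytecodes:
        target = TABLE[bc]
        # The predictor only knows where THIS branch went last time.
        if btac.get(DISPATCH_BRANCH) is not target:
            mispredictions += 1
        btac[DISPATCH_BRANCH] = target
        target(stack)
    return stack, mispredictions
```

Because every bytecode goes through the same jump, the prediction in this model is right only when the same bytecode occurs twice in a row: `run([PUSH1, DUP, ADD, DUP, ADD])` mispredicts on all five dispatches, while `run([PUSH1, PUSH1, PUSH1, ADD, ADD])` mispredicts on two.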

But current PPCs (which use BTIC) wouldn't have this problem in this kind of byte-code dispatch loop. In fact, since the first few instructions to handle each bytecode would be cached, they should have really good pipelining behavior for this kind of byte-code dispatch loop.
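
A toy sketch of the BTIC as Tim describes it, again in Python and again only an illustration: the handler addresses, the placeholder instruction names, and the BTIC_WIDTH of 2 are assumptions, and the model ignores capacity limits and everything else a real PowerPC does (including how it obtains the target of a register-indirect branch).

```python
# Toy model (addresses, instruction names and width are hypothetical).
# The "BTIC" is keyed by the address of the branch TARGET and holds the
# first few instructions found there.

BTIC_WIDTH = 2  # assumed number of instructions cached per target

# Placeholder handler code: target address -> instructions at that address.
CODE = {
    0x2000: ["i0", "i1", "i2", "i3"],  # handler for some bytecode A
    0x2040: ["j0", "j1", "j2"],        # handler for some bytecode B
    0x2080: ["k0", "k1", "k2", "k3"],  # handler for some bytecode C
}


def run_btic(targets, code=CODE):
    """Follow a sequence of branch targets; return (hits, cache contents)."""
    btic = {}
    hits = 0
    for target in targets:
        if target in btic:
            # The first instructions of the handler are supplied by the cache.
            hits += 1
        else:
            # First visit: fetch normally and remember the leading instructions.
            btic[target] = code[target][:BTIC_WIDTH]
    return hits, btic
```

In this model the order of the bytecodes no longer matters, only whether a handler has been visited before: `run_btic([0x2000, 0x2040, 0x2080, 0x2040, 0x2080])` misses once per distinct handler and hits on the two repeats.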

FYI: The BrouHaHa interpreter which Eliot Miranda wrote used (if I remember correctly) what the referenced document (http://www.complang.tuwien.ac.at/forth/threaded-code.html) would have us call "direct token threaded code".  On current PPC chips this kind of byte-code interpreter should also perform well because of the PPC's BTIC.

Thanks for the additional details, Tim!

Carl

P.S. Gee, following this thread is like taking a course in computer engineering!




