Optimizing Squeak
Carl Watts
Carl at AppliedThought.com
Tue Feb 23 16:42:26 UTC 1999
>The best alternative I came up with was what I'll call a message send
>bytecode set.
[snip]
Very interesting!
>The issue is the processor pipeline
>uses the address of the branch instruction (usually an unconditional jump
>to a table indexed address) to "predict" what address a branch will go to.
Oh, I didn't know that. Interesting... It does this even for a "branch to address given in a register" instruction (which, I guess, is what it would be on a RISC processor)?
I had been under the mistaken impression that "branch prediction" meant "predicting whether or not a conditional branch will be taken", not "predicting which instruction executes next after ANY branch so we can keep the pipeline filled".
Thanks for explaining that, Jan.
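
As a rough illustration of the point above, here is a toy model in C of a predictor that is looked up by the address of the branch instruction and simply remembers the last target that branch went to. Everything in it is an assumption made for illustration (the slot count, the made-up addresses, the "last target wins" policy, and all function names); real processors are considerably more elaborate. It compares one shared dispatch jump against a layout where each bytecode handler ends with its own dispatch jump.

```c
#include <assert.h>
#include <stddef.h>

#define SLOTS 512  /* hypothetical predictor size */

/* Toy predictor: one remembered target per branch address. */
typedef struct {
    unsigned long target[SLOTS];
    int valid[SLOTS];
} Predictor;

/* Simulate one indirect branch; return 1 if the prediction was wrong. */
static int predict_and_update(Predictor *p, unsigned long branch_addr,
                              unsigned long actual_target) {
    assert(p != NULL);
    size_t slot = branch_addr % SLOTS;
    int miss = !p->valid[slot] || p->target[slot] != actual_target;
    p->target[slot] = actual_target;  /* remember the most recent target */
    p->valid[slot] = 1;
    return miss;
}

/* Every bytecode is dispatched from ONE jump at (made-up) address 0. */
static int misses_shared_dispatch(const unsigned char *code, size_t n) {
    Predictor p = {0};
    int misses = 0;
    for (size_t i = 0; i < n; i++)
        misses += predict_and_update(&p, 0UL, 1000UL + code[i]);
    return misses;
}

/* Each handler ends with its OWN jump at (made-up) address 1 + opcode. */
static int misses_per_handler_dispatch(const unsigned char *code, size_t n) {
    Predictor p = {0};
    int misses = 0;
    unsigned long site = 0UL;  /* the very first dispatch happens at address 0 */
    for (size_t i = 0; i < n; i++) {
        misses += predict_and_update(&p, site, 1000UL + code[i]);
        site = 1UL + code[i];  /* the next dispatch is issued by this handler */
    }
    return misses;
}
```

For the repeating bytecode sequence 0,1,2,0,1,2,0,1,2,0,1,2 this model mispredicts all 12 dispatches from the shared jump, but only 4 when each handler has its own jump, because after the warm-up each jump site keeps going to the same successor.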
Then I read Tim Olson's further details:
>There are at least 2 kinds of Branch Target Caches:
>Branch Target Address Cache [BTAC] (P6, PowerPC 604e)
> this cache is indexed by the address of the branch instruction, and
> returns the address of the predicted target
>Branch Target Instruction Cache [BTIC] (AMD 29K, PowerPC 750)
> this cache is indexed by the address of the branch target, and
> returns the first few instructions at that target
So (let me summarize my understanding here): on current Intel chips (P6), which use a BTAC, a byte-code dispatch loop that uses the bytecode to index into a 256-element table of addresses WOULD have the bad branch misprediction behavior Jan talks about.
But current PPCs such as the PowerPC 750 (which use a BTIC) wouldn't have this problem in this kind of byte-code dispatch loop. In fact, since the first few instructions to handle each bytecode would be cached, they should have really good pipelining behavior for this kind of byte-code dispatch loop.
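
For readers who have not seen one, the kind of dispatch loop being discussed can be sketched in C roughly as follows. This is a minimal sketch, not Squeak's interpreter: the four opcodes, the `VM` struct, and the handler names are all hypothetical, and a table of function pointers stands in for the table of addresses (so the compiler will emit an indirect call rather than an indirect jump). The point is only that every bytecode goes through the same single indirect branch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcodes for a tiny stack machine; not Squeak's bytecode set. */
enum { OP_HALT = 0, OP_PUSH1 = 1, OP_ADD = 2, OP_DUP = 3 };

typedef struct {
    long stack[64];
    int sp;       /* number of values currently on the stack */
    int running;  /* cleared by OP_HALT */
} VM;

typedef void (*Handler)(VM *);

static void op_halt(VM *vm) { vm->running = 0; }

static void op_push1(VM *vm) {
    assert(vm->sp < 64);
    vm->stack[vm->sp++] = 1;
}

static void op_add(VM *vm) {
    assert(vm->sp >= 2);
    vm->sp--;
    vm->stack[vm->sp - 1] += vm->stack[vm->sp];
}

static void op_dup(VM *vm) {
    assert(vm->sp >= 1 && vm->sp < 64);
    vm->stack[vm->sp] = vm->stack[vm->sp - 1];
    vm->sp++;
}

/* Run until OP_HALT or the end of the code; return the top of stack (0 if empty). */
static long run(const unsigned char *code, size_t n) {
    Handler table[256];  /* one entry per possible bytecode value */
    for (int i = 0; i < 256; i++)
        table[i] = op_halt;  /* unassigned bytecodes simply halt */
    table[OP_PUSH1] = op_push1;
    table[OP_ADD] = op_add;
    table[OP_DUP] = op_dup;

    VM vm = { .sp = 0, .running = 1 };
    for (size_t pc = 0; pc < n && vm.running; pc++)
        table[code[pc]](&vm);  /* the one shared indirect branch */
    return vm.sp > 0 ? vm.stack[vm.sp - 1] : 0;
}
```

Running the program PUSH1, DUP, ADD, DUP, ADD, HALT through `run` leaves 4 on top of the stack. Because the `table[code[pc]]` dispatch lives at one address, a predictor keyed by that address sees a different target nearly every time.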
FYI: The BrouHaHa interpreter which Eliot Miranda wrote used (if I remember correctly) what the referenced document (http://www.complang.tuwien.ac.at/forth/threaded-code.html) would have us call "direct token threaded code". On current PPC chips this kind of byte-code interpreter should also perform well because of the PPC's BTIC.
Thanks for the additional details Tim!
Carl
P.S. Gee, following this thread is like taking a course in computer engineering!