Optimizing Squeak
Tim Olson
tim at jumpnet.com
Tue Feb 23 05:35:19 UTC 1999
Greg wrote:
>Modern processors generally contain a Branch Target Cache (P6, PowerPC 750)
>or next instruction cache line pointer (K7). These can be used
>to predict indirect branches as well as direct branches.
There are at least 2 kinds of Branch Target Caches:
    Branch Target Address Cache [BTAC] (P6, PowerPC 604e)
        this cache is indexed by the address of the branch instruction, and
        returns the address of the predicted target
    Branch Target Instruction Cache [BTIC] (AMD 29K, PowerPC 750)
        this cache is indexed by the address of the branch target, and
        returns the first few instructions at that target
BTAC structures can be used to predict indirect as well as conditional
branches. BTICs are used to solve a slightly different problem, and are
usually indexed by the branch target address, meaning that the target
address has to be computed before the branch prediction hardware can
start fetching down the predicted path.
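To make the BTAC behaviour concrete, here is a rough software model of a direct-mapped BTAC. It is only a sketch: the table size, the indexing function, and all of the names (btac, btac_predict, btac_update) are made up for illustration and do not describe any particular processor. The point is that the lookup key is the address of the branch itself, so a prediction is available before the target has been computed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Toy direct-mapped BTAC. The size is an illustrative assumption. */
#define BTAC_ENTRIES 64u

typedef struct {
    bool     valid;
    uint32_t branch_pc;  /* tag: address of the branch instruction */
    uint32_t target_pc;  /* most recently observed target */
} btac_entry;

typedef struct {
    btac_entry entries[BTAC_ENTRIES];
} btac;

static void btac_init(btac *b)
{
    assert(b != NULL);
    memset(b, 0, sizeof *b);
}

/* Assumes 4-byte instructions, so the low two address bits are dropped. */
static unsigned btac_index(uint32_t branch_pc)
{
    return (unsigned)((branch_pc >> 2) % BTAC_ENTRIES);
}

/* Look up by the address of the branch; returns true on a hit and
   stores the predicted target in *target_out. */
static bool btac_predict(const btac *b, uint32_t branch_pc,
                         uint32_t *target_out)
{
    const btac_entry *e = &b->entries[btac_index(branch_pc)];
    if (e->valid && e->branch_pc == branch_pc) {
        *target_out = e->target_pc;
        return true;
    }
    return false;
}

/* Once the branch resolves, record where it actually went. */
static void btac_update(btac *b, uint32_t branch_pc, uint32_t actual_target)
{
    btac_entry *e = &b->entries[btac_index(branch_pc)];
    e->valid     = true;
    e->branch_pc = branch_pc;
    e->target_pc = actual_target;
}
```

Because an entry simply remembers the last target seen, an indirect branch that keeps going to the same place predicts well in this model, while one that alternates between targets mispredicts every time.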
>On the other hand, self-modifying code can be expensive in real
>processors. Some RISC processors used to have to flush the I-cache
>when any code is modified. I am not sure whether this is true any more.
PowerPC processors use the ICBI (Instruction Cache Block Invalidate)
instruction to invalidate a potentially stale block from the icache, and
the DCBST instruction to force a modified data cache block to memory so
that the instruction prefetcher can see the new instructions. This
sequence isn't cheap, but it's much less expensive than flushing the
entire icache.
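For reference, a sketch of what that sequence might look like from C with GCC-style inline assembly. The function name sync_icache_range and the 32-byte block size are assumptions for illustration (check the block size for the actual part), and the dcbst / sync / icbi / sync / isync ordering is the commonly documented one rather than something spelled out above. On non-PowerPC GCC-compatible compilers the sketch falls back to __builtin___clear_cache so that it still compiles.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 32-byte cache blocks. Verify for the target processor. */
#define CACHE_BLOCK_BYTES 32u

/* Make freshly written instructions in [start, start+len) visible to
   instruction fetch. Returns the number of cache blocks covered. */
static size_t sync_icache_range(void *start, size_t len)
{
    if (len == 0)
        return 0;
    assert(start != NULL);

    uintptr_t mask  = (uintptr_t)CACHE_BLOCK_BYTES - 1;
    uintptr_t begin = (uintptr_t)start & ~mask;  /* round down to a block */
    uintptr_t end   = (uintptr_t)start + len;

#if defined(__GNUC__) && (defined(__powerpc__) || defined(__ppc__))
    uintptr_t p;
    /* Push each modified data cache block out to memory. */
    for (p = begin; p < end; p += CACHE_BLOCK_BYTES)
        __asm__ volatile ("dcbst 0,%0" : : "r"(p) : "memory");
    __asm__ volatile ("sync" : : : "memory");
    /* Invalidate the possibly stale instruction cache blocks. */
    for (p = begin; p < end; p += CACHE_BLOCK_BYTES)
        __asm__ volatile ("icbi 0,%0" : : "r"(p) : "memory");
    __asm__ volatile ("sync\n\tisync" : : : "memory");
#elif defined(__GNUC__)
    /* Portable fallback so the sketch builds on other targets. */
    __builtin___clear_cache((char *)start, (char *)start + len);
#endif

    return (size_t)((end - begin + mask) / CACHE_BLOCK_BYTES);
}
```

The cost scales with the number of blocks touched, which is why patching a single call site is far cheaper than flushing the whole icache.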
>The best bet would be to try out both code sequences in
>a variety of processors and see which runs faster on average.
Yes, empirical testing is probably required here. The big unknown is
the percentage of high-utilization method sends that always send to the
same class, vs. polymorphic dispatch. Inline caches (self-modifying
code) will be a win for the former, while indirect lookup will be better
for the latter.
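The tradeoff can be illustrated with a toy model in C. This is not how Squeak or any real VM does it: the cache here is kept as data in a call_site struct rather than patched into the instruction stream, and every name (class_desc, call_site, send_cached, send_indirect) is hypothetical. In a real inline cache, each miss below is where the code would be rewritten and the icache synchronized.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*method_fn)(int value);

typedef struct {
    int       class_id;     /* hypothetical class tag */
    method_fn size_method;  /* the single selector modeled here */
} class_desc;

typedef struct {
    const class_desc *cls;
    int               value;
} object;

/* Two sample classes with different implementations of the selector. */
static int size_a(int v) { return v; }
static int size_b(int v) { return v * 2; }
static const class_desc class_a = { 1, size_a };
static const class_desc class_b = { 2, size_b };

/* Indirect lookup: every send loads the method through the class. */
static int send_indirect(const object *o)
{
    assert(o != NULL && o->cls != NULL);
    return o->cls->size_method(o->value);
}

/* Per-call-site cache, standing in for a patched call instruction. */
typedef struct {
    const class_desc *cached_class;
    method_fn         cached_target;
    unsigned          hits;
    unsigned          misses;  /* each miss models one code patch */
} call_site;

static int send_cached(call_site *site, const object *o)
{
    if (o->cls == site->cached_class) {
        site->hits++;
    } else {
        /* Miss: relink the site to the new receiver class. */
        site->misses++;
        site->cached_class  = o->cls;
        site->cached_target = o->cls->size_method;
    }
    return site->cached_target(o->value);
}
```

A site that only ever sees class_a misses once and then hits forever, while a site that alternates between class_a and class_b misses on every send, which is the case where the indirect lookup comes out ahead.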
-- tim