Optimizing Squeak

Tue Feb 23 05:35:19 UTC 1999

Greg wrote:

>Modern processor generally contain a Branch Target Cache (P6, PowerPC 750)
>or next instruction cache line pointer (K7).  These can be used
>to predict indirect branches as well as direct branches.

There are at least 2 kinds of Branch Target Caches:

Branch Target Address Cache [BTAC] (P6, PowerPC 604e)
     this cache is indexed by the address of the branch instruction, and
     returns the address of the predicted target

Branch Target Instruction Cache [BTIC] (AMD 29K, PowerPC 750)
     this cache is indexed by the address of the branch target, and
     returns the first few instructions at that target

BTAC structures can be used to predict indirect as well as conditional 
branches.  BTICs solve a slightly different problem: they are usually 
indexed by the branch target address, so the target address has to be 
computed before the branch prediction hardware can start fetching down 
the predicted path.
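
To make the BTAC indexing concrete, here is a rough software model in C.
It is only a sketch: the table size, the direct-mapped layout, and all of
the names (Btac, btac_predict, btac_update) are made up for illustration
and do not describe any particular processor.

```c
#include <stddef.h>
#include <stdint.h>

#define BTAC_ENTRIES 64u  /* hypothetical size; real tables vary */

typedef struct {
    uintptr_t branch_pc;  /* tag: address of the branch instruction */
    uintptr_t target;     /* target seen the last time it executed */
    int       valid;
} BtacEntry;

typedef struct {
    BtacEntry e[BTAC_ENTRIES];  /* must start out zeroed */
} Btac;

/* Direct-mapped index: drop the low two bits of a 4-byte-aligned
   instruction address, then wrap to the table size. */
static size_t btac_index(uintptr_t branch_pc)
{
    return (size_t)((branch_pc >> 2) % BTAC_ENTRIES);
}

/* Lookup is keyed by the address of the branch itself, so it can be
   done at fetch time, before the real target is known.
   Returns 1 and stores the predicted target on a hit, 0 on a miss. */
int btac_predict(const Btac *b, uintptr_t branch_pc, uintptr_t *target)
{
    const BtacEntry *e = &b->e[btac_index(branch_pc)];
    if (e->valid && e->branch_pc == branch_pc) {
        *target = e->target;
        return 1;
    }
    return 0;
}

/* Once the branch resolves, remember where it actually went. */
void btac_update(Btac *b, uintptr_t branch_pc, uintptr_t actual_target)
{
    BtacEntry *e = &b->e[btac_index(branch_pc)];
    e->branch_pc = branch_pc;
    e->target    = actual_target;
    e->valid     = 1;
}
```

An indirect branch that keeps going to the same place hits in this table
exactly the way a direct branch does, which is why a BTAC can help with
indirect dispatch.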

>On the other hand, self-modifying code can be expensive in real
>processors.  Some RISC processors used to have to flush the I-cache 
>when any code is modified.  I am not sure whether this is true any more.

PowerPC processors use the DCBST (Data Cache Block Store) instruction to 
force a modified data cache block out to memory, and the ICBI 
(Instruction Cache Block Invalidate) instruction to invalidate the 
potentially stale copy of that block in the icache, so that instruction 
fetch sees the new instructions.  This sequence (together with the 
SYNC/ISYNC that goes with it) isn't cheap, but it's much less expensive 
than flushing the entire icache.
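
For reference, the per-block sequence might be wrapped up in C roughly
like this.  This is a sketch under several assumptions: a GCC- or
Clang-style compiler (for the inline asm and for the
__builtin___clear_cache fallback on other targets), a 32-byte cache
block, and a made-up function name (sync_modified_code).  The returned
block count is there only to make the cost visible.

```c
#include <stddef.h>
#include <stdint.h>

#define CACHE_BLOCK 32u  /* assumed cache block size in bytes */

/* Make freshly written instructions in [start, start+len) visible to
   instruction fetch.  Returns the number of cache blocks touched. */
size_t sync_modified_code(void *start, size_t len)
{
    uintptr_t first = (uintptr_t)start & ~(uintptr_t)(CACHE_BLOCK - 1);
    uintptr_t end   = (uintptr_t)start + len;
    uintptr_t p;
    size_t blocks = 0;

    if (len == 0)
        return 0;

#if defined(__powerpc__) || defined(__ppc__)
    /* Push each modified data cache block out to memory... */
    for (p = first; p < end; p += CACHE_BLOCK)
        __asm__ volatile("dcbst 0,%0" : : "r"(p) : "memory");
    __asm__ volatile("sync" : : : "memory");
    /* ...then discard any stale copy held in the icache. */
    for (p = first; p < end; p += CACHE_BLOCK)
        __asm__ volatile("icbi 0,%0" : : "r"(p) : "memory");
    __asm__ volatile("sync\n\tisync" : : : "memory");
#else
    /* Other targets: let the compiler emit whatever is needed. */
    __builtin___clear_cache((char *)start, (char *)start + len);
#endif

    for (p = first; p < end; p += CACHE_BLOCK)
        blocks++;
    return blocks;
}
```

Patching a single inline-cache site touches one or two blocks, so the
cost is a handful of cache operations plus the synchronization, not a
refill of the whole icache.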

>The best bet would be to try out both code sequences in
>a variety of processors and see which runs faster on average.

Yes, empirical testing is probably required here.  The big unknown is 
the percentage of high-utilization method sends that always send to the 
same class, vs. polymorphic dispatch.  Inline caches (self-modifying 
code) will be a win for the former, while indirect lookup will be better 
for the latter.
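
The tradeoff can be mocked up in C along these lines.  This is a toy,
not Squeak's implementation: the per-site cache lives in a data
structure here as a stand-in for patching the call instruction itself,
and every name (Class, SendSite, send_cached, send_indirect,
full_lookups) is invented for the example.

```c
#include <stddef.h>

typedef int (*Method)(int);

#define NUM_SELECTORS 1

typedef struct {
    const char *name;
    Method methods[NUM_SELECTORS];  /* indexed by selector id */
} Class;

typedef struct {
    const Class *cls;
    int value;
} Object;

static int twice(int v)  { return 2 * v; }
static int negate(int v) { return -v; }

static const Class ClassA = { "A", { twice } };
static const Class ClassB = { "B", { negate } };

/* Counts trips through the (notionally slow) general lookup. */
static unsigned long full_lookups;

static Method full_lookup(const Class *cls, int selector)
{
    full_lookups++;
    return cls->methods[selector];
}

/* One cache per send site: the last receiver class and its method. */
typedef struct {
    int selector;
    const Class *cached_class;  /* NULL until the first send */
    Method cached_method;
} SendSite;

/* Inline-cache style send: a class check, then a direct call on a hit.
   On a miss the site is "re-patched" with the new class and method. */
int send_cached(SendSite *site, const Object *rcv)
{
    if (rcv->cls != site->cached_class) {
        site->cached_method = full_lookup(rcv->cls, site->selector);
        site->cached_class  = rcv->cls;
    }
    return site->cached_method(rcv->value);
}

/* Indirect style send: always look up, always call through a pointer. */
int send_indirect(const Object *rcv, int selector)
{
    return full_lookup(rcv->cls, selector)(rcv->value);
}
```

A site that only ever sees ClassA pays for one lookup and then hits every
time; a site that alternates between ClassA and ClassB misses on every
send, and with real self-modifying code each of those misses would also
pay for the cache synchronization.  That miss rate is the number the
measurements would have to pin down.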

     -- tim