Optimizing Squeak

Greg & Cindy Gritton gritton at ibm.net
Tue Feb 23 04:59:18 UTC 1999


At 04:29 PM 2/22/99 +0100, you wrote:
>Greg & Cindy Gritton wrote:
>> 
>>                       Speeding up Squeak
>
>[...]
>inline caches:
>
>> 
>> The resulting code ends up as:
>> 
>>     and  oop, 1                          ;If small integer load IntegerClass
>>     jz   notSmallInt
>>     set  #IntegerClass -> r1
>>     jump afterIntTest
>> notSmallInt:
>>     load [oop + #classOffset] -> r1      ;Load actual class
>> afterIntTest:
>>     load [method + #someOffset] -> r2    ;Load predicted class
>>     load [method + #someOffset+4] -> r3  ;Load stored method to call from "inline" cache
>>     call r3 + #codeOffset                ;Call actual method
>> ...
>
>By doing so, you undo the major advantage of ICs. It is not the cost of
>the cache lookup that inline caches try to minimize, but the avoidance of
>indirect jumps. Driesen et al. argue that modern processors are bad at
>indirect jump prediction, with enormous costs when prediction fails.
>That's why a direct jump should appear in the instruction code.
>
>  jump o->class->vtable[methodIndex]
>
>is no better than
> 
>  jump currentMethod->predictedMethods[indexForThisCallSite]
>
>in this respect, right ?
>
>
>	Matthias

Right.

It is a tradeoff: self-modifying code on an inline cache miss versus
an indirect branch on an inline cache hit.  Which is better depends
on the specific processor implementation.

The paper by Driesen et al. assumed that a processor could not
predict indirect branches, making indirect branches expensive.
Modern processors generally contain a Branch Target Cache (P6, PowerPC 750)
or a next-instruction cache-line pointer (K7).  These can be used
to predict indirect branches as well as direct branches.

On the other hand, self-modifying code can be expensive on real
processors.  Some RISC processors used to require an I-cache flush
whenever code was modified; I am not sure whether this is still true.
Even if an I-cache flush is not required, a pipeline flush
likely is.  As pipelines get longer, this starts to hurt.

The best bet would be to try out both code sequences on
a variety of processors and see which runs faster on average.

Greg Gritton
gritton at ibm.net





More information about the Squeak-dev mailing list