[squeak-dev] Rearranging variables in interpreter struct (lil speedup)

Fri Feb 29 05:08:43 UTC 2008

I wrote the algorithm that is there now because, as you thought, cache
line placement could be important. It sorts the variables based on
usage, since I couldn't come up with a nicer metric. 8 years back (or
so) placement wasn't important, but now it is: we noticed a 2% gain in
a VM build from adding a single instance variable, which changed the
placement of other variables on a cache line.
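
To see which cache line a given variable lands on, something like the
sketch below can help. It is only an illustration: the two structs are
made-up, heavily simplified stand-ins for the interpreter struct (the
field names stackTop, successFlag and methodCache come from this
thread, the array sizes and bigTable are invented), and the 64-byte
line size is an assumption that depends on the CPU. It also assumes
the struct itself starts on a cache line boundary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed cache line size; 64 bytes is common but not universal. */
#define CACHE_LINE_BYTES 64

/* Hypothetical layout A: a big array first, small variables after it. */
struct layout_a {
    int32_t bigTable[1024];
    int32_t stackTop;
    int32_t successFlag;
    int32_t methodCache[512];
};

/* Hypothetical layout B: same fields, small variables moved to the head. */
struct layout_b {
    int32_t stackTop;
    int32_t successFlag;
    int32_t methodCache[512];
    int32_t bigTable[1024];
};

/* Reordering fields of the same type should not change the total size. */
static_assert(sizeof(struct layout_a) == sizeof(struct layout_b),
              "both layouts hold the same fields");

/* Cache line index of a byte offset, counted from the start of the struct. */
static size_t cache_line_of(size_t offset) {
    return offset / CACHE_LINE_BYTES;
}
```

For example, cache_line_of(offsetof(struct layout_a, stackTop)) tells
you how many lines into the struct stackTop sits. Which layout is
actually faster still has to be measured; this only shows where things
end up.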

Hand sorting couldn't hurt; I just didn't want to do that at the time.

Isn't everything Intel now anyway? Alas...

The other magic lurking in there, btw: if a global is shared between
different routines, and after the merging of methods and inlining is
done that variable ends up being used by only a single routine, then
the variable is moved out of the foo structure and made a local. There
is a method where you can deny that behavior for a variable/routine, if
I remember correctly.
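
Roughly, the effect on the generated C looks like the before/after
below. This is a hand-written sketch, not actual translator output:
foo_state, scanCount and both function names are hypothetical, and the
real code generator decides this after inlining.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the generated global "foo" structure. */
struct foo_state {
    int32_t scanCount;   /* after inlining, only one routine touches this */
};
static struct foo_state foo;

/* Before: the counter lives in the global struct, so the compiler
   may have to treat each update as a memory access. */
static int32_t sum_with_struct_field(const int32_t *items, int32_t n) {
    int32_t total = 0;
    assert(n >= 0);
    for (foo.scanCount = 0; foo.scanCount < n; foo.scanCount++) {
        total += items[foo.scanCount];
    }
    return total;
}

/* After: the same variable as a local, which the compiler is free
   to keep in a register for the whole loop. */
static int32_t sum_with_local(const int32_t *items, int32_t n) {
    int32_t total = 0;
    int32_t scanCount;
    assert(n >= 0);
    for (scanCount = 0; scanCount < n; scanCount++) {
        total += items[scanCount];
    }
    return total;
}
```

Both versions compute the same result; the difference is only in where
the variable lives.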

The *most* important place this was effective was the GC logic. The GC
logic is split between various Smalltalk methods, which all get inlined
together. So on the PowerPC the globals sharing state between those 4-5
routines would all become local register variables, which made quite a
difference in performance.

Andreas made a change a few years back to scope local variables to the
block they are used in, versus defining the locals at the top level of
the routine; that helps register allocation.
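
In C terms the scoping difference is roughly the following. Again a
hand-written sketch with made-up function and variable names, not what
the translator literally emits.

```c
#include <assert.h>
#include <stdint.h>

/* Locals declared at the top level of the routine: every variable is
   nominally live for the whole function body. */
static int32_t sum_doubled_top_level(const int32_t *items, int32_t n) {
    int32_t i, value, doubled;
    int32_t total = 0;
    assert(n >= 0);
    for (i = 0; i < n; i++) {
        value = items[i];
        if (value > 0) {
            doubled = value * 2;
            total += doubled;
        }
    }
    return total;
}

/* Locals scoped to the block that uses them: the short lifetime of
   value and doubled is visible directly in the source. */
static int32_t sum_doubled_block_scoped(const int32_t *items, int32_t n) {
    int32_t total = 0;
    assert(n >= 0);
    for (int32_t i = 0; i < n; i++) {
        int32_t value = items[i];
        if (value > 0) {
            int32_t doubled = value * 2;
            total += doubled;
        }
    }
    return total;
}
```

How much this matters depends on the compiler; a good optimizer may
work out the lifetimes either way.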

Lastly, I have a change set, well, somewhere, that would have Slang
emit void foobar{} if in fact method foobar didn't actually return
anything.
This was never accepted because we found certain methods in the
VMMachine defs that appeared to return values but in fact didn't.
However, one can never tell what the compiler will do given int
foobar{} versus void foobar{}.
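
The two shapes in question look something like this. The names are
made up, and returning a dummy 0 from the int version is my assumption
here, just so the example is well-defined C.

```c
#include <assert.h>
#include <stdint.h>

static int32_t counter;   /* hypothetical state the routines update */

/* int version: the method has no meaningful result, but the function
   is still declared int and hands back a dummy value (assumed). */
static int bump_int(void) {
    counter++;
    return 0;
}

/* void version: same side effect, and the signature states that
   there is no result for callers to use. */
static void bump_void(void) {
    counter++;
}
```

A caller that wrote x = bump_int() would still compile, while the same
thing with bump_void() is rejected, which is the kind of mismatch the
change set ran into.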

On Feb 28, 2008, at 8:34 PM, Igor Stasenko wrote:

> I'm just wondering: what if I place the variables in the interpreter
> struct in a specific order,
> rather than in the order the code generator produces?
>
> Since the speed impact is expected to be very small (if any), I used
> the following code to measure the difference.
>
> [ 5 timesRepeat: [ 1 tinyBenchmarks] ] timeToRun
>
> This is HydraVM without any attempt to arrange the interpreter's ivars:
>
> 28921
> 28863
> 28863
>
> This build shows a small slowdown (I placed all big-sized arrays at
> the tail, and #stackTop #successFlag at the head)
>
> 28938
> 28940
>
> In this build I placed the methodCache ivar first.
>
> 28279
> 28264
>
> So the difference is small but noticeable. Placing methodCache first
> gives roughly a 1-2% speedup.
>
> I'm not really sure if this is worth experimenting with at all. And I
> lack the knowledge of different CPU/compiler details to predict whether
> these changes will have the same effect on different CPUs (mine is an
> AMD Athlon series).
>
> If you have any ideas concerning these changes, or any other
> optimizations which could improve speed, please feel free to
> share details/guidelines.
>
> -- 
> Best regards,
> Igor Stasenko AKA sig.
>

--
========================================================================
John M. McIntosh <johnmci at smalltalkconsulting.com>
Corporate Smalltalk Consulting Ltd.  http://www.smalltalkconsulting.com
========================================================================