[squeak-dev] squeak profiling

Igor Stasenko siguctua at gmail.com
Sat Apr 19 13:45:14 UTC 2008


2008/4/19 Levente Uzonyi <leves at elte.hu>:
> Hi!
>
>
>  Quoting Riccardo Lucchese <riccardo.lucchese at gmail.com>:
>
>
> > Hi,
> >
> > I'm working on profiling squeak/etoys for the olpc project.
> >
> > It seems to me that bytecodes cases in the interpret function
> > (in /platform/unix/src/vm/interp.c) are not ordered in respect to
> > the probability of their execution.
> >
> >
>
>  Since the switch-case statement in C is compiled into a jump table
> (http://en.wikipedia.org/wiki/Jump_table) the order of the branches
> shouldn't affect the speed of execution (you can even start with the default
> branch).
>
>
>
> > Here is a graph for reference (more games in Etoys have
> > the same pattern):
> > http://www.bodhidharma.info/instructions.pdf
> >
> > What I did so far is changing the `256 cases switch' statement to an array
> > of function pointers like (*exec_bytecode_funcs)[bytecode_id](
> > ...shared data... );
> >
>
>  Actually it should be slower, because function calls used to be slower than
> simple jumps or arithmetic instructions.
>
>
>
> > The process was automated with a python script and that needed a little bit
> > of code cleaning; after some testing I couldn't trigger any sort of
> > bug in the new code.
> >
> >
>
>  AFAIK the interp.c is generated with VMMaker, so for a new vm, you have to
> run your script again. A pure Squeak solution might be better.
>
>

Moreover, there is already a script which does exactly what Riccardo did.
See the gnuify step, which transforms interp.c into gnu-interp.c: it
replaces all the cases with jump labels
and converts the code to dispatch through a jump table.
The reason it fails may be that you made some modifications to the
code, so that the script can no longer find the exact patterns it
expects in the generated sources.
But if you follow the common procedure for generating the VM (VMMaker ->
make), it should work OK.
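
To make the comparison concrete, here is a rough sketch of the three dispatch
styles being discussed: a plain switch, a table of function pointers, and
jump labels with a label table. It is a toy with three made-up bytecodes
(OP_PUSH1, OP_ADD, OP_HALT) and hypothetical function names, not the code from
interp.c or the actual gnuify output. The third variant assumes GCC or Clang,
since taking a label's address with && is a compiler extension.

```c
/* Hypothetical toy bytecodes; the real interpreter has 256 of them. */
enum { OP_PUSH1 = 0, OP_ADD = 1, OP_HALT = 2 };

/* (a) Plain switch. For dense case values the compiler normally emits
 *     a jump table, so the order of the cases does not matter.
 *     No stack bounds checks: this is only an illustration. */
int run_switch(const unsigned char *code) {
    int stack[16], sp = 0;
    for (;;) {
        switch (*code++) {
        case OP_PUSH1: stack[sp++] = 1; break;
        case OP_ADD:   sp--; stack[sp - 1] += stack[sp]; break;
        case OP_HALT:  return stack[sp - 1];
        default:       return -1;
        }
    }
}

/* (b) Table of function pointers: one real call and return per bytecode,
 *     and the shared state has to live in a struct passed to each call. */
typedef struct { int stack[16]; int sp; int halted; } VM;
typedef void (*op_fn)(VM *);

static void op_push1(VM *vm) { vm->stack[vm->sp++] = 1; }
static void op_add(VM *vm)   { vm->sp--; vm->stack[vm->sp - 1] += vm->stack[vm->sp]; }
static void op_halt(VM *vm)  { vm->halted = 1; }

static const op_fn op_table[] = { op_push1, op_add, op_halt };

int run_calls(const unsigned char *code) {
    VM vm = { {0}, 0, 0 };
    while (!vm.halted)
        op_table[*code++](&vm);   /* assumes only valid bytecodes */
    return vm.stack[vm.sp - 1];
}

/* (c) Jump labels plus a label table (GCC/Clang "labels as values").
 *     Each bytecode ends with its own indirect jump to the next one. */
int run_goto(const unsigned char *code) {
    static void *labels[] = { &&do_push1, &&do_add, &&do_halt };
    int stack[16], sp = 0;

    goto *labels[*code++];
do_push1:
    stack[sp++] = 1;
    goto *labels[*code++];
do_add:
    sp--; stack[sp - 1] += stack[sp];
    goto *labels[*code++];
do_halt:
    return stack[sp - 1];
}
```

With the program { OP_PUSH1, OP_PUSH1, OP_ADD, OP_PUSH1, OP_ADD, OP_HALT } all
three functions return 3. Variants (a) and (c) keep the stack in local
variables, while (b) has to go through memory on every call, which is one
reason to expect it to be the slowest of the three.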

>
> >
> > if we assume a constant time T both for the execution of every
> > [if !(right_case) jump next_case] in the old code and for the
> > function pointer dereference in the new code there is a 10000% gain
> > for the task of calling the right action for a given bytecode
> > (also given the distribution showed in the graph linked above).
> >
>
>  This gain didn't come because of the jump table.
>
>
>
> >
> > In my tests this is a 20/30% win over the interpret routine timings
> > for different games
> > in etoys.
> >
>
>  Are you sure that the performance gain came from these changes? If so, then
> the only reason for the speedup i can think of is that the most common
> bytecodes' code are scattered across memory pages and there are more page
> faults with the switch-case implementation. I wonder if you could check how
> many page faults does the two implementations have.
>
>  Cheers,
>  Levente
>
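
One possible way to check the page fault question, as a rough sketch only:
wrap the workload in two getrusage() calls and compare the fault counters.
The helper name count_page_faults and the sample workload touch_memory are
made up for this example; in a real test the workload would be a run of the
interpreter, which is not shown here.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/resource.h>

/* Runs work() and reports how many page faults this process took
 * while it ran. Returns 0 on success, -1 if getrusage() failed. */
int count_page_faults(void (*work)(void), long *minor, long *major) {
    struct rusage before, after;

    if (getrusage(RUSAGE_SELF, &before) != 0)
        return -1;
    work();
    if (getrusage(RUSAGE_SELF, &after) != 0)
        return -1;

    *minor = after.ru_minflt - before.ru_minflt;  /* served without I/O */
    *major = after.ru_majflt - before.ru_majflt;  /* needed disk I/O */
    return 0;
}

/* Hypothetical stand-in workload: allocate 4 MB and touch every byte. */
void touch_memory(void) {
    size_t size = 4u * 1024u * 1024u;
    unsigned char *buf = malloc(size);
    if (buf != NULL) {
        memset(buf, 1, size);
        free(buf);
    }
}
```

Calling count_page_faults(touch_memory, &minor, &major) once for each VM
build would give numbers that can be compared directly. The counters are
per process, so anything else the process does during the run is included.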
>
>
>
> >
> > I appreciate any comments on this work.
> > Maybe it could be done better than this ?
> >
> > Thanks,
> > Riccardo
> >
> >
> >
>
>
>
>
>



-- 
Best regards,
Igor Stasenko AKA sig.


