[Vm-dev] Performance of primitiveFailFor: and use of primFailCode

David T. Lewis lewis at mail.msen.com
Tue May 24 12:46:49 UTC 2011


On Tue, May 24, 2011 at 02:16:05PM +0200, Igor Stasenko wrote:
> 
> On 24 May 2011 14:00, David T. Lewis <lewis at mail.msen.com> wrote:
> >
> > On Tue, May 24, 2011 at 10:46:02AM +0200, Igor Stasenko wrote:
> >>
> >> >
> >> > No mistake, the performance problem was real.
> >> >
> >> > Good news - I found the cause. Better news - this may be good for a
> >> > performance boost on StackVM and possibly Cog also.
> >> >
> >> > The performance hit was due almost entirely to InterpreterPrimitives>>failed,
> >> > and perhaps a little bit to #successful and #success: also.
> >> >
> >> > This issue with #failed is due to "^primFailCode ~= 0" which, for purposes
> >> > of C translation, can be recoded as "^primFailCode" with an override in
> >> > the simulator as "^primFailCode ~= 0". This produces a significant speed
> >> > improvement, at least as fast as for the original interpreter implementation
> >> > using successFlag.
> >> >
> >> > I expect that the same change applied to StackInterpreter may give a similar
> >> > 10% improvement (though I have not tried it). I don't know what to expect
> >> > with Cog, but it may give a boost there as well.
> >> >
> >> > Changes attached, also included in VMMaker-dtl.237 on SqueakSource.
> >> >
> >> > Dave
> >> >
> >> >
> >> >
> >>
> >> added to http://code.google.com/p/cog/issues/detail?id=45
> >
> > Thanks Igor.
> >
> >>
> >> it is strange that such a small detail could make a lot of difference in speed.
> >
> > Yes, I was very surprised to see it also. It will be interesting to see
> > if it has a similar effect for StackInterpreter. I probably will not have
> > time to check this for a while, so if you try it please let us know
> > what you find.
> >
> 
> What are you using to measure the difference in speed?
> 

I just use tinyBenchmarks as a smoke test to make sure that
changes in the Slang code do not affect performance. So I am looking
at different variants of the code, running each one five times
to get an average (a snippet showing how follows the numbers below).
Obviously this does not reflect real-world performance, but it is
useful for spotting problems. Examples on my system:

  Standard interpreter VM with successFlag
  0 tinyBenchmarks '444444444 bytecodes/sec; 14317245 sends/sec'
  0 tinyBenchmarks '435374149 bytecodes/sec; 14012854 sends/sec'
  0 tinyBenchmarks '437606837 bytecodes/sec; 15277259 sends/sec'
  0 tinyBenchmarks '437981180 bytecodes/sec; 15252007 sends/sec'
  0 tinyBenchmarks '443674176 bytecodes/sec; 14406658 sends/sec'
  
  Interpreter VM with primFailCode
  0 tinyBenchmarks '398133748 bytecodes/sec; 14895019 sends/sec'
  0 tinyBenchmarks '393241167 bytecodes/sec; 14228935 sends/sec'
  0 tinyBenchmarks '396284829 bytecodes/sec; 14250910 sends/sec'
  0 tinyBenchmarks '396591789 bytecodes/sec; 14907050 sends/sec'
  0 tinyBenchmarks '401883830 bytecodes/sec; 14520007 sends/sec'
  
  Interpreter VM with primFailCode after optimizing #failed, #success:, and #successful
  0 tinyBenchmarks '447161572 bytecodes/sec; 14979650 sends/sec'
  0 tinyBenchmarks '442523768 bytecodes/sec; 14955371 sends/sec'
  0 tinyBenchmarks '447161572 bytecodes/sec; 14991818 sends/sec'
  0 tinyBenchmarks '443290043 bytecodes/sec; 14350644 sends/sec'
  0 tinyBenchmarks '445604873 bytecodes/sec; 15114601 sends/sec'
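
For what it's worth, each group of five numbers above was collected
by simply re-evaluating the benchmark expression in a workspace,
roughly like this (just a sketch; the averaging was done by eye):

  "run tinyBenchmarks five times and collect the result strings"
  (1 to: 5) collect: [:i | 0 tinyBenchmarks]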

Similar tests showed that the differences were almost entirely
associated with #failed.
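
To make the change concrete, the recoding described above amounts
to roughly the following (a sketch of the idea only; the actual
code is in VMMaker-dtl.237):

  failed
      "version used for C translation: nonzero primFailCode means failure"
      ^primFailCode

with the simulator overriding it to answer a proper Boolean:

  failed
      "simulator override"
      ^primFailCode ~= 0

The override keeps the simulator answering a real Boolean, while the
translated C just tests the integer directly.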

I have to say that I am still uncomfortable about this, because
I cannot really explain why the change has such a large effect.
The #failed method is used in only a few places in the interpreter
itself. So if you are able to independently verify (or refute)
any of these results, that would be great.

Thanks,
Dave


