[squeak-dev] Re: jitter (was: The Old Man)

bryce at kampjes.demon.co.uk bryce at kampjes.demon.co.uk
Thu Apr 3 19:51:28 UTC 2008


Andreas Raab writes:

 > One of my problems with Exupery is that I've only seen claims about byte 
 > code speed and if you know where the time goes in a real-life 
 > environment then you know it ain't bytecodes. In other words, it seems 
 > to me that Exupery is optimizing the least significant portion of the 
 > VM. I'd be rather more impressed if it did double the send speed.

Then be impressed. Exupery has had double Squeak's send performance
since March 2005.

 http://people.squeakfoundation.org/person/willembryce/diary.html?start=23

That's done with polymorphic inline caches, which are also used to
drive dynamic primitive inlining. It is true that further send
performance gains are not planned before 1.0; doubling send
performance should be enough to provide a practical performance
improvement. It's better to solve all the problems standing in the
way of a practical improvement before starting work on full method
inlining, which is what should provide serious send performance gains.
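
For anyone who hasn't run into them: a polymorphic inline cache
replaces the full method lookup at a send site with a couple of
direct class comparisons against the classes seen at that site so
far. The C below is only a sketch of that shape, with made-up class
and function names; it isn't what Exupery generates (Exupery emits
the equivalent checks as machine code), but it shows why a cache hit
is so much cheaper than a full lookup:

  /* Sketch only: the check a polymorphic inline cache performs at
     one send site.  All names here are invented for illustration. */
  #include <stdio.h>

  typedef int ClassId;
  enum { CLASS_POINT = 1, CLASS_SMALL_INTEGER = 2 };

  typedef struct { ClassId classId; } Object;

  /* Compiled targets the cache already knows about. */
  static void compiledPrintForPoint(Object *r)        { printf("Point case\n"); }
  static void compiledPrintForSmallInteger(Object *r) { printf("SmallInteger case\n"); }

  /* Miss path: ordinary method lookup, which may also extend the cache. */
  static void fullLookupSend(Object *r)               { printf("full lookup\n"); }

  /* The send site itself: compare the receiver's class against the
     classes cached so far and jump straight to compiled code on a hit. */
  static void cachedSend(Object *receiver)
  {
      if (receiver->classId == CLASS_POINT)
          compiledPrintForPoint(receiver);
      else if (receiver->classId == CLASS_SMALL_INTEGER)
          compiledPrintForSmallInteger(receiver);
      else
          fullLookupSend(receiver);   /* miss: fall back to the slow path */
  }

  int main(void)
  {
      Object p = { CLASS_POINT };
      Object i = { CLASS_SMALL_INTEGER };
      cachedSend(&p);
      cachedSend(&i);
      return 0;
  }

The classes a cache records are also exactly the type information
that dynamic primitive inlining works from.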

Here are the current benchmarks:
  Executing Code
  ==============
  arithmaticLoopBenchmark 1397 compiled 138 ratio: 10.122
  bytecodeBenchmark 2183 compiled 435 ratio: 5.017
  sendBenchmark 1657 compiled 741 ratio: 2.236
  doLoopsBenchmark 1100 compiled 813 ratio: 1.353
  pointCreation 988 compiled 968 ratio: 1.021 
  largeExplorers 729 compiled 780 ratio: 0.935
  compilerBenchmark 529 compiled 480 ratio: 1.102
  Cumulative Time 1113.161 compiled 538.355 ratio 2.068

  Compile Time
  ============
  ExuperyBenchmarks>>arithmeticLoop 199ms
  SmallInteger>>benchmark 791ms
  InstructionStream>>interpretExtension:in:for: 14266ms
  Average 1309.515

The bottom two executing code benchmarks are macro benchmarks. They
compile a few methods chosen from a profile run, then re-run the
benchmark. A ratio above 1 means the compiled code is faster than
the interpreter; largeExplorers at 0.935 is currently a slight
slowdown.

There are several primitives that are inlined into the main
interpret() loop in the interpreter but still require full
worst-case dispatching in Exupery. They'll need to be implemented
in Exupery to prevent slowdowns and to get the full benefit. There
are also a few limitations that can cause Exupery to produce slow
code in some situations. And there are still bugs: the last release
would run for about an hour of development use before crashing.
These are the issues currently being worked on.
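
To make the interpreter comparison concrete, here is a rough C
sketch of the kind of fast path interpret() has for an arithmetic
bytecode. The tagging scheme and names are simplified and invented
(and the overflow check is left out), but it shows why compiled code
that falls back to a full dispatch for such primitives can easily be
slower than the interpreter:

  /* Sketch only: an interpreter-style fast path for SmallInteger
     addition, with the full send as the fallback. */
  #include <stdint.h>
  #include <stdio.h>

  /* Pretend tagged SmallIntegers: low bit set, value in the upper bits. */
  #define IS_SMALL_INT(x)    (((x) & 1) != 0)
  #define SMALL_INT_VALUE(x) ((x) >> 1)
  #define MAKE_SMALL_INT(v)  ((((intptr_t)(v)) << 1) | 1)

  static intptr_t fullSendOfPlus(intptr_t a, intptr_t b)
  {
      /* Worst case: look #+ up and run it as an ordinary send. */
      printf("full send of #+\n");
      return MAKE_SMALL_INT(0);
  }

  static intptr_t addBytecode(intptr_t a, intptr_t b)
  {
      if (IS_SMALL_INT(a) && IS_SMALL_INT(b)) {
          /* Fast path inlined into the bytecode loop: no send at all. */
          return MAKE_SMALL_INT(SMALL_INT_VALUE(a) + SMALL_INT_VALUE(b));
      }
      return fullSendOfPlus(a, b);   /* anything else goes the slow way */
  }

  int main(void)
  {
      intptr_t r = addBytecode(MAKE_SMALL_INT(3), MAKE_SMALL_INT(4));
      printf("3 + 4 = %ld\n", (long)SMALL_INT_VALUE(r));
      return 0;
  }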

Here are the benchmarks from the 0.13 release:
  arithmaticLoopBenchmark 1397 compiled 138 ratio: 10.122
  bytecodeBenchmark 2183 compiled 435 ratio: 5.017
  sendBenchmark 1657 compiled 741 ratio: 2.236
  doLoopsBenchmark 1100 compiled 813 ratio: 1.353
  pointCreation 988 compiled 968 ratio: 1.021
  largeExplorers 729 compiled 780 ratio: 0.935
  compilerBenchmark 529 compiled 480 ratio: 1.102
  Cumulative Time 1113.161 compiled 538.355 ratio 2.068

  ExuperyBenchmarks>>arithmeticLoop 199ms
  SmallInteger>>benchmark 791ms
  InstructionStream>>interpretExtension:in:for: 14266ms
  Average 1309.515

The major gains over 0.13 are in the compilerBenchmark macro
benchmark and in compilation time; both are due to work on the
register allocator.

From the beginning, Exupery has been an attempt to combine serious
optimisation with full method inlining, similar to Self, while
having the entire compiler written in Smalltalk. It's an ambitious
goal that's best tackled in smaller steps.

Bryce


