[Vm-dev] Some Performance Numbers: Java vs. CogVM vs. SOM

Levente Uzonyi leves at elte.hu
Mon Apr 6 20:24:14 UTC 2015


I've checked the code, and ran some benchmarks in Squeak. When I tried to 
load the code, the system complained because the names of the class 
variables begin with lowercase letters.
ScriptCollector is also missing from Squeak, though that's easy to work 
around.
There are still plenty of #% sends in the code, which I had to rewrite to 
#\\.
The PageRank benchmark is so slow that I stopped running it after about 30 
minutes. The profiler shows that it spends over 95% of the time in 
SomJenkinsRandom class >> #random. I've got a faster (~6x) version of that 
PRNG, but it's still way too slow. One can consider this a weakness of 
the system, but it's also a weak point of the benchmark that it relies so 
heavily on a fast PRNG implementation. The code is also pretty bad, 
because it uses only a few bits out of the generated 32, and it has to 
fight with the signed result. Whoever came up with using that "PRNG" 
hasn't really put much thought into it...
I tried it with another PRNG which is another 6x faster (so the overall 
speed is 36x that of the original version), but that's still way too slow. 
Squeak is rather slow here: in the same amount of time, an optimized PRNG 
written in C generates about three orders of magnitude more random bits 
than an optimized PRNG in Squeak.
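For contrast, here is a minimal xorshift32 generator (Marsaglia's algorithm) in Java, as a sketch of a fast PRNG that uses all 32 generated bits and sidesteps the signed-result fight. This is not the SomJenkinsRandom algorithm from the benchmark; the class and method names are mine.

```java
// Illustrative xorshift32 PRNG (Marsaglia) -- NOT the benchmark's
// SomJenkinsRandom. Each step produces a full 32-bit state.
class XorShift32 {
    private int state; // must never become 0 (fixed point of xorshift)

    XorShift32(int seed) {
        this.state = (seed == 0) ? 1 : seed; // avoid the all-zero state
    }

    int next() {
        int x = state;
        x ^= x << 13;
        x ^= x >>> 17; // unsigned shift avoids sign-extension surprises
        x ^= x << 5;
        state = x;
        return x;
    }

    // Map to [0, bound) without fighting the signed result.
    int nextBounded(int bound) {
        return (int) (Integer.toUnsignedLong(next()) % bound);
    }
}
```

The point is not this particular generator, but that a fast, unsigned-friendly PRNG is only a handful of integer operations per call.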


About porting:
I don't know what your original goal was, but I don't see why you would 
keep 0-based indexing in the code. Smalltalk uses 1-based indexing, and 
this definitely has a negative impact on the Smalltalk results. If you 
were to port code from Smalltalk to Java, would you keep the 1-based 
indices?
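To make the asymmetry concrete, here is an illustrative Java sketch (not code from the benchmark suite): a 1-based convention grafted onto Java pays a "- 1" on every access, which is the mirror image of 0-based indexing grafted onto Smalltalk paying a "+ 1".

```java
// Illustrative only: what keeping the source language's indexing
// convention costs in the target language.
class IndexingPort {
    // Idiomatic Java: 0-based, no per-access arithmetic.
    static int sumZeroBased(int[] a) {
        int sum = 0;
        for (int i = 0; i < a.length; i++) sum += a[i];
        return sum;
    }

    // 1-based convention kept from Smalltalk: every access pays "- 1".
    static int sumOneBased(int[] a) {
        int sum = 0;
        for (int i = 1; i <= a.length; i++) sum += a[i - 1];
        return sum;
    }
}
```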

Another thing is about types:
The [1 - SomPageRank DFactor / n] expression is evaluated in an n*n 
loop during the PageRank benchmark, where n is a constant of the 
benchmark, and SomPageRank DFactor is also a constant, 0.85. Let's see 
how this adds to the runtime:

n := 100.
[ 1 - SomPageRank DFactor / n ] bench. '5,110,000 per second. 196 nanoseconds per run.'.
[ 1.0 - SomPageRank DFactor / n ] bench. '23,500,000 per second. 42.5 nanoseconds per run.'.
nFloat := n asFloat.
[ 1.0 - 0.85 / nFloat ] bench. '26,000,000 per second. 38.5 nanoseconds per run.'.
[ 0.15 / nFloat ] bench. '41,400,000 per second. 24.2 nanoseconds per run.'.
[ 0.0015 ] bench. '118,000,000 per second. 8.46 nanoseconds per run.'.
[] bench. '125,000,000 per second. 8.01 nanoseconds per run.'

So the code is calculating the same constant over and over again. Due to 
type conversions this is about 25x slower than using a precalculated 
constant (and ~5x slower than the code with proper types).
Of course an adaptive optimizer could optimize this, but so could any 
programmer who cares about performance.

The corresponding java code is:

private static double D_FACTOR = 0.85; // damping factor

and

((1 - D_FACTOR)/n)

If I were to port this and wanted to stick to the original implementation, 
then D_FACTOR would be a class variable, and the code would read: [1.0 
- DFactor / n]. But since the constant is not used anywhere else, I see 
no problem with precalculating the value.
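The hoisting argument can be sketched in Java (illustrative names; only D_FACTOR comes from the benchmark code quoted above): the loop-invariant (1 - D_FACTOR) / n is computed once before the n*n loop instead of n*n times inside it, and the result is identical.

```java
// Sketch of hoisting a loop-invariant constant out of an n*n loop.
// Class and method names are illustrative.
class PageRankConstant {
    private static final double D_FACTOR = 0.85; // damping factor, as in the Java benchmark

    // Recomputes the same constant n*n times inside the loop.
    static double recomputed(int n) {
        double sum = 0.0;
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                sum += (1 - D_FACTOR) / n;
        return sum;
    }

    // Computes the loop-invariant expression once, before the loop.
    static double hoisted(int n) {
        double base = (1 - D_FACTOR) / n;
        double sum = 0.0;
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                sum += base;
        return sum;
    }
}
```

Both variants add the exact same double value the same number of times, so they produce bit-identical results; only the per-iteration cost differs.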

Levente

On Mon, 6 Apr 2015, Stefan Marr wrote:

>
> Hi:
>
> Not sure whether I’ll get to write a little more detailed report, but I wanted to briefly share a few pieces of data on the performance of the CogVM and StackVM. (Spur benchmarks are still running).
>
> I set up a collection of benchmarks to be able to compare the performance of Java, my SOM implementations, and Cog/StackVM [1].
>
> The set contains the following benchmarks:
> - DeltaBlue
> - Richards
> - GraphSearch (search in a graph data structure)
> - Json (a minimal JSON parser benchmark)
> - PageRank (a page rank algorithm implementation)
> -- NBody, Mandelbrot, Bounce, BubbleSort, QuickSort, Fannkuch
> -- Permute, Queens, Sieve, Storage, Towers
>
> The Java implementations are here [2] and the SOM implementations here [3].
>
> Naturally, the comparison is not ideal between languages. Java isn’t Smalltalk, and neither is Pharo/Squeak exactly the same as SOM. However, the benchmarks are ported to resemble as closely as possible the implementations in the other languages, with an emphasis on modern/Smalltalk-ish style where possible. For instance, the DeltaBlue implementation in Java is updated to use Java 8 lambdas and other modern APIs.
>
> The Results
> ———————————
>
> The most interesting one is peak performance, after warmup, with 100 iterations of each benchmark. The results are normalized to Java. This means we see the slowdown factors here (less is better). I also report the minimal and maximal values to show the range over all benchmarks.
>
>                geomean   min    max
> Java 8            1.0    1.0    1.0
> latest PharoVM   12.9    2.5  182.4 (not sure which exact version of the CogVM that is)
> TruffleSOM        2.3    1.0    4.9
> RTruffleSOM       3.0    1.5   11.5
>
> TruffleSOM is SOM implemented as a self-optimizing interpreter on top of Truffle, a Java framework.
> RTruffleSOM is SOM as a self-optimizing interpreter on top of RPython’s meta-tracing framework (think PyPy).
>
> So, what we see here is that the CogVM is on average 13x slower than Java 8. I think that’s not bad at all, considering that it is not doing any adaptive compilation yet. The slowest benchmark is PageRank. The fastest one is DeltaBlue.
> Compared to the CogVM, my SOM implementations are doing a little better :)
>
>
> Another interesting data point is the pure interpreter performance:
>
>              geomean    min   max
> Java 8 interp   1.0      1.0    1.0
> PharoVM Stack   1.6      0.5   15.3 (not sure which exact version of the StackVM that is)
> TruffleSOM      6.3      1.9   15.7
> RTruffleSOM     5.6      1.6   15.7
>
> What we see here is that the StackVM is actually sometimes faster than the Java interpreter.
> While the PageRank benchmark is still the slowest, for the following benchmarks, the StackVM is faster than Java’s bytecode interpreter: DeltaBlue, Json, NBody, Permute, Richards, Storage, Towers.
>
>
> Well, that’s it for the moment.
> I hope that Clement and Eliot find those benchmarks useful, especially for the work on Sista.
>
>
> And, I wonder whether that makes the SOMs the fastest open source Smalltalk implementations? ;)
>
> Best regards
> Stefan
>
>
> [1] http://smalltalkhub.com/#!/~StefanMarr/SMark/versions/SOM-Benchmarks-StefanMarr.4
> [2] https://github.com/smarr/Classic-Benchmarks/tree/master/benchmarks/som
> [3] https://github.com/SOM-st/SOM/tree/master/Examples/Benchmarks
>
> -- 
> Stefan Marr
> INRIA Lille - Nord Europe
> http://stefan-marr.de/research/
>
>
>
>

