[squeak-dev] RoarVM: The Manycore SqueakVM

Bert Freudenberg bert at freudenbergs.de
Thu Nov 4 18:07:45 UTC 2010


On 03.11.2010, at 14:13, Stefan Marr wrote:

> A small teaser:
>  1 core   66286897 bytecodes/sec;  2910474 sends/sec
>  8 cores 470588235 bytecodes/sec; 19825677 sends/sec

I tried your precompiled OS X VM and the Sly3 image.

1 core:  93,910,491 bytecodes/sec; 4,056,440 sends/sec
2 cores: 91,559,370 bytecodes/sec; 4,007,927 sends/sec
3 cores: can't start
4 cores: 90,844,570 bytecodes/sec; 3,935,516 sends/sec
5 cores: can't start
6 cores: can't start
7 cores: can't start
8 cores: 89,698,668 bytecodes/sec; 3,910,787 sends/sec

So it looks like you have to use a power-of-two number of cores?

And the benchmark invocation should be different if you want to actually use multiple cores. What's the magic incantation?

I tried something myself:

n := 16.
q := SharedQueue new.
time := Time millisecondsToRun:
	[n timesRepeat: [[q nextPut: [30 benchFib] timeToRun] fork].
	n timesRepeat: [Transcript space; show: q next]].
Transcript space; show: time; cr

1 core:  664 664 665 666 667 662 664 664 668 665 667 665 666 669 666 10700
2 cores: 675 674 672 669 677 669 669 672 678 670 668 669 674 668 668 5425
4 cores: 721 726 729 740 713 728 740 734 731 737 721 737 734 756 788 749 3030
8 cores: 786 807 837 847 865 872 916 840 800 873 792 880 846 865 829 1820

Now that scales pretty nicely :) The overhead is about 25% at 8 cores, 12% for 4 cores.
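
As a rough cross-check, the overhead can be estimated from the total run times above by comparing against ideal linear speedup. This is only a sketch: defining overhead as 1 - t1 / (cores * tN) is an assumption made here for illustration, and the totals are simply copied from the last column of each run.

```smalltalk
"Sketch: overhead relative to ideal linear speedup.
 Assumption: overhead = 1 - t1 / (cores * tN), with t1 the 1-core total.
 The literal array holds (cores total) pairs taken from the runs above."
t1 := 10700.
#(2 5425 4 3030 8 1820) pairsDo: [:cores :total |
	Transcript
		show: cores printString; show: ' cores: ';
		show: ((1 - (t1 / (cores * total))) * 100) rounded printString;
		show: '% overhead'; cr]
```

Evaluated in a workspace, this prints one line per core count to the Transcript. Under this definition the 4-core and 8-core results land close to the figures quoted above.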

For our regular interpreter (*) I get:
1 core: 162 159 157 158 158 160 159 159 159 159 159 158 160 158 159 2585

So RoarVM is about 4 times slower in sends, even more so for bytecodes. It needs 8 cores to be faster than the regular interpreter on a single core. So the good news is that it can beat the old interpreter :)  But why is it so much slower than the normal interpreter?
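
The ratios can also be checked against the benchmark figures: the 1-core RoarVM numbers above and the regular-interpreter numbers in the footnote (*). A minimal sketch, assuming both sets of figures come from the same benchmark on the same machine:

```smalltalk
"Sketch: regular interpreter vs. RoarVM on 1 core, as plain ratios.
 Numerators are the regular interpreter's figures from the footnote,
 denominators are the RoarVM 1-core figures reported above."
Transcript
	show: 'bytecodes ratio: ';
	show: (789514263 / 93910491) asFloat printString; cr;
	show: 'sends ratio: ';
	show: (17199374 / 4056440) asFloat printString; cr
```

The sends ratio comes out a little above 4, in line with the benchFib timings, and the bytecodes ratio is roughly twice that.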

Btw, user interrupt didn't work on the Mac. 

And in the Squeak-4.1 image, when running on 2 or more cores Morphic gets incredibly sluggish, pretty much unusably so.

- Bert -

(*) For comparison, a regular interpreter (not Cog) on this machine gets
    789,514,263 bytecodes/sec; 17,199,374 sends/sec
and Cog does
    880,481,513 bytecodes/sec; 70,113,306 sends/sec
