[Vm-dev] RoarVM: The Manycore SqueakVM

Levente Uzonyi leves at elte.hu
Sat Nov 6 17:19:12 UTC 2010


On Sat, 6 Nov 2010, Igor Stasenko wrote:

>
> On 6 November 2010 17:26, Levente Uzonyi <leves at elte.hu> wrote:
>>
>> On Thu, 4 Nov 2010, Stefan Marr wrote:
>>
>>>
>>> Hi Bert:
>>>
>>> On 04 Nov 2010, at 20:20, Bert Freudenberg wrote:
>>>
>>>>>> So RoarVM is about 4 times slower in sends, even more so for bytecodes.
>>>>>> It needs 8 cores to be faster than the regular interpreter on a single
>>>>>> core. So the good news is that it can beat the old interpreter :)  But why
>>>>>> is it so much slower than the normal interpreter?
>>>>>
>>>>> Well, on the one hand, we don't use stuff like the GCC label-as-value
>>>>> extension to have threaded-interpretation, which should help quite a bit.
>>>>> Then, the current implementation based on pthreads is quite a bit slower
>>>>> than our version which uses plain Unix processes.
>>>>> The GC is really not state of the art.
>>>>> And all that adds up rather quickly I suppose...
>>>>
>>>> Hmm, that doesn't sound like it should make it 4x slower ...
>>>
>>> Do you know some numbers for the switch/case-based vs. the threaded
>>> version on the standard VM?
>>> How much do you typically gain by it?
>>
>> If threaded means gnuified (jump table instead of the linear search), then
>> it gives ~2x speedup for the standard SqueakVM.
>>
> to my own experience it gives 30%

Right, it depends on what we take into account. According to this mail: 
http://lists.squeakfoundation.org/pipermail/vm-dev/2010-January/003761.html
tinyBenchmarks gives
'248543689 bytecodes/sec; 8117987 sends/sec' without gnuification and
'411244979 bytecodes/sec; 10560900 sends/sec' with gnuification.

These aren't fully optimized VMs, so the difference may be smaller or 
larger with better optimizations. Anyway, in this case gnuification 
makes bytecodes 65% faster and sends 30% faster. So the general 
speedup is not 2x, but it's not 30% either.

The actual performance difference may be greater depending on which 
bytecodes are used (tinyBenchmarks uses only a few) and on the 
compiler's capabilities. Btw, I wonder why gcc can't compile switch 
statements like this to jump tables by itself, without gnuification.


Levente

>
>>
>> Levente
>>
>> snip
>>
>
>
>
> -- 
> Best regards,
> Igor Stasenko AKA sig.
>