[Vm-dev] 3 Bugs in LargeInteger primitives

Nicolas Cellier nicolas.cellier.aka.nice at gmail.com
Sat Sep 1 21:55:50 UTC 2012


Yes, benchmarking is an art; this was a gross approximation.

2012/9/1 Levente Uzonyi <leves at elte.hu>:
>
> On Sat, 1 Sep 2012, Nicolas Cellier wrote:
>
>>
>> Hi Stefan,
>>
>> I just tried on a Stack VM (Mac OS X, Core 2...) and I get different
>> measurements: at most a 3% penalty, not 20%.
>> If I rewrite bytecodePrimMultiply like this:
>>
>> bytecodePrimMultiply
>>         | rcvr arg result |
>>         <var: #result type: 'sqLong'>
>>         rcvr := self internalStackValue: 1.
>>         arg := self internalStackValue: 0.
>>         (self areIntegers: rcvr and: arg)
>>                 ifTrue: [rcvr := objectMemory integerValueOf: rcvr.
>>                                 arg := objectMemory integerValueOf: arg.
>>                                 "widen to the 64-bit sqLong first, so the multiply is done in 64 bits"
>>                                 result := rcvr.
>>                                 result := result * arg.
>>                                 "push only if the product is still in SmallInteger range"
>>                                 (result >= 16r-40000000 and: [result <= 16r3FFFFFFF]) ifTrue:
>>                                         [self internalPop: 2 thenPush: (objectMemory integerObjectOf: result).
>>                                          ^self fetchNextBytecode "success"]]
>>                 ifFalse: [...
>>
>> Then I get these micro-benchmark timings:
>>
>> ORIGINAL:
>> [33*35] bench
>> '8,470,000 per second.'
>> '8,670,000 per second.'
>>
>> MODIFIED:
>> '8,410,000 per second.'
>> '8,370,000 per second.'
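For reference, the relative cost can be worked out from the four rates above. This small C sketch (the helper names penalty_percent and mean_rate are illustrative, not from the thread) treats each figure as a throughput, so a lower MODIFIED rate counts as a positive penalty.

```c
#include <assert.h>

/* Relative slowdown, in percent, of `modified` versus `original`.
   Both arguments are throughputs (operations per second), so a lower
   `modified` rate gives a positive penalty. */
static double penalty_percent(double original, double modified)
{
    assert(original > 0.0 && modified > 0.0);
    return 100.0 * (1.0 - modified / original);
}

/* Arithmetic mean of n throughput samples. */
static double mean_rate(const double *rates, int n)
{
    assert(n > 0);
    double sum = 0.0;
    for (int i = 0; i < n; i++)
        sum += rates[i];
    return sum / n;
}
```

With orig = {8.47, 8.67} and mod = {8.41, 8.37} (millions per second), penalty_percent(mean_rate(orig, 2), mean_rate(mod, 2)) comes to about 2.1%, in the low single digits rather than the 20% reported for the RoarVM.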
>
>
> IMHO #bench has too much overhead to measure such simple operations
> accurately. With a CogVM I get this:
>
> [ 33 * 35 ] bench '40,000,000 per second.'.
> [] bench '47,300,000 per second.'
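Levente's two figures can be turned into a rough estimate of the multiply alone, under the simplifying assumption that the empty block measures the harness overhead and that this overhead simply adds to the cost of the expression. A minimal C sketch (ns_per_iteration and net_ns are illustrative names):

```c
#include <assert.h>

/* Convert a #bench throughput (iterations per second) into
   nanoseconds per iteration. */
static double ns_per_iteration(double per_second)
{
    assert(per_second > 0.0);
    return 1e9 / per_second;
}

/* Estimated cost of the benchmarked expression alone: time per iteration
   of the measured block minus time per iteration of an empty block,
   assuming the empty block captures the (additive) harness overhead. */
static double net_ns(double block_rate, double empty_rate)
{
    return ns_per_iteration(block_rate) - ns_per_iteration(empty_rate);
}
```

net_ns(40e6, 47.3e6) is about 3.9 ns, while one iteration of the empty block already takes about 21.1 ns. Most of each measured iteration is therefore block-evaluation and #bench overhead, which dilutes any difference between two implementations of the primitive.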
>
>
> Levente
>
>
>>
>> Though, I didn't use SSE or any 64-bit-friendly instructions:
>>
>> line 6501
>>                                         result = result * arg;
>> .loc 1 6501 0
>>         movl    -1404(%ebp), %eax
>>         movl    %eax, %edx
>>         sarl    $31, %edx
>>         movl    -2068(%ebp), %ecx
>>         imull   %eax, %ecx
>>         movl    -2072(%ebp), %ebx
>>         imull   %edx, %ebx
>>         addl    %ebx, %ecx
>>         mull    -2072(%ebp)
>>         addl    %edx, %ecx
>>         movl    %ecx, %edx
>>         movl    %eax, -2072(%ebp)
>>         movl    %edx, -2068(%ebp)
>>         movl    %eax, -2072(%ebp)
>>         movl    %edx, -2068(%ebp)
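The listing is a general 64x64-bit multiply assembled from 32-bit pieces (two imull for the cross terms, one mull for the low words, plus the carry additions), because result is a 64-bit variable and arg is sign-extended to 64 bits. Since both operands are untagged SmallIntegers of at most 31 bits, a single 32x32 to 64-bit widening multiply is enough: the product has magnitude at most 2^60, so it is exact in int64_t and cannot overflow. The sketch below is a hypothetical C helper (small_int_multiply, SMALLINT_MIN and SMALLINT_MAX are illustrative names, not VM functions) showing that fast path with the same range check as the Slang code. With optimization enabled, IA-32 compilers commonly emit a single one-operand imull for this pattern, though that depends on the compiler and flags.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* SmallInteger range on a 32-bit VM, matching 16r-40000000 .. 16r3FFFFFFF. */
#define SMALLINT_MIN (-0x40000000LL)
#define SMALLINT_MAX 0x3FFFFFFFLL

/* Hypothetical fast path for SmallInteger multiply. Both operands are
   untagged 31-bit values, so their product has magnitude at most 2^60 and
   is computed exactly in 64 bits: no signed overflow, hence no UB.
   Returns false when the product is not a SmallInteger, i.e. when the
   primitive has to fall back (to a normal send or a LargeInteger). */
static bool small_int_multiply(int32_t rcvr, int32_t arg, int32_t *out)
{
    assert(rcvr >= SMALLINT_MIN && rcvr <= SMALLINT_MAX);
    assert(arg >= SMALLINT_MIN && arg <= SMALLINT_MAX);
    /* Widen each operand before multiplying: a 32x32 -> 64 multiply. */
    int64_t product = (int64_t)rcvr * (int64_t)arg;
    if (product < SMALLINT_MIN || product > SMALLINT_MAX)
        return false;
    *out = (int32_t)product;
    return true;
}
```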
>>
>> So it is certainly suboptimal, but I'm sure we wouldn't notice any
>> difference in a macro benchmark.
>>
>> Also, if we really want to favour performance over clean separation,
>> we could let the primitive create the result with #signed64BitIntegerFor:
>> instead of falling back to a normal send (after all, the primitive
>> already knows about Float, so why not about LargeInteger...).
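Boxing the 64-bit product instead of failing the primitive essentially means converting an int64_t into a sign and a magnitude. A rough C sketch, assuming the byte-digit layout of Squeak's LargePositiveInteger / LargeNegativeInteger (sign given by the class, magnitude stored least significant byte first); LargeIntSketch and large_int_from_int64 are hypothetical, and real VM code would allocate an object of the right class rather than fill in a struct:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* SmallInteger range on a 32-bit VM (31-bit tagged integers). */
#define SMALLINT_MIN (-0x40000000LL)
#define SMALLINT_MAX 0x3FFFFFFFLL

/* Hypothetical stand-in for a LargeInteger object: the sign (which the
   image encodes in the class) and the magnitude as little-endian bytes. */
typedef struct {
    bool negative;
    int length;           /* number of significant magnitude bytes */
    uint8_t bytes[8];     /* bytes[0] is the least significant byte */
} LargeIntSketch;

/* Box a 64-bit result that no longer fits in a SmallInteger. */
static LargeIntSketch large_int_from_int64(int64_t value)
{
    assert(value < SMALLINT_MIN || value > SMALLINT_MAX);
    LargeIntSketch li = { value < 0, 0, { 0 } };
    /* Negate in unsigned arithmetic so INT64_MIN does not overflow. */
    uint64_t magnitude = li.negative ? (uint64_t)0 - (uint64_t)value
                                     : (uint64_t)value;
    /* Emit bytes until the remaining magnitude is zero, so no leading zero
       bytes are stored (the value is nonzero by the assertion above). */
    while (magnitude != 0) {
        li.bytes[li.length++] = (uint8_t)(magnitude & 0xFF);
        magnitude >>= 8;
    }
    return li;
}
```

For example, large_int_from_int64(0x40000000), the first value past SmallInteger maxVal on a 32-bit VM, yields a positive value with the four bytes 00 00 00 40.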
>>
>> Nicolas
>>
>> 2012/8/30 Stefan Marr <smalltalk at stefan-marr.de>:
>>>
>>>
>>> Hi:
>>>
>>> On 30 Aug 2012, at 01:14, Nicolas Cellier wrote:
>>>
>>>>
>>>> See also http://code.google.com/p/cog/issues/detail?id=92, where I
>>>> attached a fix for LargeIntegers.
>>>> It's untested yet and needs careful review!
>>>>
>>>> As Stefan said, the SmallInteger primitives also rely on undefined
>>>> behaviour (UB), but I did not fix them.
>>>> We should simply compute the result as a signed 64-bit value, as
>>>> Stefan proposed (except for bitShift).
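One plausible reason bitShift is the exception: a 64-bit intermediate covers any product of two 31-bit SmallIntegers, but a left shift can need arbitrarily many bits, and in C shifting by at least the width of the type, or left-shifting a negative value, is undefined. A hedged C sketch of a UB-free SmallInteger bitShift (small_int_bit_shift and the limit macros are hypothetical names) that checks the range before shifting and rounds right shifts toward negative infinity, as Smalltalk's bitShift: does:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* SmallInteger range on a 32-bit VM (31-bit tagged integers). */
#define SMALLINT_MIN (-0x40000000LL)
#define SMALLINT_MAX 0x3FFFFFFFLL

/* Hypothetical SmallInteger bitShift: a positive shift moves bits left,
   a negative one moves them right. Returns false when the result does not
   fit in a SmallInteger (the primitive would then fail or box the result). */
static bool small_int_bit_shift(int32_t value, int32_t shift, int32_t *out)
{
    assert(value >= SMALLINT_MIN && value <= SMALLINT_MAX);
    if (shift >= 0) {
        if (value == 0) {
            *out = 0;
            return true;
        }
        /* Any nonzero value shifted left by 31 or more leaves the range;
           rejecting it here also keeps the shift below the type width. */
        if (shift >= 31)
            return false;
        int64_t scale = INT64_C(1) << shift;  /* 2^shift, shift <= 30 */
        /* Range check before shifting: value * 2^shift must stay in range. */
        if (value > SMALLINT_MAX / scale || value < SMALLINT_MIN / scale)
            return false;
        /* Multiply rather than <<, since left-shifting a negative value is UB. */
        *out = (int32_t)(value * scale);
        return true;
    }
    /* Right shift: large distances give 0 or -1; this test also avoids
       negating INT32_MIN. */
    if (shift <= -31) {
        *out = value < 0 ? -1 : 0;
        return true;
    }
    int32_t scale = INT32_C(1) << -shift;  /* 2^-shift, 1 <= -shift <= 30 */
    int32_t quotient = value / scale;      /* C division truncates toward zero */
    if (value < 0 && value % scale != 0)
        quotient -= 1;                     /* round toward -infinity, as bitShift: does */
    *out = quotient;
    return true;
}
```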
>>>
>>>
>>> This might be the simplest solution, but at least on the RoarVM I
>>> measured a significant performance impact on tight integer loops:
>>> 20%, according to my measurements.
>>>
>>> That might be something worth considering.
>>>
>>> Best regards
>>> Stefan
>>>
>>>
>>> --
>>> Stefan Marr
>>> Software Languages Lab
>>> Vrije Universiteit Brussel
>>> Pleinlaan 2 / B-1050 Brussels / Belgium
>>> http://soft.vub.ac.be/~smarr
>>> Phone: +32 2 629 2974
>>> Fax:   +32 2 629 3525
>>>
>>
>

