[Vm-dev] 3 Bugs in LargeInteger primitives
Nicolas Cellier
nicolas.cellier.aka.nice at gmail.com
Sat Sep 1 21:55:50 UTC 2012
Yes, benchmarking is an art; this was a gross approximation.
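
To make the overhead point concrete, here is a rough C sketch. It assumes the rates Levente quotes below (40,000,000 per second for [ 33 * 35 ] and 47,300,000 per second for the empty block); net_ns_per_iteration is a hypothetical helper name, not anything from the VM or the image. It converts both rates to nanoseconds per iteration and subtracts the empty-block cost:

```c
#include <assert.h>

/* Hypothetical helper: net cost, in nanoseconds per iteration, of the
 * measured expression once the cost of evaluating an empty block
 * (the harness overhead) has been subtracted.
 * Both arguments are rates in iterations per second, as printed by #bench. */
static double net_ns_per_iteration(double ops_per_sec, double empty_per_sec)
{
    assert(ops_per_sec > 0.0 && empty_per_sec > 0.0);
    double ns_with_op = 1e9 / ops_per_sec;    /* time for block + operation */
    double ns_empty   = 1e9 / empty_per_sec;  /* time for the empty block   */
    return ns_with_op - ns_empty;
}
```

With those figures, net_ns_per_iteration(40000000.0, 47300000.0) is about 25 ns minus 21.1 ns, so only around 3.9 ns per iteration is attributable to the multiply itself; the rest is block evaluation and #bench overhead.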
2012/9/1 Levente Uzonyi <leves at elte.hu>:
>
> On Sat, 1 Sep 2012, Nicolas Cellier wrote:
>
>>
>> Hi Stefan,
>>
>> I just tried on a stack VM (MacOSX core 2...), and I get different
>> measurements: at most a 3% penalty, not 20%.
>> If I rewrite bytecodePrimMultiply like this:
>>
>> bytecodePrimMultiply
>>     | rcvr arg result |
>>     <var: #result type: 'sqLong'>
>>     rcvr := self internalStackValue: 1.
>>     arg := self internalStackValue: 0.
>>     (self areIntegers: rcvr and: arg)
>>         ifTrue: [rcvr := objectMemory integerValueOf: rcvr.
>>             arg := objectMemory integerValueOf: arg.
>>             result := rcvr.
>>             result := result * arg.
>>             (result >= 16r-40000000 and: [result <= 16r3FFFFFFF]) ifTrue:
>>                 [self internalPop: 2 thenPush: (objectMemory integerObjectOf: result).
>>                 ^self fetchNextBytecode "success"]]
>>         ifFalse: [...
>>
>> Then I get mini bench timing:
>>
>> ORIGINAL:
>> [33*35] bench
>> '8,470,000 per second.'
>> '8,670,000 per second.'
>>
>> MODIFIED:
>> '8,410,000 per second.'
>> '8,370,000 per second.'
>
>
> IMHO #bench has too high overhead for accurately measuring such simple
> operations. With a CogVM I get this:
>
> [ 33 * 35 ] bench '40,000,000 per second.'.
> [] bench '47,300,000 per second.'
>
>
> Levente
>
>
>>
>> Though, I didn't use SSE or any 64-bit-friendly instructions:
>>
>> line 6501
>> result = result * arg;
>> .loc 1 6501 0
>> movl -1404(%ebp), %eax
>> movl %eax, %edx
>> sarl $31, %edx
>> movl -2068(%ebp), %ecx
>> imull %eax, %ecx
>> movl -2072(%ebp), %ebx
>> imull %edx, %ebx
>> addl %ebx, %ecx
>> mull -2072(%ebp)
>> addl %edx, %ecx
>> movl %ecx, %edx
>> movl %eax, -2072(%ebp)
>> movl %edx, -2068(%ebp)
>> movl %eax, -2072(%ebp)
>> movl %edx, -2068(%ebp)
>>
>> So it certainly is suboptimal, but I'm sure we wouldn't notice any
>> difference on a macro benchmark.
>>
>> Also, we could let the primitive create the result with
>> #signed64BitIntegerFor: instead of falling back to a normal send, if we
>> really want to favour performance over clean separation (after all, the
>> primitive already knows about Float, why not about LargeInteger...)
>>
>> Nicolas
>>
>> 2012/8/30 Stefan Marr <smalltalk at stefan-marr.de>:
>>>
>>>
>>> Hi:
>>>
>>> On 30 Aug 2012, at 01:14, Nicolas Cellier wrote:
>>>
>>>>
>>>> See also http://code.google.com/p/cog/issues/detail?id=92 where I
>>>> attached a fix for large int.
>>>> It's untested so far and needs careful review!
>>>>
>>>> As Stefan said, there is UB-reliance in SmallInteger primitives too,
>>>> but I did not fix them.
>>>> We should simply compute the result as a signed 64-bit value, as
>>>> proposed by Stefan (except for bitShift).
>>>
>>>
>>> This might be the simplest solution, but at least on the RoarVM I
>>> measured a significant performance impact on tight integer loops.
>>> It's 20% according to my measurements.
>>>
>>> That might be something that needs to be considered.
>>>
>>> Best regards
>>> Stefan
>>>
>>>
>>> --
>>> Stefan Marr
>>> Software Languages Lab
>>> Vrije Universiteit Brussel
>>> Pleinlaan 2 / B-1050 Brussels / Belgium
>>> http://soft.vub.ac.be/~smarr
>>> Phone: +32 2 629 2974
>>> Fax: +32 2 629 3525
>>>
>>
>
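
The 64-bit approach discussed in this thread can be sketched in plain C. This is only an illustration under assumptions, not the generated VM code: small_int_multiply, SMALL_INT_MIN and SMALL_INT_MAX are hypothetical names, the operands are assumed to be already untagged 31-bit values (as integerValueOf: would produce), and the range constants mirror 16r-40000000 and 16r3FFFFFFF from the quoted method.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed SmallInteger range for a 32-bit VM with 31-bit tagged integers. */
#define SMALL_INT_MIN (-0x40000000LL)
#define SMALL_INT_MAX ( 0x3FFFFFFFLL)

/* Hypothetical helper: multiply two untagged SmallInteger values.
 * Returns 1 and stores the product in *out when it fits the SmallInteger
 * range; returns 0 (the caller would fall back to a normal send) otherwise.
 * The product is computed in 64 bits, so the multiplication itself cannot
 * overflow and no undefined behaviour is involved. */
static int small_int_multiply(int32_t rcvr, int32_t arg, int32_t *out)
{
    /* Precondition (assumption): operands are valid untagged SmallIntegers. */
    assert(rcvr >= SMALL_INT_MIN && rcvr <= SMALL_INT_MAX);
    assert(arg  >= SMALL_INT_MIN && arg  <= SMALL_INT_MAX);

    int64_t result = (int64_t)rcvr * (int64_t)arg;  /* widen before multiplying */
    if (result >= SMALL_INT_MIN && result <= SMALL_INT_MAX) {
        *out = (int32_t)result;
        return 1;
    }
    return 0;
}
```

The point of widening before the multiply is that the range test is done on the exact product, instead of on a 32-bit value that may already have wrapped. Whether the extra 64-bit work matters in practice is exactly what the measurements above try to estimate.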