On Sat, 1 Sep 2012, Nicolas Cellier wrote:
Hi Stefan,
I just tried on a stack VM (MacOSX core 2...), and I get different measurements: at most a 3% penalty, not 20%. If I rewrite bytecodePrimMultiply like this:
bytecodePrimMultiply
    | rcvr arg result |
    <var: #result type: 'sqLong'>
    rcvr := self internalStackValue: 1.
    arg := self internalStackValue: 0.
    (self areIntegers: rcvr and: arg)
        ifTrue: [rcvr := objectMemory integerValueOf: rcvr.
            arg := objectMemory integerValueOf: arg.
            result := rcvr.
            result := result * arg.
            (result >= 16r-40000000 and: [result <= 16r3FFFFFFF]) ifTrue:
                [self internalPop: 2 thenPush: (objectMemory integerObjectOf: result).
                ^self fetchNextBytecode "success"]]
        ifFalse: [...
Then I get these mini bench timings:
ORIGINAL: [33*35] bench '8,470,000 per second.' '8,670,000 per second.'
MODIFIED: '8,410,000 per second.' '8,370,000 per second.'
IMHO #bench has too high overhead for accurately measuring such simple operations. With a CogVM I get this:
[ 33 * 35 ] bench '40,000,000 per second.'.
[] bench '47,300,000 per second.'
Levente
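The overhead argument can be made concrete with a little arithmetic. The sketch below is only an illustration: net_ns_per_op is a hypothetical helper, not part of any VM or image, and it assumes the cost of running the block and the cost of the operation simply add up.

```c
/* Hypothetical helper: estimate the net time per operation, in nanoseconds,
   from two #bench-style rates (runs per second): one for the block that
   performs the operation and one for an empty block.
   Assumption: block overhead and operation cost are additive. */
static double net_ns_per_op(double rate_with_op, double rate_empty)
{
    double ns_with_op = 1e9 / rate_with_op;  /* time per run, operation included */
    double ns_empty   = 1e9 / rate_empty;    /* time per run, overhead only */
    return ns_with_op - ns_empty;
}
```

With the CogVM figures above, 1e9 / 40,000,000 = 25 ns per run and 1e9 / 47,300,000 is about 21.1 ns per run, so under this assumption the multiply itself accounts for only about 3.9 ns of each 25 ns run.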
Though I didn't use SSE or any 64-bit friendly instructions:
line 6501 result = result * arg;
    .loc 1 6501 0
    movl -1404(%ebp), %eax
    movl %eax, %edx
    sarl $31, %edx
    movl -2068(%ebp), %ecx
    imull %eax, %ecx
    movl -2072(%ebp), %ebx
    imull %edx, %ebx
    addl %ebx, %ecx
    mull -2072(%ebp)
    addl %edx, %ecx
    movl %ecx, %edx
    movl %eax, -2072(%ebp)
    movl %edx, -2068(%ebp)
    movl %eax, -2072(%ebp)
    movl %edx, -2068(%ebp)
So it certainly is suboptimal, but I'm sure we wouldn't notice any difference on a macro benchmark.
Also, we could let the primitive use #signed64BitIntegerFor: result instead of falling back to a normal send, if we really want to favour performance over clean separation (after all, the primitive already knows about Float, why not about LargeInteger...)
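For readers who prefer plain C to Slang, the widening approach used in the rewritten bytecodePrimMultiply can be sketched roughly as follows. This is a minimal sketch, not the generated interpreter code: small_int_multiply and the two range macros are hypothetical names, and a 31-bit SmallInteger range is assumed, matching the 16r-40000000 .. 16r3FFFFFFF check above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed 31-bit SmallInteger range (hypothetical macro names). */
#define SMALL_INT_MIN (-0x40000000LL)
#define SMALL_INT_MAX (0x3FFFFFFFLL)

/* Hypothetical helper: multiply two untagged SmallInteger values.
   Returns true and stores the product in *out if it fits in a SmallInteger;
   returns false so the caller can fall back to a normal send. */
static bool small_int_multiply(int32_t rcvr, int32_t arg, int32_t *out)
{
    /* Widen before multiplying: the product of two 32-bit operands has a
       magnitude of at most 2^62, so the 64-bit multiply cannot overflow
       and no undefined behaviour is involved. */
    int64_t result = (int64_t)rcvr * (int64_t)arg;

    if (result >= SMALL_INT_MIN && result <= SMALL_INT_MAX) {
        *out = (int32_t)result;
        return true;
    }
    return false;
}
```

The point of the design is that the range check happens on the exact 64-bit product, rather than on a 32-bit product that may already have overflowed. The failure branch is where a call such as #signed64BitIntegerFor: could be used instead of the fallback send, if that trade-off is wanted.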
Nicolas
2012/8/30 Stefan Marr smalltalk@stefan-marr.de:
Hi:
On 30 Aug 2012, at 01:14, Nicolas Cellier wrote:
See also http://code.google.com/p/cog/issues/detail?id=92, where I attached a fix for large ints. It's still untested and needs careful review!
As Stefan said, there is UB reliance in the SmallInteger primitives too, but I did not fix those. We should simply compute the result as a signed 64-bit value, as proposed by Stefan (except for bitShift).
This might be the simplest solution, but at least on the RoarVM I measured a significant performance impact on tight integer loops. It's 20% according to my measurements.
That might be something that needs to be considered.
Best regards Stefan
-- Stefan Marr Software Languages Lab Vrije Universiteit Brussel Pleinlaan 2 / B-1050 Brussels / Belgium http://soft.vub.ac.be/~smarr Phone: +32 2 629 2974 Fax: +32 2 629 3525