As per my comment on the recent Morphic performance graphs (http://blog.openinworld.com/2011/03/morphic-flavour-performance/#comment-59), on a Cog VM these primitives fail with a Float parameter, which leads to a huge performance hit when comparing Ints to Floats, because the fallback code does silly things.
Cog: [1 to: 2000000 do: [:i | i < 2354.234. i <= 2354.234. i >= 2354.234. i > 2354.234.]] timeToRun 26594
Trunk: [1 to: 2000000 do: [:i | i < 2354.234. i <= 2354.234. i >= 2354.234. i > 2354.234.]] timeToRun 229
Should this be changed in the VM, or is the difference for pure int/int comparison large enough that we should instead change Float>>adaptToInteger:andCompare: to use the corresponding comparisons with the Float as receiver, which does work? That's still slower than trunk in my image, but quite a bit better: the above test takes about 1s.
Cheers, Henry
Hmm.. that's really strange, because as far as I can see, the code there is the same:
(primitive: 3)

Interpreter>>primitiveLessThan
    | integerReceiver integerArgument |
    integerArgument := self popInteger.
    integerReceiver := self popInteger.
    self checkBooleanResult: integerReceiver < integerArgument

InterpreterPrimitives>>primitiveLessThan
    | integerReceiver integerArgument |
    integerArgument := self popInteger.
    integerReceiver := self popInteger.
    self checkBooleanResult: integerReceiver < integerArgument
and #popInteger is also the same (it fails if the value on the stack is not a SmallInteger). So in the Squeak VM the comparison primitive also fails, which should lead to evaluating the failure code.
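To make the failure path concrete, here is a minimal C sketch of such a tag check. It assumes the classic Squeak encoding in which a SmallInteger oop has its low bit set; the names `is_small_integer`, `primitive_less_than` and the `PRIM_FAILED` result are made up for illustration and are not the VM's actual functions.

```c
#include <stdint.h>

typedef intptr_t oop;

/* Hypothetical result codes: a real primitive would set a failure flag instead. */
enum { PRIM_FALSE = 0, PRIM_TRUE = 1, PRIM_FAILED = -1 };

/* Assumption: SmallIntegers are tagged with a low bit of 1. */
static int is_small_integer(oop o) { return (o & 1) != 0; }

static oop tag_small_integer(intptr_t value)
{
    return (oop)(((uintptr_t)value << 1) | 1);
}

/* Relies on arithmetic right shift of negative values, which common compilers provide. */
static intptr_t small_integer_value(oop o) { return o >> 1; }

/* Integer-only comparison: anything that is not a SmallInteger makes it fail,
   so a Float argument ends up in the (slow) fallback code. */
static int primitive_less_than(oop rcvr, oop arg)
{
    if (!is_small_integer(rcvr) || !is_small_integer(arg))
        return PRIM_FAILED;
    return small_integer_value(rcvr) < small_integer_value(arg) ? PRIM_TRUE : PRIM_FALSE;
}
```

In this model a Float is simply any oop whose low bit is clear, so passing one as the argument yields `PRIM_FAILED`.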
One way to speed this up is, when the primitive fails, not to use a super send but double dispatch instead, i.e. instead of:
SmallInteger >> < aNumber
    "Primitive. Compare the receiver with the argument and answer with true if the receiver is less than the argument. Otherwise answer false. Fail if the argument is not a SmallInteger. Essential. No Lookup. See Object documentation whatIsAPrimitive."
    <primitive: 3>
    ^super < aNumber
do something like:
SmallInteger >> < aNumber
    "Primitive. Compare the receiver with the argument and answer with true if the receiver is less than the argument. Otherwise answer false. Fail if the argument is not a SmallInteger. Essential. No Lookup. See Object documentation whatIsAPrimitive."
    <primitive: 3>
    ^ aNumber isGreaterThanSmallInteger: self

and then, effectively, Float>>isGreaterThanSmallInteger: aSmallInteger could be implemented as a primitive that takes a SmallInteger as argument and does the comparison without needing to convert the integer to a Float first.
Also, the primitives for Floats accept SmallIntegers as arguments, so by rewriting:
SmallInteger >> < aNumber
    "Primitive. Compare the receiver with the argument and answer with true if the receiver is less than the argument. Otherwise answer false. Fail if the argument is not a SmallInteger. Essential. No Lookup. See Object documentation whatIsAPrimitive."
    <primitive: 3>
    ^ aNumber > self
[1 to: 2000000 do: [:i | i < 2354.234]] timeToRun
before: 8223
after: 20
so... about 411 times faster :)
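One detail worth spelling out when a comparison is re-dispatched with the Float as receiver: the operator has to be mirrored, so `i < f` holds exactly when `f > i`, whereas `f >= i` differs when both sides are equal. A small C sketch of the three variants (the function names are only illustrative):

```c
#include <stdbool.h>

/* i < f, computed directly by widening the integer to double. */
static bool int_less_than_double(long i, double f)
{
    return (double)i < f;
}

/* The mirrored form with the double as "receiver": f > i. */
static bool double_greater_than_int(double f, long i)
{
    return f > (double)i;
}

/* NOT equivalent to i < f: answers true when both sides are equal. */
static bool double_greater_or_equal_int(double f, long i)
{
    return f >= (double)i;
}
```

For `i = 2` and `f = 2.0` the first two functions answer false while the third answers true, which is the case a fallback has to get right.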
On 02.05.2011 12:48, Igor Stasenko wrote:
> [...] #popInteger is also the same (it fails if the value on the stack is not a SmallInteger). So in the Squeak VM the comparison primitive also fails, which should lead to evaluating the failure code.
Cog has translation methods for those primitives in cogit.c, see genSmallIntegerComparison.
> Also, the primitives for Floats accept SmallIntegers as arguments, so by rewriting:
> SmallInteger >> < aNumber
>     <primitive: 3>
>     ^ aNumber > self
> [1 to: 2000000 do: [:i | i < 2354.234]] timeToRun
> before: 8223
> after: 20
Yes, that's basically what I proposed to do in image, albeit in a different place.
Cheers, Henry
On 2 May 2011 13:06, Henrik Sperre Johansen henrik.s.johansen@veloxit.no wrote:
> Yes, that's basically what I proposed to do in image, albeit in a different place.
But I really wonder why it falls back to a normal send. Consider this:
StackInterpreter>>bytecodePrimLessThan
    | rcvr arg aBool |
    rcvr := self internalStackValue: 1.
    arg := self internalStackValue: 0.
    (self areIntegers: rcvr and: arg) ifTrue:
        ["The C code can avoid detagging since tagged integers are still signed.
          But this means the simulator must override to do detagging."
         ^self
            cCode: [self booleanCheat: rcvr < arg]
            inSmalltalk: [self booleanCheat: (objectMemory integerValueOf: rcvr) < (objectMemory integerValueOf: arg)]].

    self initPrimCall.
    aBool := self primitiveFloatLess: rcvr thanArg: arg.
    self successful ifTrue: [^ self booleanCheat: aBool].

    messageSelector := self specialSelector: 2.
    argumentCount := 1.
    self normalSend
So for a #< send it tries to avoid a normal send: it first attempts a quick int < int comparison, then a float/int < float/int comparison (using #primitiveFloatLess:thanArg:), so it should not fall back to a normal send.
So I think the problem is that #genSmallIntegerComparison: (and friends) generate code for comparing integers only and fall back to a normal send when that fails, without attempting to use #primitiveFloatLess:thanArg:.
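The three-stage order used by the interpreter's bytecode (integers first, then the mixed float path, and only then a real send) can be modelled in a few lines of C. This is only a sketch: `value_t`, `bytecode_less_than` and the `LT_NORMAL_SEND` result are hypothetical stand-ins, not the VM's data structures.

```c
#include <stdbool.h>

typedef enum { KIND_SMALL_INT, KIND_FLOAT, KIND_OTHER } kind_t;
typedef struct { kind_t kind; long i; double f; } value_t;
typedef enum { LT_FALSE, LT_TRUE, LT_NORMAL_SEND } lt_result_t;

/* Hypothetical constructors for the toy value model. */
static value_t small_int(long i)      { value_t v = { KIND_SMALL_INT, i, 0.0 }; return v; }
static value_t float_value(double f)  { value_t v = { KIND_FLOAT, 0, f };       return v; }
static value_t other_value(void)      { value_t v = { KIND_OTHER, 0, 0.0 };     return v; }

/* Widen a SmallInteger or Float to double; answer false for anything else. */
static bool as_double(value_t v, double *out)
{
    if (v.kind == KIND_SMALL_INT) { *out = (double)v.i; return true; }
    if (v.kind == KIND_FLOAT)     { *out = v.f;         return true; }
    return false;
}

static lt_result_t bytecode_less_than(value_t rcvr, value_t arg)
{
    double a, b;

    /* Stage 1: quick int < int comparison. */
    if (rcvr.kind == KIND_SMALL_INT && arg.kind == KIND_SMALL_INT)
        return rcvr.i < arg.i ? LT_TRUE : LT_FALSE;

    /* Stage 2: float/int < float/int comparison. */
    if (as_double(rcvr, &a) && as_double(arg, &b))
        return a < b ? LT_TRUE : LT_FALSE;

    /* Stage 3: give up and let the caller perform a normal send. */
    return LT_NORMAL_SEND;
}
```

Code that implements only stage 1 and then jumps straight to stage 3 behaves like the integer-only path described above: an int/float comparison always pays for the full send.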
On 02.05.2011 13:22, Igor Stasenko wrote:
> So I think the problem is that #genSmallIntegerComparison: (and friends) generate code for comparing integers only and fall back to a normal send when that fails, without attempting to use #primitiveFloatLess:thanArg:.
static sqInt
genSmallIntegerComparison(sqInt jumpOpcode)
{
    AbstractInstruction *jumpFail;
    AbstractInstruction *jumpTrue;

    gMoveRR(Arg0Reg, TempReg);
    jumpFail = genJumpNotSmallIntegerInScratchReg(TempReg);
Assuming genJumpNotSmallIntegerInScratchReg is an accurate name, I'd say yes. :)
Cheers, Henry
Btw, I don't like this code:
self assertClassOf: floatOrInt is: (objectMemory splObj: ClassFloat) compactClassIndex: ClassFloatCompactIndex.
Its purpose is to check whether the given object is an instance of a compact class; if it is, you don't have to fetch the class, because you only need to check that the object's compact index equals the compact index of the class you are checking for.
So it assumes that (objectMemory splObj: ClassFloat) will be optimized away if the comparison is successful. But I think it would be better to use a different assertion, so as not to depend on compiler optimizations:
is: oop instanceOfClass: classObjectIndex compactClassIndex: compactClassIndex
    "Answer if oop is an instance of the given class. If the class has a (non-zero) compactClassIndex use that to speed up the check. N.B. Inlining should result in classOop not being accessed if compactClassIndex is non-zero."
    | ccIndex |
    <inline: true>
    (self isIntegerObject: oop) ifTrue: [^false].

    ccIndex := self compactClassIndexOf: oop.
    compactClassIndex ~= 0 ifTrue: [^compactClassIndex == ccIndex].

    ^ccIndex = 0 and: [((self classHeader: oop) bitAnd: AllButTypeMask) = (self splObj: classObjectIndex)]
So the difference is that you pass a class index as the argument, not the class itself. Even if the compiler fails to optimize it, it won't cost you an extra load from the special objects array. So instead of writing:
self assertClassOf: floatOrInt is: (objectMemory splObj: ClassFloat) compactClassIndex: ClassFloatCompactIndex.
the code can be refactored to use:
self assertClassOf: floatOrInt isSplObj: ClassFloat compactClassIndex: ClassFloatCompactIndex.
(which calls #is:instanceOfClass:compactClassIndex: above)
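The point about passing an index instead of the class can be illustrated with a small C model. Everything here is an assumption made for the sketch: objects carry a `compact_class_index` field (0 meaning "not compact") and a class pointer, `special_objects` is a plain global array, and the tagged-integer check of the real method is left out.

```c
#include <stdbool.h>
#include <stddef.h>

enum { SPECIAL_OBJECTS_SIZE = 8 };   /* arbitrary size for the sketch */

typedef struct object {
    int compact_class_index;          /* 0 means "not a compact class instance" */
    const struct object *class_oop;   /* only consulted for non-compact classes */
} object_t;

/* Stand-in for the special objects array. */
static const object_t *special_objects[SPECIAL_OBJECTS_SIZE];

/* The caller passes the INDEX of the class in the special objects array.
   The array is read only on the slow path, so the compact case costs no
   extra load even if the compiler does not inline or optimize anything. */
static bool is_instance_of_special_class(const object_t *obj,
                                         int class_object_index,
                                         int compact_class_index)
{
    if (compact_class_index != 0)
        return obj->compact_class_index == compact_class_index;

    return obj->compact_class_index == 0
        && obj->class_oop == special_objects[class_object_index];
}
```

Had the class object itself been a parameter, the caller would have had to load it from the array before the call, whether or not the compact index alone settles the answer.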
On 2 May 2011 15:46, Igor Stasenko siguctua@gmail.com wrote:
Btw,
i don't like this code:
self assertClassOf: floatOrInt is: (objectMemory splObj: ClassFloat) compactClassIndex: ClassFloatCompactIndex.
Btw, Cog is susceptible to bugs if at run time you change a class to be no longer compact, or then install a different class as compact at the same index of the compact classes array.
To avoid that, there should be a primitive that refreshes the compact indices of the most used classes.
(StackInterpreter>>checkAssumedCompactClasses should be run each time some class becomes (un)compact.)
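As a rough idea of what such a re-check could look like, here is a C sketch that compares a list of assumed (index, class) pairs against the current compact classes table. The types and the function `assumed_compact_classes_hold` are hypothetical; the only detail taken from the discussion is that the compact class index is a small non-zero number (5 bits in this thread).

```c
#include <stdbool.h>
#include <stddef.h>

enum { COMPACT_CLASS_SLOTS = 32 };   /* 5-bit index; slot 0 means "not compact" */

typedef struct { const char *name; } class_t;

/* One assumption baked into the VM: "class X lives at compact index N". */
typedef struct {
    int index;
    const class_t *expected_class;
} assumed_compact_class_t;

/* Answer true only if every assumed index still holds the expected class. */
static bool assumed_compact_classes_hold(const class_t **table,
                                         const assumed_compact_class_t *assumed,
                                         size_t count)
{
    for (size_t k = 0; k < count; k++) {
        int idx = assumed[k].index;
        if (idx <= 0 || idx >= COMPACT_CLASS_SLOTS)
            return false;
        if (table[idx] != assumed[k].expected_class)
            return false;
    }
    return true;
}
```

Running a check like this whenever a class becomes (un)compact would at least detect that a baked-in index no longer matches, even if it does not repair anything.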
No. Instead certain compact indices should be mandated. It is absurd to throw away performance and expend effort supporting complexity for flexibility that is essentially never used and in maintaining a scheme that is only partially effective.
Eliot (phone)
On Mon, May 2, 2011 at 6:11 PM, Eliot Miranda eliot.miranda@gmail.com wrote:
> No. Instead certain compact indices should be mandated. It is absurd to throw away performance and expend effort supporting complexity for flexibility that is essentially never used and in maintaining a scheme that is only partially effective.
+9999
Compact classes haven't changed in the last... how many years? In fact (and I have already sent an email about this) there are only 15 compact classes. This means that we could even use 4 bits instead of 5.
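The arithmetic behind the 4-bit remark, as a tiny C sketch. It assumes index 0 stays reserved for "not compact", so 15 compact classes occupy the indices 1..15; `bits_needed` is just an illustrative helper.

```c
/* Smallest number of bits able to represent every value in 0..max_value. */
static unsigned bits_needed(unsigned max_value)
{
    unsigned bits = 0;
    while (max_value != 0) {
        bits++;
        max_value >>= 1;
    }
    return bits;
}
```

With a highest index of 15 this answers 4, and a 16th compact class would be the first to need the fifth bit.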
On 2 May 2011 18:11, Eliot Miranda eliot.miranda@gmail.com wrote:
> No. Instead certain compact indices should be mandated. It is absurd to throw away performance and expend effort supporting complexity for flexibility that is essentially never used and in maintaining a scheme that is only partially effective.
Compact classes are a pain in the ... so I would not cry if we got rid of them. But right now I see that I could potentially break things if I start replacing one compact class with another.
On Mon, May 2, 2011 at 12:39 PM, Igor Stasenko siguctua@gmail.com wrote:
> Compact classes are a pain in the ... so I would not cry if we got rid of them. But right now I see that I could potentially break things if I start replacing one compact class with another.
Doctor, doctor, it hurts when I... Doctor: don't do that.
Igor, can you add a bug entry to Cog?
Stef
On 2 May 2011 18:19, stephane ducasse stephane.ducasse@gmail.com wrote:
> Igor, can you add a bug entry to Cog?
sure
http://code.google.com/p/cog/issues/detail?id=40
Maybe someday someone will find the time to address it.
I mistakenly published code for accelerating <SmallInteger> comparisonOp <Float> at http://code.google.com/p/cog/issues/detail?id=40
Apologies, but it was obvious from reading the thread that the original issue was interesting, but that the issue discovered by Igor would not be fix...
Nicolas
Hi Henrik,
On May 2, 2011, at 3:12 AM, Henrik Sperre Johansen henrik.s.johansen@veloxit.no wrote:
Should this be changed in the VM,
I think so. When I first wrote the SmallInteger translated prims I hadn't yet written the JIT support for floats, so I couldn't write the int/float comparison. Now all the support is there and it should be straightforward. Do you fancy trying to write this yourself? It would be a fun exercise. Let me know and I can hand-hold with using the simulator.
Best Eliot (phone)
I have always refused to learn x86 assembler, but by imitating the existing code I can offer my own attempt, attached. Note that I did not use the trick of inverting the FP comparison operands, since this does not seem necessary.
Nicolas
And if genSmallIntegerComparison:orDoubleComparison: is correct, then using it should be quite straightforward.
Nicolas
Hem, some declarations were missing... and the argument for FP comparison is tricky indeed... Sorry for the noise, but this is a direct live
Nicolas
Apart from another declaration bug (FP missing in a function name...), there is a killer:
StackToRegisterMappingCogit silently inherits SimpleStackBasedCogit>>#genSmallIntegerComparison:orDoubleComparison: even though it does not share the stack structure. This is fatal...
So an override StackToRegisterMappingCogit>>#genSmallIntegerComparison:orDoubleComparison: is mandatory.
Bad, bad inheritance...
Now compiling the VM again...
Nicolas
OK, with SimpleCogit: [1 to: 2000000 do: [:i | i < 2354.234. i <= 2354.234. i >= 2354.234. i > 2354.234.]] timeToRun 69
vs 4.2.5 VM: [1 to: 2000000 do: [:i | i < 2354.234. i <= 2354.234. i >= 2354.234. i > 2354.234.]] timeToRun 299
StackToRegisterMappingCogit: crash...
I haven't earned my x86 certification yet :(
Nicolas
OK, easy: I had a reference to ClassReg instead of Arg0Reg (a copy/paste error from super). My feeling is that this part is very uncomfortable to program... Since we map our local variables and arguments to the equivalent of global variables (the registers Arg0Reg etc.), no compiler will tell us that a register is used before being assigned, or never used, or whatever... Far, far away from Smalltalk's safe lands...
Here is a new try attached:
[1 to: 2000000 do: [:i | i < 2354.234. i <= 2354.234. i >= 2354.234. i > 2354.234.]] timeToRun 71
A bit slower than SimpleCogit ?
Nicolas
2011/5/3 Nicolas Cellier nicolas.cellier.aka.nice@gmail.com:
OK, with SimpleCogit: [1 to: 2000000 do: [:i | i < 2354.234. i <= 2354.234. i >= 2354.234. i > 2354.234.]] timeToRun 69
vs 4.2.5 VM: [1 to: 2000000 do: [:i | i < 2354.234. i <= 2354.234. i >= 2354.234. i > 2354.234.]] timeToRun 299
StackToRegisterMappingCogit: crash...
I don't get the x86 certification yet :(
Nicolas
2011/5/3 Nicolas Cellier nicolas.cellier.aka.nice@gmail.com:
Apart another declaration bug (FP missing in function name...) there is a killer :
StackToRegisterMappingCogit inherits silently from SimpleStackBasedCogit>>#genSmallIntegerComparison:orDoubleComparison: though it does not share the stack structure. This is fatal...
So an override StackToRegisterMappingCogit>>#genSmallIntegerComparison:orDoubleComparison: is mandatory
Bad, bad inheritance...
Now compiling the VM again...
Nicolas
2011/5/3 Nicolas Cellier nicolas.cellier.aka.nice@gmail.com:
Hem, some declarations were missing... and the argument for FP comparison is tricky indeed... Sorry for the noise, but this is a direct live
Nicolas
2011/5/3 Nicolas Cellier nicolas.cellier.aka.nice@gmail.com:
And if genSmallIntegerComparison:orDoubleComparison: is correct, then usage should be quite straightforward
Nicolas
2011/5/3 Nicolas Cellier nicolas.cellier.aka.nice@gmail.com:
I always refused to learn x86 assembler, but by code imitation I can give my own try in attachment. Note that I did not use the trick to invert FP comparison operands since this does not seem necessary.
Nicolas
2011/5/2 Eliot Miranda eliot.miranda@gmail.com:
Hi Henrik,
Eliot (phone)
On May 2, 2011, at 3:12 AM, Henrik Sperre Johansen henrik.s.johansen@veloxit.no wrote:
> As per my comment on the recent Morphic performance graphs (http://blog.openinworld.com/2011/03/morphic-flavour-performance/#comment-59), on a Cog VM these primitives fail with a Float parameter, which leads to a huge performance hit when comparing Ints to Floats by doing silly things in the fallback code.
>
> Cog:
> [1 to: 2000000 do: [:i |
> i < 2354.234.
> i <= 2354.234.
> i >= 2354.234.
> i > 2354.234.]] timeToRun 26594
>
> Trunk:
> [1 to: 2000000 do: [:i |
> i < 2354.234.
> i <= 2354.234.
> i >= 2354.234.
> i > 2354.234.]] timeToRun 229
>
> Should this be changed in the VM,
I think so. When I first wrote the SmallInteger translated prims I hadn't yet written the JIT support for floats and so couldn't write int/float comparison. Now all the support is there and it should be straight-forward. Do you fancy trying to write this yourself? Would be a fun exercise. Let me know and I can hand-hold with using the simulator.
Best Eliot (phone)
> or is the difference for pure int/int comparison large enough that we should instead change
> Float>>adaptToInteger:andCompare:
> to use the corresponding comparisons with Float as receiver, which does work?
> That's still slower than trunk in my image, but quite a bit better, above test takes about 1s.
>
> Cheers,
> Henry
Wow.. that's cool. I had no time to get so deep into Cog JIT internals. And it is really interesting to hear your feedback about it (to Eliot as well, i assume) :)
On 3 May 2011 22:05, Nicolas Cellier nicolas.cellier.aka.nice@gmail.com wrote:
OK, easy, I had a reference to ClassReg instead of Arg0Reg (copy/paste error from super). My feeling is that it is very uncomfortable to program this part... Since we mapped our local variables and arguments to an equivalent of global variables (registers Arg0Reg etc), then no compiler will tell us that this register is used before assigned or never used or what... Far, far away from Smalltalk safe lands...
Here is a new try attached:
[1 to: 2000000 do: [:i | i < 2354.234. i <= 2354.234. i >= 2354.234. i > 2354.234.]] timeToRun 71
A bit slower than SimpleCogit ?
Nicolas
As always, engineers will give you negative feedback, especially when specialized in system control ;) So I want to rectify the impression I may have given and thank Eliot and Teleplace for this work. Locating and changing the code was possible relatively quickly (1h), even fixing my own bugs (1h30) thanks to the immediate VM crash ;). IMO this was possible thanks to the Smalltalk IDE, and I would not bet on such efficiency for modifying a C/C++ VM. I consider this exercise a proof of concept. But maybe I'm biased by too many years of Smalltalk; some other folks should try to fix such little details to confirm.
Nicolas
2011/5/4 Igor Stasenko siguctua@gmail.com:
Wow.. that's cool. I had no time to get so deep into Cog JIT internals. And it is really interesting to hear your feedback about it (to Eliot as well, i assume) :)
-- Best regards, Igor Stasenko AKA sig.
On 4 May 2011 09:47, Nicolas Cellier nicolas.cellier.aka.nice@gmail.com wrote:
As always, engineers will give you negative feedback, especially when specialized in system control ;) So I want to rectify the impression I may have given and thank Eliot and Teleplace for this work. Locating and changing the code was possible relatively quickly (1h), even fixing my own bugs (1h30) thanks to the immediate VM crash ;). IMO this was possible thanks to the Smalltalk IDE, and I would not bet on such efficiency for modifying a C/C++ VM. I consider this exercise a proof of concept. But maybe I'm biased by too many years of Smalltalk; some other folks should try to fix such little details to confirm.
You've just confirmed my previous observations about coding a VM in Smalltalk (from when i was hacking Hydra). Yes, it is quite a special area, but it is still relatively easy to get in, play with it and get results, without spending weeks studying APIs and classes before you even attempt to solve something. This is why i love Smalltalk: it allows you to do magic without too deep a knowledge of everything.
Nicolas
Nicolas, i placed your code at issue #41
http://code.google.com/p/cog/issues/detail?id=41
please check if i uploaded the correct version of the code.
The change is quite simple, and most probably should work :) But i haven't tested it yet.
vm-dev@lists.squeakfoundation.org