As posted on squeak-dev http://lists.squeakfoundation.org/pipermail/squeak-dev/2012-August/165608.ht... I found 3 bugs in LargeInteger primitives
(1<<63) negated quo: -1. (1<<63) negated / -1. (1<<63) negated * -1.
They are all related to the impossible task of taking absolute value of INT_MIN (or more exactly it's 64 bits equivalent). Currently, it takes the form (0 - INT_MIN) whose behaviour is undefined according to C standards but generally answer INT_MIN. See for example http://stackoverflow.com/questions/2539178/why-is-abs0x80000000-0x80000000
Surprisingly this one works: (1<<63) negated // -1.
Most probably because gcc has a license to ignore undefined behaviour and perform some optimizations that don't take overflow side effects into account. For example 0 - (0 - a)/b can be simplified into a/b, UB case of INT_MIN apart...
---------------------------------
Beside these bugs, when I read the code, I'm quite sure it's a nest of future bugs because there are many other attempts to catch overflow in post-condition (like testing that addition of two positive is negative when an underflow occurs) that technically rely on explicitely Undefined Behaviour (UB). OK, by now many Arithmetic Units do behave like exploited in these post-conditions, though it's not strictly future-proof. But we unfortunately rely on optimizing C compilers, and its behaviour is much more fragile than hardware...
I invite every VM hacker to read http://stackoverflow.com/questions/199333/best-way-to-detect-integer-overflo... And various links like https://www.securecoding.cert.org/confluence/display/seccode/INT32-C.+Ensure...
For example, in large integer subtract, we have a protection against (0 - INT_MIN) like: y = 0 - x; if ( y==x ) { primitiveFail(); } an optimizing compiler having a licence to ignore INT_MIN Undefined Behaviour case could mathematically solve the equation as x==0, y==0 and transform code into if( ! (y=0-x) ) { primitiveFail(); } (directly use a jz and save a comparison)
or if we have such branch c = a + b; if( a >0) { if(b > 0) { if (c < 0 ) { primitiveFail(); } } } Again, a good compiler could remove the if( c < 0) test, since it does not have to care about UB...
OK, pragmatically, most of these post-condition hacks are fast and work with some version of gcc, but think about portability (llvm ?) and future pain (you can debug such code only at asembler level).
Do it right > do it fast.
Nicolas
Hi Nicolas:
On 29 Aug 2012, at 12:18, Nicolas Cellier wrote:
Beside these bugs, when I read the code, I'm quite sure it's a nest of future bugs because there are many other attempts to catch overflow in post-condition (like testing that addition of two positive is negative when an underflow occurs) that technically rely on explicitely Undefined Behaviour (UB).
I guess http://forum.world.st/Is-bytecodePrimMultiply-correct-td3869580.html is related too. I am not sure whether that got changed in the VMs, but sounds very much like the same kind of problem. (undefined behavior and overflows)
Since C is undefined in that regard, what are the options? Hand-crafted assembly for all relevant platforms? Are there libraries that abstract from these things?
I think Clang has a compiler switch to warn at compile-time, or trigger a runtime warning/error for these issues with undefined behavior. That might help for a thorough sweep through the code.
Best regards Stefan
Originally, C was very close to machine instructions and was conceived as a generic assembler. But it's more and more distant and becoming abstract.
Unfortunately, the abstract arithmetic model is completely broken with UB: it's un-reliable. So what does this kind of abstraction serves? Really, it makes me wonder...
If portable C code becomes both long and inefficient, I guess that cost of maintaining assembler for some part that we know are broken is an option indeed.
I also see that Andreas had a hard time with slang, some intermediate operations being cast to uint32 instead of int64, he finally had to use many cCode: hacks... Going both thru slang and C intermediate sounds like too much work and too few safety for implementation basic arithmetic (obviously we didn't and we can't easily model broken C behaviour in Slang, it's too complex !).
Nicolas
2012/8/29 Stefan Marr smalltalk@stefan-marr.de:
Hi Nicolas:
On 29 Aug 2012, at 12:18, Nicolas Cellier wrote:
Beside these bugs, when I read the code, I'm quite sure it's a nest of future bugs because there are many other attempts to catch overflow in post-condition (like testing that addition of two positive is negative when an underflow occurs) that technically rely on explicitely Undefined Behaviour (UB).
I guess http://forum.world.st/Is-bytecodePrimMultiply-correct-td3869580.html is related too. I am not sure whether that got changed in the VMs, but sounds very much like the same kind of problem. (undefined behavior and overflows)
Since C is undefined in that regard, what are the options? Hand-crafted assembly for all relevant platforms? Are there libraries that abstract from these things?
I think Clang has a compiler switch to warn at compile-time, or trigger a runtime warning/error for these issues with undefined behavior. That might help for a thorough sweep through the code.
Best regards Stefan
-- Stefan Marr Software Languages Lab Vrije Universiteit Brussel Pleinlaan 2 / B-1050 Brussels / Belgium http://soft.vub.ac.be/~smarr Phone: +32 2 629 2974 Fax: +32 2 629 3525
For the (1<<63) negated bug itself, one very simple solution would be to just refuse it as a valid int64... We spent many effort to handle it in #signed64BitValueOf: , and our rewards are many bug popping out in the primitives where it is used. Regarding efficiency, we will have to protect code with inefficient-UB-broken defensive if or very-inefficient-portable-C, so the best choice is to just filter it out right at the beginning... I also wonder if handling a sign-magnitude wouldn't just be easier in that case (except maybe for + and -).
Of course the other UB are remaining, but one thing at a time.
Nicolas
2012/8/29 Nicolas Cellier nicolas.cellier.aka.nice@gmail.com:
Originally, C was very close to machine instructions and was conceived as a generic assembler. But it's more and more distant and becoming abstract.
Unfortunately, the abstract arithmetic model is completely broken with UB: it's un-reliable. So what does this kind of abstraction serves? Really, it makes me wonder...
If portable C code becomes both long and inefficient, I guess that cost of maintaining assembler for some part that we know are broken is an option indeed.
I also see that Andreas had a hard time with slang, some intermediate operations being cast to uint32 instead of int64, he finally had to use many cCode: hacks... Going both thru slang and C intermediate sounds like too much work and too few safety for implementation basic arithmetic (obviously we didn't and we can't easily model broken C behaviour in Slang, it's too complex !).
Nicolas
2012/8/29 Stefan Marr smalltalk@stefan-marr.de:
Hi Nicolas:
On 29 Aug 2012, at 12:18, Nicolas Cellier wrote:
Beside these bugs, when I read the code, I'm quite sure it's a nest of future bugs because there are many other attempts to catch overflow in post-condition (like testing that addition of two positive is negative when an underflow occurs) that technically rely on explicitely Undefined Behaviour (UB).
I guess http://forum.world.st/Is-bytecodePrimMultiply-correct-td3869580.html is related too. I am not sure whether that got changed in the VMs, but sounds very much like the same kind of problem. (undefined behavior and overflows)
Since C is undefined in that regard, what are the options? Hand-crafted assembly for all relevant platforms? Are there libraries that abstract from these things?
I think Clang has a compiler switch to warn at compile-time, or trigger a runtime warning/error for these issues with undefined behavior. That might help for a thorough sweep through the code.
Best regards Stefan
-- Stefan Marr Software Languages Lab Vrije Universiteit Brussel Pleinlaan 2 / B-1050 Brussels / Belgium http://soft.vub.ac.be/~smarr Phone: +32 2 629 2974 Fax: +32 2 629 3525
I checked, unsigned int overflow behaviour is a well defined standard in both C/C++ and result in a modulo 2^ (sizeof( uint_type ) *CHAR_BIT) So I'm now convinced that sign/magnitude decomposition is the way to go. Anyway, you can currently observe sign dissertation to check overflow in post condition, so we already pay the same price as sign/magnitude solution, except that code is currently relying on broken C signed arithmetic model.
I may post corrected primitives for basic arithmetic ops when I have time,.. But I won't have any frustration if a true VM hacker or someone more available than me could do it, I don't even know our own little name for an unsigned int 64, usqLong ?
Nicolas
2012/8/29 Nicolas Cellier nicolas.cellier.aka.nice@gmail.com:
For the (1<<63) negated bug itself, one very simple solution would be to just refuse it as a valid int64... We spent many effort to handle it in #signed64BitValueOf: , and our rewards are many bug popping out in the primitives where it is used. Regarding efficiency, we will have to protect code with inefficient-UB-broken defensive if or very-inefficient-portable-C, so the best choice is to just filter it out right at the beginning... I also wonder if handling a sign-magnitude wouldn't just be easier in that case (except maybe for + and -).
Of course the other UB are remaining, but one thing at a time.
Nicolas
2012/8/29 Nicolas Cellier nicolas.cellier.aka.nice@gmail.com:
Originally, C was very close to machine instructions and was conceived as a generic assembler. But it's more and more distant and becoming abstract.
Unfortunately, the abstract arithmetic model is completely broken with UB: it's un-reliable. So what does this kind of abstraction serves? Really, it makes me wonder...
If portable C code becomes both long and inefficient, I guess that cost of maintaining assembler for some part that we know are broken is an option indeed.
I also see that Andreas had a hard time with slang, some intermediate operations being cast to uint32 instead of int64, he finally had to use many cCode: hacks... Going both thru slang and C intermediate sounds like too much work and too few safety for implementation basic arithmetic (obviously we didn't and we can't easily model broken C behaviour in Slang, it's too complex !).
Nicolas
2012/8/29 Stefan Marr smalltalk@stefan-marr.de:
Hi Nicolas:
On 29 Aug 2012, at 12:18, Nicolas Cellier wrote:
Beside these bugs, when I read the code, I'm quite sure it's a nest of future bugs because there are many other attempts to catch overflow in post-condition (like testing that addition of two positive is negative when an underflow occurs) that technically rely on explicitely Undefined Behaviour (UB).
I guess http://forum.world.st/Is-bytecodePrimMultiply-correct-td3869580.html is related too. I am not sure whether that got changed in the VMs, but sounds very much like the same kind of problem. (undefined behavior and overflows)
Since C is undefined in that regard, what are the options? Hand-crafted assembly for all relevant platforms? Are there libraries that abstract from these things?
I think Clang has a compiler switch to warn at compile-time, or trigger a runtime warning/error for these issues with undefined behavior. That might help for a thorough sweep through the code.
Best regards Stefan
-- Stefan Marr Software Languages Lab Vrije Universiteit Brussel Pleinlaan 2 / B-1050 Brussels / Belgium http://soft.vub.ac.be/~smarr Phone: +32 2 629 2974 Fax: +32 2 629 3525
Here is an exemple of what I propose:
primitiveAddLargeIntegers "Primitive arithmetic operations for large integers in 64 bit range" | a b result oopResult aIsNegative bIsNegative resultIsNegative oopArg oopRcvr | <export: true> <var: 'a' type: 'usqLong'> <var: 'b' type: 'usqLong'> <var: 'result' type: 'usqLong'> <var: 'oopArg' type: 'oop'> <var: 'oopRcvr' type: 'oop'> <var: 'oopResult' type: 'oop'> <var: 'aIsNegative' type: 'usqInt'> <var: 'bIsNegative' type: 'usqInt'> <var: 'resultIsNegative' type: 'usqInt'>
oopArg := self stackValue: 0. oopRcvr := self stackValue: 1. aIsNegative := self isNegativeIntegerValueOf: oopRcvr. bIsNegative := self isNegativeIntegerValueOf: oopArg. a := self magnitude64BitValueOf: oopRcvr. b := self magnitude64BitValueOf: oopArg. successFlag ifFalse:[^nil]. (aIsNegative == bIsNegative) ifTrue: ["Protect against overflow" a > (16rFFFFFFFFFFFFFFFF - b) ifTrue: [self primitiveFail. ^nil]. result := a + b. resultIsNegative := aIsNegative] ifFalse: [(a >= b) ifTrue: [result := a - b. resultIsNegative := aIsNegative] ifFalse: [result := b - a. resultIsNegative := bIsNegative]]. oopResult := self magnitude64BitIntegerFor: result neg: resultIsNegative. successFlag ifTrue:[self pop: 2 thenPush: oopResult].
IMHO, code is not much worse than previous one:
primitiveAddLargeIntegers "Primitive arithmetic operations for large integers in 64 bit range" | integerRcvr integerArg result oopResult | <export: true> <var: 'integerRcvr' type: 'sqLong'> <var: 'integerArg' type: 'sqLong'> <var: 'result' type: 'sqLong'>
integerArg := self signed64BitValueOf: (self stackValue: 0). integerRcvr := self signed64BitValueOf: (self stackValue: 1). successFlag ifFalse:[^nil].
"Compute the preliminary result (which may overflow)" result := integerRcvr + integerArg.
"Now check overflow conditions. First is whether rcvr and arg are of the same sign. If they are we need to check for overflow more carefully." (integerRcvr bitXor: integerArg) < 0 ifFalse:[ "Second is whether rcvr and result are of the same sign. If not, we have an overflow." (integerRcvr bitXor: result) < 0 ifTrue:[self primitiveFail]]. successFlag ifFalse:[^nil].
oopResult := self signed64BitIntegerFor: result. successFlag ifTrue:[self pop: 2 thenPush: oopResult].
And conversion with sign/magnitude is simpler...
isNegativeIntegerValueOf: oop "Answer 1 if integer object is negative, 0 otherwise. Fail if object pointed by oop i not an integer." | largeClass smallInt | <returnTypeC: #usqInt> <var: #smallInt type: #sqInt> (self isIntegerObject: oop) ifTrue: [smallInt := self integerValueOf: oop. ^smallInt < 0]. largeClass := self fetchClassOfNonInt: oop. largeClass = self classLargePositiveInteger ifTrue: [^0]. largeClass = self classLargePositiveInteger ifTrue: [^0]. ^self primitiveFail magnitude64BitValueOf: oop "Convert the given object into an integer value. The object may be either a ST integer or a eight-byte LargeInteger." | sz value largeClass szsqLong smallIntValue | <inline: false> <returnTypeC: #usqLong> <var: #value type: #usqLong> <var: #smallIntValue type: #sqInt> (self isIntegerObject: oop) ifTrue: [smallIntValue := self integerValueOf: oop. smallIntValue < 0 ifTrue: [smallIntValue := 0 - smallIntValue]. ^self cCoerce: smallIntValue to: #usqLong]. largeClass := self fetchClassOfNonInt: oop. largeClass = self classLargePositiveInteger ifFalse:[largeClass = self classLargeNegativeInteger ifFalse:[^self primitiveFail]]. szsqLong := self sizeof: #usqLong asSymbol. sz := self lengthOf: oop. sz > szsqLong ifTrue: [^ self primitiveFail]. value := 0. 0 to: sz - 1 do: [:i | value := value + ((self cCoerce: (self fetchByte: i ofObject: oop) to: #usqLong) << (i*8))]. ^value
magnitude64BitIntegerFor: integerValue neg: isNegative "Return a Large Integer object for the given integer magnitude and sign" | newLargeInteger largeClass intValue highWord sz isSmall smallVal | <inline: false> <var: 'integerValue' type: 'usqLong'> <var: 'isNegative' type: 'usqInt'> <var: 'highWord' type: 'usqInt'> <var: 'isSmall' type: 'usqInt'> <var: 'smallVal' type: 'sqInt'>
isSmall := isNegative ifTrue: [integerValue <= 16r40000000] ifFalse: [integerValue < 16r40000000]. isSmall ifTrue: [smallVal := self cCoerce: integerValue to: #sqInt. isNegative ifTrue: [smallVal := 0 - smallVal]. ^self integerObjectOf: smallVal]. isNegative < 0 ifTrue:[ largeClass := self classLargeNegativeInteger] ifFalse:[ largeClass := self classLargePositiveInteger]. highWord := self cCode: 'magnitude >> 32'. "shift is coerced to usqInt otherwise" highWord = 0 ifTrue: [sz := 4] ifFalse:[ sz := 5. (highWord := highWord >> 8) = 0 ifFalse:[sz := sz + 1]. (highWord := highWord >> 8) = 0 ifFalse:[sz := sz + 1]. (highWord := highWord >> 8) = 0 ifFalse:[sz := sz + 1]. ]. newLargeInteger := self instantiateClass: largeClass indexableSize: sz. 0 to: sz-1 do: [:i | intValue := self cCode: '(magnitude >> (i * 8)) & 255'. self storeByte: i ofObject: newLargeInteger withValue: intValue]. ^ newLargeInteger
Nicolas
2012/8/29 Nicolas Cellier nicolas.cellier.aka.nice@gmail.com:
I checked, unsigned int overflow behaviour is a well defined standard in both C/C++ and result in a modulo 2^ (sizeof( uint_type ) *CHAR_BIT) So I'm now convinced that sign/magnitude decomposition is the way to go. Anyway, you can currently observe sign dissertation to check overflow in post condition, so we already pay the same price as sign/magnitude solution, except that code is currently relying on broken C signed arithmetic model.
I may post corrected primitives for basic arithmetic ops when I have time,.. But I won't have any frustration if a true VM hacker or someone more available than me could do it, I don't even know our own little name for an unsigned int 64, usqLong ?
Nicolas
2012/8/29 Nicolas Cellier nicolas.cellier.aka.nice@gmail.com:
For the (1<<63) negated bug itself, one very simple solution would be to just refuse it as a valid int64... We spent many effort to handle it in #signed64BitValueOf: , and our rewards are many bug popping out in the primitives where it is used. Regarding efficiency, we will have to protect code with inefficient-UB-broken defensive if or very-inefficient-portable-C, so the best choice is to just filter it out right at the beginning... I also wonder if handling a sign-magnitude wouldn't just be easier in that case (except maybe for + and -).
Of course the other UB are remaining, but one thing at a time.
Nicolas
2012/8/29 Nicolas Cellier nicolas.cellier.aka.nice@gmail.com:
Originally, C was very close to machine instructions and was conceived as a generic assembler. But it's more and more distant and becoming abstract.
Unfortunately, the abstract arithmetic model is completely broken with UB: it's un-reliable. So what does this kind of abstraction serves? Really, it makes me wonder...
If portable C code becomes both long and inefficient, I guess that cost of maintaining assembler for some part that we know are broken is an option indeed.
I also see that Andreas had a hard time with slang, some intermediate operations being cast to uint32 instead of int64, he finally had to use many cCode: hacks... Going both thru slang and C intermediate sounds like too much work and too few safety for implementation basic arithmetic (obviously we didn't and we can't easily model broken C behaviour in Slang, it's too complex !).
Nicolas
2012/8/29 Stefan Marr smalltalk@stefan-marr.de:
Hi Nicolas:
On 29 Aug 2012, at 12:18, Nicolas Cellier wrote:
Beside these bugs, when I read the code, I'm quite sure it's a nest of future bugs because there are many other attempts to catch overflow in post-condition (like testing that addition of two positive is negative when an underflow occurs) that technically rely on explicitely Undefined Behaviour (UB).
I guess http://forum.world.st/Is-bytecodePrimMultiply-correct-td3869580.html is related too. I am not sure whether that got changed in the VMs, but sounds very much like the same kind of problem. (undefined behavior and overflows)
Since C is undefined in that regard, what are the options? Hand-crafted assembly for all relevant platforms? Are there libraries that abstract from these things?
I think Clang has a compiler switch to warn at compile-time, or trigger a runtime warning/error for these issues with undefined behavior. That might help for a thorough sweep through the code.
Best regards Stefan
-- Stefan Marr Software Languages Lab Vrije Universiteit Brussel Pleinlaan 2 / B-1050 Brussels / Belgium http://soft.vub.ac.be/~smarr Phone: +32 2 629 2974 Fax: +32 2 629 3525
On Wed, Aug 29, 2012 at 01:24:39PM +0200, Stefan Marr wrote:
Hi Nicolas:
On 29 Aug 2012, at 12:18, Nicolas Cellier wrote:
Beside these bugs, when I read the code, I'm quite sure it's a nest of future bugs because there are many other attempts to catch overflow in post-condition (like testing that addition of two positive is negative when an underflow occurs) that technically rely on explicitely Undefined Behaviour (UB).
See below. Tests such as this are essential, and they they do *not* rely on undefined behavior if the C variables are properly declared.
I guess http://forum.world.st/Is-bytecodePrimMultiply-correct-td3869580.html is related too. I am not sure whether that got changed in the VMs, but sounds very much like the same kind of problem. (undefined behavior and overflows)
Since C is undefined in that regard, what are the options? Hand-crafted assembly for all relevant platforms? Are there libraries that abstract from these things?
A good general solution is to perform the arithmetic using variables declared as, or cast to, unsigned. The ambiguity in C language pertains only to signed twos complement arithmetic, so if the operations are performed on twos complement values that are declared unsigned, then no compiler optimization is possible and the results are unambiguous regardless of compiler behavior. Results of the unsigned operations may be tested for overflow, then cast back to signed integer if the result is intended to be interpreted as a signed integer.
Dave
2012/8/30 David T. Lewis lewis@mail.msen.com:
On Wed, Aug 29, 2012 at 01:24:39PM +0200, Stefan Marr wrote:
Hi Nicolas:
On 29 Aug 2012, at 12:18, Nicolas Cellier wrote:
Beside these bugs, when I read the code, I'm quite sure it's a nest of future bugs because there are many other attempts to catch overflow in post-condition (like testing that addition of two positive is negative when an underflow occurs) that technically rely on explicitely Undefined Behaviour (UB).
See below. Tests such as this are essential, and they they do *not* rely on undefined behavior if the C variables are properly declared.
I guess http://forum.world.st/Is-bytecodePrimMultiply-correct-td3869580.html is related too. I am not sure whether that got changed in the VMs, but sounds very much like the same kind of problem. (undefined behavior and overflows)
Since C is undefined in that regard, what are the options? Hand-crafted assembly for all relevant platforms? Are there libraries that abstract from these things?
A good general solution is to perform the arithmetic using variables declared as, or cast to, unsigned. The ambiguity in C language pertains only to signed twos complement arithmetic, so if the operations are performed on twos complement values that are declared unsigned, then no compiler optimization is possible and the results are unambiguous regardless of compiler behavior. Results of the unsigned operations may be tested for overflow, then cast back to signed integer if the result is intended to be interpreted as a signed integer.
Dave
For LargeInt, I prefer proper sign/unsigned magnitude handling to signed arithmetic hacks for several reasons:
With signed arithmetic: 1) in conversion to/from LargeInt we have to discuss sign anyway and we have that magnitude fits into INT_MIN to INT_MAX 2) in overflow post-condition test, we have to discuss sign again - for example, prim(Add/Subtract/Div/Quo/Mod)LargeIntegers 3) we still need defensive protection against (0 - INT_MIN) 4) all casts rely on 2-complement representation which is implementation defined (though universal nowadays)
with sign/magnitude: 1) conversion from/to is simplified 2) overflow are easy to test in pre-condition like a > (UINT_MAX - b) for addition, a > (UINT_MAX/b) for *, and this is absolutely safe even with a bogus compiler, and as efficient as broken post-condition tests 3) no protection needed against INT_MIN 4) no reliance on 2-complement at all 5) primitives work 1 bit further 6) code is simpler (less if)
The only thing missing in my proposition (see attachment in http://code.google.com/p/cog/issues/detail?id=92) is a function retrieving sign and magnitude in a single function call (it's easy in C, but I don't know slang enough, how to return a struct or pass a pointer is beyond my knowledge).
In a word, sign/magnitude is easier/cleaner and not less performant (once single function call is resolved)
For SmallInt, computing result in signed 64 bits as proposed by Stefan is the simplest thing we can do IMHO. I'm quite sure it won't ruin efficiency.
Nicolas
Also note that I introduced one future-flaw by hardcoding a 64 bits constant...
1) 16rFFFFFFFFFFFFFFFF should be replace with a proper macro (USQLONG_MAX or something) in pre-condition tests
I did not correct the other flaws already spreaded in code: - SmallInteger maxVal/linVal are hardcoded everywhere And we have these implicit requirements: - SmallInteger maxVal/minVal has to fit in a sqInt - (0 - SmallInteger minVal) has to fit in a sqInt too (currently possible, it has only 31 bits)
2012/8/30 Nicolas Cellier nicolas.cellier.aka.nice@gmail.com:
2012/8/30 David T. Lewis lewis@mail.msen.com:
On Wed, Aug 29, 2012 at 01:24:39PM +0200, Stefan Marr wrote:
Hi Nicolas:
On 29 Aug 2012, at 12:18, Nicolas Cellier wrote:
Beside these bugs, when I read the code, I'm quite sure it's a nest of future bugs because there are many other attempts to catch overflow in post-condition (like testing that addition of two positive is negative when an underflow occurs) that technically rely on explicitely Undefined Behaviour (UB).
See below. Tests such as this are essential, and they they do *not* rely on undefined behavior if the C variables are properly declared.
I guess http://forum.world.st/Is-bytecodePrimMultiply-correct-td3869580.html is related too. I am not sure whether that got changed in the VMs, but sounds very much like the same kind of problem. (undefined behavior and overflows)
Since C is undefined in that regard, what are the options? Hand-crafted assembly for all relevant platforms? Are there libraries that abstract from these things?
A good general solution is to perform the arithmetic using variables declared as, or cast to, unsigned. The ambiguity in C language pertains only to signed twos complement arithmetic, so if the operations are performed on twos complement values that are declared unsigned, then no compiler optimization is possible and the results are unambiguous regardless of compiler behavior. Results of the unsigned operations may be tested for overflow, then cast back to signed integer if the result is intended to be interpreted as a signed integer.
Dave
For LargeInt, I prefer proper sign/unsigned magnitude handling to signed arithmetic hacks for several reasons:
With signed arithmetic:
- in conversion to/from LargeInt we have to discuss sign anyway and
we have that magnitude fits into INT_MIN to INT_MAX 2) in overflow post-condition test, we have to discuss sign again - for example, prim(Add/Subtract/Div/Quo/Mod)LargeIntegers 3) we still need defensive protection against (0 - INT_MIN) 4) all casts rely on 2-complement representation which is implementation defined (though universal nowadays)
with sign/magnitude:
- conversion from/to is simplified
- overflow are easy to test in pre-condition like a > (UINT_MAX - b)
for addition, a > (UINT_MAX/b) for *, and this is absolutely safe even with a bogus compiler, and as efficient as broken post-condition tests 3) no protection needed against INT_MIN 4) no reliance on 2-complement at all 5) primitives work 1 bit further 6) code is simpler (less if)
The only thing missing in my proposition (see attachment in http://code.google.com/p/cog/issues/detail?id=92) is a function retrieving sign and magnitude in a single function call (it's easy in C, but I don't know slang enough, how to return a struct or pass a pointer is beyond my knowledge).
In a word, sign/magnitude is easier/cleaner and not less performant (once single function call is resolved)
For SmallInt, computing result in signed 64 bits as proposed by Stefan is the simplest thing we can do IMHO. I'm quite sure it won't ruin efficiency.
Nicolas
Oops, there is another one hardcoded constant in bitShift (was already there)
shift >= 64 should better be shift >= sizeof(usqLong)*CHAR_BIT...
Nicolas
2012/8/30 Nicolas Cellier nicolas.cellier.aka.nice@gmail.com:
Also note that I introduced one future-flaw by hardcoding a 64 bits constant...
- 16rFFFFFFFFFFFFFFFF should be replace with a proper macro
(USQLONG_MAX or something) in pre-condition tests
I did not correct the other flaws already spreaded in code:
- SmallInteger maxVal/linVal are hardcoded everywhere
And we have these implicit requirements:
- SmallInteger maxVal/minVal has to fit in a sqInt
- (0 - SmallInteger minVal) has to fit in a sqInt too (currently
possible, it has only 31 bits)
2012/8/30 Nicolas Cellier nicolas.cellier.aka.nice@gmail.com:
2012/8/30 David T. Lewis lewis@mail.msen.com:
On Wed, Aug 29, 2012 at 01:24:39PM +0200, Stefan Marr wrote:
Hi Nicolas:
On 29 Aug 2012, at 12:18, Nicolas Cellier wrote:
Beside these bugs, when I read the code, I'm quite sure it's a nest of future bugs because there are many other attempts to catch overflow in post-condition (like testing that addition of two positive is negative when an underflow occurs) that technically rely on explicitely Undefined Behaviour (UB).
See below. Tests such as this are essential, and they they do *not* rely on undefined behavior if the C variables are properly declared.
I guess http://forum.world.st/Is-bytecodePrimMultiply-correct-td3869580.html is related too. I am not sure whether that got changed in the VMs, but sounds very much like the same kind of problem. (undefined behavior and overflows)
Since C is undefined in that regard, what are the options? Hand-crafted assembly for all relevant platforms? Are there libraries that abstract from these things?
A good general solution is to perform the arithmetic using variables declared as, or cast to, unsigned. The ambiguity in C language pertains only to signed twos complement arithmetic, so if the operations are performed on twos complement values that are declared unsigned, then no compiler optimization is possible and the results are unambiguous regardless of compiler behavior. Results of the unsigned operations may be tested for overflow, then cast back to signed integer if the result is intended to be interpreted as a signed integer.
Dave
For LargeInt, I prefer proper sign/unsigned magnitude handling to signed arithmetic hacks for several reasons:
With signed arithmetic:
- in conversion to/from LargeInt we have to discuss sign anyway and
we have that magnitude fits into INT_MIN to INT_MAX 2) in overflow post-condition test, we have to discuss sign again - for example, prim(Add/Subtract/Div/Quo/Mod)LargeIntegers 3) we still need defensive protection against (0 - INT_MIN) 4) all casts rely on 2-complement representation which is implementation defined (though universal nowadays)
with sign/magnitude:
- conversion from/to is simplified
- overflow are easy to test in pre-condition like a > (UINT_MAX - b)
for addition, a > (UINT_MAX/b) for *, and this is absolutely safe even with a bogus compiler, and as efficient as broken post-condition tests 3) no protection needed against INT_MIN 4) no reliance on 2-complement at all 5) primitives work 1 bit further 6) code is simpler (less if)
The only thing missing in my proposition (see attachment in http://code.google.com/p/cog/issues/detail?id=92) is a function retrieving sign and magnitude in a single function call (it's easy in C, but I don't know slang enough, how to return a struct or pass a pointer is beyond my knowledge).
In a word, sign/magnitude is easier/cleaner and not less performant (once single function call is resolved)
For SmallInt, computing result in signed 64 bits as proposed by Stefan is the simplest thing we can do IMHO. I'm quite sure it won't ruin efficiency.
Nicolas
This is on Mantis at http://bugs.squeak.org/view.php?id=7705
Note last comment in related issue 6987.
This issue will crash the VM when compiled for 64-bit platforms.
Dave
On Wed, Aug 29, 2012 at 12:18:28PM +0200, Nicolas Cellier wrote:
As posted on squeak-dev http://lists.squeakfoundation.org/pipermail/squeak-dev/2012-August/165608.ht... I found 3 bugs in LargeInteger primitives
(1<<63) negated quo: -1. (1<<63) negated / -1. (1<<63) negated * -1.
They are all related to the impossible task of taking absolute value of INT_MIN (or more exactly it's 64 bits equivalent). Currently, it takes the form (0 - INT_MIN) whose behaviour is undefined according to C standards but generally answer INT_MIN. See for example http://stackoverflow.com/questions/2539178/why-is-abs0x80000000-0x80000000
Surprisingly this one works: (1<<63) negated // -1.
Most probably because gcc has a license to ignore undefined behaviour and perform some optimizations that don't take overflow side effects into account. For example 0 - (0 - a)/b can be simplified into a/b, UB case of INT_MIN apart...
Beside these bugs, when I read the code, I'm quite sure it's a nest of future bugs because there are many other attempts to catch overflow in post-condition (like testing that addition of two positive is negative when an underflow occurs) that technically rely on explicitely Undefined Behaviour (UB). OK, by now many Arithmetic Units do behave like exploited in these post-conditions, though it's not strictly future-proof. But we unfortunately rely on optimizing C compilers, and its behaviour is much more fragile than hardware...
I invite every VM hacker to read http://stackoverflow.com/questions/199333/best-way-to-detect-integer-overflo... And various links like https://www.securecoding.cert.org/confluence/display/seccode/INT32-C.+Ensure...
For example, in large integer subtract, we have a protection against (0 - INT_MIN) like: y = 0 - x; if ( y==x ) { primitiveFail(); } an optimizing compiler having a licence to ignore INT_MIN Undefined Behaviour case could mathematically solve the equation as x==0, y==0 and transform code into if( ! (y=0-x) ) { primitiveFail(); } (directly use a jz and save a comparison)
or if we have such branch c = a + b; if( a >0) { if(b > 0) { if (c < 0 ) { primitiveFail(); } } } Again, a good compiler could remove the if( c < 0) test, since it does not have to care about UB...
OK, pragmatically, most of these post-condition hacks are fast and work with some version of gcc, but think about portability (llvm ?) and future pain (you can debug such code only at asembler level).
Do it right > do it fast.
Nicolas
See also http://code.google.com/p/cog/issues/detail?id=92 where I attached a fix for large int It's untested yet and to review carefully !
As Stefan told, there is UB-reliance in SmallInteger primitives too, but I did not fix them. We should simply compute result as signed 64 bits as proposed by Stefan (except bitShift)
Nicolas
2012/8/30 David T. Lewis lewis@mail.msen.com:
This is on Mantis at http://bugs.squeak.org/view.php?id=7705
Note last comment in related issue 6987.
This issue will crash the VM when compiled for 64-bit platforms.
Dave
On Wed, Aug 29, 2012 at 12:18:28PM +0200, Nicolas Cellier wrote:
As posted on squeak-dev http://lists.squeakfoundation.org/pipermail/squeak-dev/2012-August/165608.ht... I found 3 bugs in LargeInteger primitives
(1<<63) negated quo: -1. (1<<63) negated / -1. (1<<63) negated * -1.
They are all related to the impossible task of taking absolute value of INT_MIN (or more exactly it's 64 bits equivalent). Currently, it takes the form (0 - INT_MIN) whose behaviour is undefined according to C standards but generally answer INT_MIN. See for example http://stackoverflow.com/questions/2539178/why-is-abs0x80000000-0x80000000
Surprisingly this one works: (1<<63) negated // -1.
Most probably because gcc has a license to ignore undefined behaviour and perform some optimizations that don't take overflow side effects into account. For example 0 - (0 - a)/b can be simplified into a/b, UB case of INT_MIN apart...
Beside these bugs, when I read the code, I'm quite sure it's a nest of future bugs because there are many other attempts to catch overflow in post-condition (like testing that addition of two positive is negative when an underflow occurs) that technically rely on explicitely Undefined Behaviour (UB). OK, by now many Arithmetic Units do behave like exploited in these post-conditions, though it's not strictly future-proof. But we unfortunately rely on optimizing C compilers, and its behaviour is much more fragile than hardware...
I invite every VM hacker to read http://stackoverflow.com/questions/199333/best-way-to-detect-integer-overflo... And various links like https://www.securecoding.cert.org/confluence/display/seccode/INT32-C.+Ensure...
For example, in large integer subtract, we have a protection against (0 - INT_MIN) like: y = 0 - x; if ( y==x ) { primitiveFail(); } an optimizing compiler having a licence to ignore INT_MIN Undefined Behaviour case could mathematically solve the equation as x==0, y==0 and transform code into if( ! (y=0-x) ) { primitiveFail(); } (directly use a jz and save a comparison)
or if we have such branch c = a + b; if( a >0) { if(b > 0) { if (c < 0 ) { primitiveFail(); } } } Again, a good compiler could remove the if( c < 0) test, since it does not have to care about UB...
OK, pragmatically, most of these post-condition hacks are fast and work with some version of gcc, but think about portability (llvm ?) and future pain (you can debug such code only at asembler level).
Do it right > do it fast.
Nicolas
Hi:
On 30 Aug 2012, at 01:14, Nicolas Cellier wrote:
See also http://code.google.com/p/cog/issues/detail?id=92 where I attached a fix for large int It's untested yet and to review carefully !
As Stefan told, there is UB-reliance in SmallInteger primitives too, but I did not fix them. We should simply compute result as signed 64 bits as proposed by Stefan (except bitShift)
This might be the simplest solution, but at least on the RoarVM I measured a significant performance impact on tight integer loops. It's 20% according to my measurements.
Might be something necessary to be considered.
Best regards Stefan
2012/8/30 Stefan Marr smalltalk@stefan-marr.de:
Hi:
On 30 Aug 2012, at 01:14, Nicolas Cellier wrote:
See also http://code.google.com/p/cog/issues/detail?id=92 where I attached a fix for large int It's untested yet and to review carefully !
As Stefan told, there is UB-reliance in SmallInteger primitives too, but I did not fix them. We should simply compute result as signed 64 bits as proposed by Stefan (except bitShift)
This might be the simplest solution, but at least on the RoarVM I measured a significant performance impact on tight integer loops. It's 20% according to my measurements.
Might be something necessary to be considered.
Best regards Stefan
Interesting. My fix for LargeInt seem to give same or better timing than old bogus ones despite double function call for getting sign and magnitude. Note that I updated http://code.google.com/p/cog/issues/detail?id=92 with a corrected version. 1st one bogusly tried to use oop as a type, I obviously don't program VM every day ;)
Nicolas
-- Stefan Marr Software Languages Lab Vrije Universiteit Brussel Pleinlaan 2 / B-1050 Brussels / Belgium http://soft.vub.ac.be/~smarr Phone: +32 2 629 2974 Fax: +32 2 629 3525
Hi Stefan,
I just tried on a stack VM (MacOSX core 2...), and I get different measurements, at most 3% penalty, not 20% If I rewrite bytecodePrimMultiply like this :
bytecodePrimMultiply | rcvr arg result | <var: #result type: 'sqLong'> rcvr := self internalStackValue: 1. arg := self internalStackValue: 0. (self areIntegers: rcvr and: arg) ifTrue: [rcvr := objectMemory integerValueOf: rcvr. arg := objectMemory integerValueOf: arg. result := rcvr. result := result * arg. (result >= 16r-40000000 and: [result <= 16r3FFFFFFF]) ifTrue: [self internalPop: 2 thenPush: (objectMemory integerObjectOf: result). ^self fetchNextBytecode "success"]] ifFalse: [...
Then I get mini bench timing:
ORIGINAL: [33*35] bench '8,470,000 per second.' '8,670,000 per second.'
MODIFIED: '8,410,000 per second.' '8,370,000 per second.'
Though, I didn't used SSE or any 64 bits friendly instructions:
line 6501 result = result * arg; .loc 1 6501 0 movl -1404(%ebp), %eax movl %eax, %edx sarl $31, %edx movl -2068(%ebp), %ecx imull %eax, %ecx movl -2072(%ebp), %ebx imull %edx, %ebx addl %ebx, %ecx mull -2072(%ebp) addl %edx, %ecx movl %ecx, %edx movl %eax, -2072(%ebp) movl %edx, -2068(%ebp) movl %eax, -2072(%ebp) movl %edx, -2068(%ebp)
So it certainly is un-optimal, but I'm sure we wouldn't notice any difference on a macro benchmark.
Also, we could let the primitive use create a #signed64BitIntegerFor: result instead of falling back to normal send if we really want to favour performance over clean separation (after all, the primitive already know about Float, why not about LargeInteger...)
Nicolas
2012/8/30 Stefan Marr smalltalk@stefan-marr.de:
Hi:
On 30 Aug 2012, at 01:14, Nicolas Cellier wrote:
See also http://code.google.com/p/cog/issues/detail?id=92 where I attached a fix for large int It's untested yet and to review carefully !
As Stefan told, there is UB-reliance in SmallInteger primitives too, but I did not fix them. We should simply compute result as signed 64 bits as proposed by Stefan (except bitShift)
This might be the simplest solution, but at least on the RoarVM I measured a significant performance impact on tight integer loops. It's 20% according to my measurements.
Might be something necessary to be considered.
Best regards Stefan
-- Stefan Marr Software Languages Lab Vrije Universiteit Brussel Pleinlaan 2 / B-1050 Brussels / Belgium http://soft.vub.ac.be/~smarr Phone: +32 2 629 2974 Fax: +32 2 629 3525
Hi Nicolas:
On 01 Sep 2012, at 22:15, Nicolas Cellier wrote:
I just tried on a stack VM (MacOSX core 2...), and I get different measurements, at most 3% penalty, not 20%
Sorry, the 20% claim is based on wrong assumptions... In short, I don't know what the penalty is on microbenchmarks.
I was only checking the benchmark results here: http://soft.vub.ac.be/~ppp/codespeed/changes/?tre=5&rev=5e18a83ed285c900... The corresponding patch is here: https://github.com/smarr/RoarVM/commit/5e18a83ed285c900aacd4190f91df6627d9cc...
As you can see, the IntLoop benchmark is taking a pretty strong hit. I was not cross-checking what the benchmark actually does, unfortunately. I was just assuming it does multiplication, which would explain the hit, but it does only do subtraction.
So, actually, I don't know. The benchmark is not measuring that primitive, and the other benchmarks are looking ok (taking measurement errors into account).
Still wonder what's wrong with the IntLoop benchmark thought. 20% for perhaps a new offset in the binary for related code, sounds like a lot.
Best regards Stefan
OK, see also http://code.google.com/p/cog/issues/detail?id=93
2012/9/1 Stefan Marr smalltalk@stefan-marr.de:
Hi Nicolas:
On 01 Sep 2012, at 22:15, Nicolas Cellier wrote:
I just tried on a stack VM (MacOSX core 2...), and I get different measurements, at most 3% penalty, not 20%
Sorry, the 20% claim is based on wrong assumptions... In short, I don't know what the penalty is on microbenchmarks.
I was only checking the benchmark results here: http://soft.vub.ac.be/~ppp/codespeed/changes/?tre=5&rev=5e18a83ed285c900... The corresponding patch is here: https://github.com/smarr/RoarVM/commit/5e18a83ed285c900aacd4190f91df6627d9cc...
As you can see, the IntLoop benchmark is taking a pretty strong hit. I was not cross-checking what the benchmark actually does, unfortunately. I was just assuming it does multiplication, which would explain the hit, but it does only do subtraction.
So, actually, I don't know. The benchmark is not measuring that primitive, and the other benchmarks are looking ok (taking measurement errors into account).
Still wonder what's wrong with the IntLoop benchmark thought. 20% for perhaps a new offset in the binary for related code, sounds like a lot.
Best regards Stefan
-- Stefan Marr Software Languages Lab Vrije Universiteit Brussel Pleinlaan 2 / B-1050 Brussels / Belgium http://soft.vub.ac.be/~smarr Phone: +32 2 629 2974 Fax: +32 2 629 3525
On Sat, 1 Sep 2012, Nicolas Cellier wrote:
Hi Stefan,
I just tried on a stack VM (MacOSX core 2...), and I get different measurements, at most 3% penalty, not 20% If I rewrite bytecodePrimMultiply like this :
bytecodePrimMultiply | rcvr arg result | <var: #result type: 'sqLong'> rcvr := self internalStackValue: 1. arg := self internalStackValue: 0. (self areIntegers: rcvr and: arg) ifTrue: [rcvr := objectMemory integerValueOf: rcvr. arg := objectMemory integerValueOf: arg. result := rcvr. result := result * arg. (result >= 16r-40000000 and: [result <= 16r3FFFFFFF]) ifTrue: [self internalPop: 2 thenPush: (objectMemory integerObjectOf: result). ^self fetchNextBytecode "success"]] ifFalse: [...
Then I get mini bench timing:
ORIGINAL: [33*35] bench '8,470,000 per second.' '8,670,000 per second.'
MODIFIED: '8,410,000 per second.' '8,370,000 per second.'
IMHO #bench has too high overhead for accurately measuring such simple operations. With a CogVM I get this:
[ 33 * 35 ] bench '40,000,000 per second.'. [] bench '47,300,000 per second.'
Levente
Though, I didn't used SSE or any 64 bits friendly instructions:
line 6501 result = result * arg; .loc 1 6501 0 movl -1404(%ebp), %eax movl %eax, %edx sarl $31, %edx movl -2068(%ebp), %ecx imull %eax, %ecx movl -2072(%ebp), %ebx imull %edx, %ebx addl %ebx, %ecx mull -2072(%ebp) addl %edx, %ecx movl %ecx, %edx movl %eax, -2072(%ebp) movl %edx, -2068(%ebp) movl %eax, -2072(%ebp) movl %edx, -2068(%ebp)
So it certainly is un-optimal, but I'm sure we wouldn't notice any difference on a macro benchmark.
Also, we could let the primitive use create a #signed64BitIntegerFor: result instead of falling back to normal send if we really want to favour performance over clean separation (after all, the primitive already know about Float, why not about LargeInteger...)
Nicolas
2012/8/30 Stefan Marr smalltalk@stefan-marr.de:
Hi:
On 30 Aug 2012, at 01:14, Nicolas Cellier wrote:
See also http://code.google.com/p/cog/issues/detail?id=92 where I attached a fix for large int It's untested yet and to review carefully !
As Stefan told, there is UB-reliance in SmallInteger primitives too, but I did not fix them. We should simply compute result as signed 64 bits as proposed by Stefan (except bitShift)
This might be the simplest solution, but at least on the RoarVM I measured a significant performance impact on tight integer loops. It's 20% according to my measurements.
Might be something necessary to be considered.
Best regards Stefan
-- Stefan Marr Software Languages Lab Vrije Universiteit Brussel Pleinlaan 2 / B-1050 Brussels / Belgium http://soft.vub.ac.be/~smarr Phone: +32 2 629 2974 Fax: +32 2 629 3525
Yes benchmarking is an art, this was a gross aproximation
2012/9/1 Levente Uzonyi leves@elte.hu:
On Sat, 1 Sep 2012, Nicolas Cellier wrote:
Hi Stefan,
I just tried on a stack VM (MacOSX core 2...), and I get different measurements, at most 3% penalty, not 20% If I rewrite bytecodePrimMultiply like this :
bytecodePrimMultiply | rcvr arg result | <var: #result type: 'sqLong'> rcvr := self internalStackValue: 1. arg := self internalStackValue: 0. (self areIntegers: rcvr and: arg) ifTrue: [rcvr := objectMemory integerValueOf: rcvr. arg := objectMemory integerValueOf: arg. result := rcvr. result := result * arg. (result >= 16r-40000000 and: [result <= 16r3FFFFFFF]) ifTrue: [self internalPop: 2 thenPush: (objectMemory integerObjectOf: result). ^self fetchNextBytecode "success"]] ifFalse: [...
Then I get mini bench timing:
ORIGINAL: [33*35] bench '8,470,000 per second.' '8,670,000 per second.'
MODIFIED: '8,410,000 per second.' '8,370,000 per second.'
IMHO #bench has too high overhead for accurately measuring such simple operations. With a CogVM I get this:
[ 33 * 35 ] bench '40,000,000 per second.'. [] bench '47,300,000 per second.'
Levente
Though, I didn't used SSE or any 64 bits friendly instructions:
line 6501 result = result * arg; .loc 1 6501 0 movl -1404(%ebp), %eax movl %eax, %edx sarl $31, %edx movl -2068(%ebp), %ecx imull %eax, %ecx movl -2072(%ebp), %ebx imull %edx, %ebx addl %ebx, %ecx mull -2072(%ebp) addl %edx, %ecx movl %ecx, %edx movl %eax, -2072(%ebp) movl %edx, -2068(%ebp) movl %eax, -2072(%ebp) movl %edx, -2068(%ebp)
So it certainly is un-optimal, but I'm sure we wouldn't notice any difference on a macro benchmark.
Also, we could let the primitive use create a #signed64BitIntegerFor: result instead of falling back to normal send if we really want to favour performance over clean separation (after all, the primitive already know about Float, why not about LargeInteger...)
Nicolas
2012/8/30 Stefan Marr smalltalk@stefan-marr.de:
Hi:
On 30 Aug 2012, at 01:14, Nicolas Cellier wrote:
See also http://code.google.com/p/cog/issues/detail?id=92 where I attached a fix for large int It's untested yet and to review carefully !
As Stefan told, there is UB-reliance in SmallInteger primitives too, but I did not fix them. We should simply compute result as signed 64 bits as proposed by Stefan (except bitShift)
This might be the simplest solution, but at least on the RoarVM I measured a significant performance impact on tight integer loops. It's 20% according to my measurements.
Might be something necessary to be considered.
Best regards Stefan
-- Stefan Marr Software Languages Lab Vrije Universiteit Brussel Pleinlaan 2 / B-1050 Brussels / Belgium http://soft.vub.ac.be/~smarr Phone: +32 2 629 2974 Fax: +32 2 629 3525
I have integrated these fixes into trunk VMMaker for testing. The new #testMinimumNegativeIntegerArithmetic unit test passes, and a VM compiled in 64-bit mode no longer crashed. I can find no significant change in performance with these changes applied. This looks really good to me, and if there are no other comments or issues I will go ahead and post the update to trunk VMMaker.
Here are the performance measurements that I did to compare VM performance before and after applying the changes. This is done with an interpreter VM compiled in 64 bit mode on Linux:
testBlock := [(1 to: 1000000) do: [:e | | largeNegativeInt | largeNegativeInt := -9223372036854775808 + e. largeNegativeInt >> 3. largeNegativeInt + 1. largeNegativeInt - -1. largeNegativeInt // -1. largeNegativeInt \ -1. largeNegativeInt rem: -1. largeNegativeInt quo: -1. largeNegativeInt * -1. largeNegativeInt / -1]].
Before Nicolas' changes:
(LargeNegativeIntegerTest selector: #testMinimumNegativeIntegerArithmetic) run ==> 1 run, 0 passes, 0 expected failures, 1 failures, 0 errors, 0 unexpected passes
Time millisecondsToRun: [(1 to: 1000000) do: [:e | e + -9223372036854775808 // -1]] ==> 1565 Time millisecondsToRun: [(1 to: 1000000) do: [:e | e + -9223372036854775808 // -1]] ==> 1550 Time millisecondsToRun: [(1 to: 1000000) do: [:e | e + -9223372036854775808 // -1]] ==> 1544
0 tinyBenchmarks ==> '437981180 bytecodes/sec; 14163414 sends/sec' 0 tinyBenchmarks ==> '442141623 bytecodes/sec; 14141708 sends/sec' 0 tinyBenchmarks ==> '442141623 bytecodes/sec; 14120068 sends/sec' 0 tinyBenchmarks ==> '440240756 bytecodes/sec; 14163414 sends/sec' 0 tinyBenchmarks ==> '442141623 bytecodes/sec; 14823236 sends/sec'
Time millisecondsToRun: testBlock ==> 4698 Time millisecondsToRun: testBlock ==> 4671 Time millisecondsToRun: testBlock ==> 4668 Time millisecondsToRun: testBlock ==> 4612 Time millisecondsToRun: testBlock ==> 4657
After applying the fixes:
(LargeNegativeIntegerTest selector: #testMinimumNegativeIntegerArithmetic) run ==> 1 run, 1 passes, 0 expected failures, 0 failures, 0 errors, 0 unexpected passes
Time millisecondsToRun: [(1 to: 1000000) do: [:e | e + -9223372036854775808 // -1]] ==> 1616 Time millisecondsToRun: [(1 to: 1000000) do: [:e | e + -9223372036854775808 // -1]] ==> 1581 Time millisecondsToRun: [(1 to: 1000000) do: [:e | e + -9223372036854775808 // -1]] ==> 1577
0 tinyBenchmarks ==> '456735057 bytecodes/sec; 14002222 sends/sec' 0 tinyBenchmarks ==> '461677186 bytecodes/sec; 14023502 sends/sec' 0 tinyBenchmarks ==> '462929475 bytecodes/sec; 14002222 sends/sec' 0 tinyBenchmarks ==> '459192825 bytecodes/sec; 14002222 sends/sec' 0 tinyBenchmarks ==> '458781362 bytecodes/sec; 13970423 sends/sec'
Time millisecondsToRun: testBlock ==> 4680 Time millisecondsToRun: testBlock ==> 4672 Time millisecondsToRun: testBlock ==> 4722 Time millisecondsToRun: testBlock ==> 4675 Time millisecondsToRun: testBlock ==> 4679
Dave
On Thu, Aug 30, 2012 at 01:14:54AM +0200, Nicolas Cellier wrote:
See also http://code.google.com/p/cog/issues/detail?id=92 where I attached a fix for large int It's untested yet and to review carefully !
As Stefan told, there is UB-reliance in SmallInteger primitives too, but I did not fix them. We should simply compute result as signed 64 bits as proposed by Stefan (except bitShift)
Nicolas
2012/8/30 David T. Lewis lewis@mail.msen.com:
This is on Mantis at http://bugs.squeak.org/view.php?id=7705
Note last comment in related issue 6987.
This issue will crash the VM when compiled for 64-bit platforms.
Dave
On Wed, Aug 29, 2012 at 12:18:28PM +0200, Nicolas Cellier wrote:
As posted on squeak-dev http://lists.squeakfoundation.org/pipermail/squeak-dev/2012-August/165608.ht... I found 3 bugs in LargeInteger primitives
(1<<63) negated quo: -1. (1<<63) negated / -1. (1<<63) negated * -1.
They are all related to the impossible task of taking absolute value of INT_MIN (or more exactly it's 64 bits equivalent). Currently, it takes the form (0 - INT_MIN) whose behaviour is undefined according to C standards but generally answer INT_MIN. See for example http://stackoverflow.com/questions/2539178/why-is-abs0x80000000-0x80000000
Surprisingly this one works: (1<<63) negated // -1.
Most probably because gcc has a license to ignore undefined behaviour and perform some optimizations that don't take overflow side effects into account. For example 0 - (0 - a)/b can be simplified into a/b, UB case of INT_MIN apart...
Beside these bugs, when I read the code, I'm quite sure it's a nest of future bugs because there are many other attempts to catch overflow in post-condition (like testing that addition of two positive is negative when an underflow occurs) that technically rely on explicitely Undefined Behaviour (UB). OK, by now many Arithmetic Units do behave like exploited in these post-conditions, though it's not strictly future-proof. But we unfortunately rely on optimizing C compilers, and its behaviour is much more fragile than hardware...
I invite every VM hacker to read http://stackoverflow.com/questions/199333/best-way-to-detect-integer-overflo... And various links like https://www.securecoding.cert.org/confluence/display/seccode/INT32-C.+Ensure...
For example, in large integer subtract, we have a protection against (0 - INT_MIN) like: y = 0 - x; if ( y==x ) { primitiveFail(); } an optimizing compiler having a licence to ignore INT_MIN Undefined Behaviour case could mathematically solve the equation as x==0, y==0 and transform code into if( ! (y=0-x) ) { primitiveFail(); } (directly use a jz and save a comparison)
or if we have such branch c = a + b; if( a >0) { if(b > 0) { if (c < 0 ) { primitiveFail(); } } } Again, a good compiler could remove the if( c < 0) test, since it does not have to care about UB...
OK, pragmatically, most of these post-condition hacks are fast and work with some version of gcc, but think about portability (llvm ?) and future pain (you can debug such code only at asembler level).
Do it right > do it fast.
Nicolas
On Sat, 1 Sep 2012, David T. Lewis wrote:
I have integrated these fixes into trunk VMMaker for testing. The new #testMinimumNegativeIntegerArithmetic unit test passes, and a VM compiled in 64-bit mode no longer crashed. I can find no significant change in performance with these changes applied. This looks really good to me, and if there are no other comments or issues I will go ahead and post the update to trunk VMMaker.
That sounds great, thanks Dave.
Here are the performance measurements that I did to compare VM performance before and after applying the changes. This is done with an interpreter VM compiled in 64 bit mode on Linux:
testBlock := [(1 to: 1000000) do: [:e | | largeNegativeInt | largeNegativeInt := -9223372036854775808 + e. largeNegativeInt >> 3. largeNegativeInt + 1. largeNegativeInt - -1. largeNegativeInt // -1. largeNegativeInt \ -1. largeNegativeInt rem: -1. largeNegativeInt quo: -1. largeNegativeInt * -1. largeNegativeInt / -1]].
What's the reason for using #to: and #do: instead of #to:do:? The latter seems to be much more appropriate in such tests, because it doesn't create any objects and message sends, so garbage collection has less effect on the measurements.
Levente
Before Nicolas' changes:
(LargeNegativeIntegerTest selector: #testMinimumNegativeIntegerArithmetic) run ==> 1 run, 0 passes, 0 expected failures, 1 failures, 0 errors, 0 unexpected passes
Time millisecondsToRun: [(1 to: 1000000) do: [:e | e + -9223372036854775808 // -1]] ==> 1565 Time millisecondsToRun: [(1 to: 1000000) do: [:e | e + -9223372036854775808 // -1]] ==> 1550 Time millisecondsToRun: [(1 to: 1000000) do: [:e | e + -9223372036854775808 // -1]] ==> 1544
0 tinyBenchmarks ==> '437981180 bytecodes/sec; 14163414 sends/sec' 0 tinyBenchmarks ==> '442141623 bytecodes/sec; 14141708 sends/sec' 0 tinyBenchmarks ==> '442141623 bytecodes/sec; 14120068 sends/sec' 0 tinyBenchmarks ==> '440240756 bytecodes/sec; 14163414 sends/sec' 0 tinyBenchmarks ==> '442141623 bytecodes/sec; 14823236 sends/sec'
Time millisecondsToRun: testBlock ==> 4698 Time millisecondsToRun: testBlock ==> 4671 Time millisecondsToRun: testBlock ==> 4668 Time millisecondsToRun: testBlock ==> 4612 Time millisecondsToRun: testBlock ==> 4657
After applying the fixes:
(LargeNegativeIntegerTest selector: #testMinimumNegativeIntegerArithmetic) run ==> 1 run, 1 passes, 0 expected failures, 0 failures, 0 errors, 0 unexpected passes
Time millisecondsToRun: [(1 to: 1000000) do: [:e | e + -9223372036854775808 // -1]] ==> 1616 Time millisecondsToRun: [(1 to: 1000000) do: [:e | e + -9223372036854775808 // -1]] ==> 1581 Time millisecondsToRun: [(1 to: 1000000) do: [:e | e + -9223372036854775808 // -1]] ==> 1577
0 tinyBenchmarks ==> '456735057 bytecodes/sec; 14002222 sends/sec' 0 tinyBenchmarks ==> '461677186 bytecodes/sec; 14023502 sends/sec' 0 tinyBenchmarks ==> '462929475 bytecodes/sec; 14002222 sends/sec' 0 tinyBenchmarks ==> '459192825 bytecodes/sec; 14002222 sends/sec' 0 tinyBenchmarks ==> '458781362 bytecodes/sec; 13970423 sends/sec'
Time millisecondsToRun: testBlock ==> 4680 Time millisecondsToRun: testBlock ==> 4672 Time millisecondsToRun: testBlock ==> 4722 Time millisecondsToRun: testBlock ==> 4675 Time millisecondsToRun: testBlock ==> 4679
Dave
On Thu, Aug 30, 2012 at 01:14:54AM +0200, Nicolas Cellier wrote:
See also http://code.google.com/p/cog/issues/detail?id=92 where I attached a fix for large int It's untested yet and to review carefully !
As Stefan told, there is UB-reliance in SmallInteger primitives too, but I did not fix them. We should simply compute result as signed 64 bits as proposed by Stefan (except bitShift)
Nicolas
2012/8/30 David T. Lewis lewis@mail.msen.com:
This is on Mantis at http://bugs.squeak.org/view.php?id=7705
Note last comment in related issue 6987.
This issue will crash the VM when compiled for 64-bit platforms.
Dave
On Wed, Aug 29, 2012 at 12:18:28PM +0200, Nicolas Cellier wrote:
As posted on squeak-dev http://lists.squeakfoundation.org/pipermail/squeak-dev/2012-August/165608.ht... I found 3 bugs in LargeInteger primitives
(1<<63) negated quo: -1. (1<<63) negated / -1. (1<<63) negated * -1.
They are all related to the impossible task of taking absolute value of INT_MIN (or more exactly it's 64 bits equivalent). Currently, it takes the form (0 - INT_MIN) whose behaviour is undefined according to C standards but generally answer INT_MIN. See for example http://stackoverflow.com/questions/2539178/why-is-abs0x80000000-0x80000000
Surprisingly this one works: (1<<63) negated // -1.
Most probably because gcc has a license to ignore undefined behaviour and perform some optimizations that don't take overflow side effects into account. For example 0 - (0 - a)/b can be simplified into a/b, UB case of INT_MIN apart...
Beside these bugs, when I read the code, I'm quite sure it's a nest of future bugs because there are many other attempts to catch overflow in post-condition (like testing that addition of two positive is negative when an underflow occurs) that technically rely on explicitely Undefined Behaviour (UB). OK, by now many Arithmetic Units do behave like exploited in these post-conditions, though it's not strictly future-proof. But we unfortunately rely on optimizing C compilers, and its behaviour is much more fragile than hardware...
I invite every VM hacker to read http://stackoverflow.com/questions/199333/best-way-to-detect-integer-overflo... And various links like https://www.securecoding.cert.org/confluence/display/seccode/INT32-C.+Ensure...
For example, in large integer subtract, we have a protection against (0 - INT_MIN) like: y = 0 - x; if ( y==x ) { primitiveFail(); } an optimizing compiler having a licence to ignore INT_MIN Undefined Behaviour case could mathematically solve the equation as x==0, y==0 and transform code into if( ! (y=0-x) ) { primitiveFail(); } (directly use a jz and save a comparison)
or if we have such branch c = a + b; if( a >0) { if(b > 0) { if (c < 0 ) { primitiveFail(); } } } Again, a good compiler could remove the if( c < 0) test, since it does not have to care about UB...
OK, pragmatically, most of these post-condition hacks are fast and work with some version of gcc, but think about portability (llvm ?) and future pain (you can debug such code only at asembler level).
Do it right > do it fast.
Nicolas
i wonder, if any changes are needed for jit, because i suppose some of semantics for bigints is jitted.
On 1 September 2012 16:44, Levente Uzonyi leves@elte.hu wrote:
On Sat, 1 Sep 2012, David T. Lewis wrote:
I have integrated these fixes into trunk VMMaker for testing. The new #testMinimumNegativeIntegerArithmetic unit test passes, and a VM compiled in 64-bit mode no longer crashed. I can find no significant change in performance with these changes applied. This looks really good to me, and if there are no other comments or issues I will go ahead and post the update to trunk VMMaker.
That sounds great, thanks Dave.
Here are the performance measurements that I did to compare VM performance before and after applying the changes. This is done with an interpreter VM compiled in 64 bit mode on Linux:
testBlock := [(1 to: 1000000) do: [:e | | largeNegativeInt | largeNegativeInt := -9223372036854775808 + e. largeNegativeInt >> 3. largeNegativeInt + 1. largeNegativeInt - -1. largeNegativeInt // -1. largeNegativeInt \ -1. largeNegativeInt rem: -1. largeNegativeInt quo: -1. largeNegativeInt * -1. largeNegativeInt / -1]].
What's the reason for using #to: and #do: instead of #to:do:? The latter seems to be much more appropriate in such tests, because it doesn't create any objects and message sends, so garbage collection has less effect on the measurements.
Levente
Before Nicolas' changes:
(LargeNegativeIntegerTest selector: #testMinimumNegativeIntegerArithmetic) run ==> 1 run, 0 passes, 0 expected failures, 1 failures, 0 errors, 0 unexpected passes
Time millisecondsToRun: [(1 to: 1000000) do: [:e | e + -9223372036854775808 // -1]] ==> 1565 Time millisecondsToRun: [(1 to: 1000000) do: [:e | e + -9223372036854775808 // -1]] ==> 1550 Time millisecondsToRun: [(1 to: 1000000) do: [:e | e + -9223372036854775808 // -1]] ==> 1544
0 tinyBenchmarks ==> '437981180 bytecodes/sec; 14163414 sends/sec' 0 tinyBenchmarks ==> '442141623 bytecodes/sec; 14141708 sends/sec' 0 tinyBenchmarks ==> '442141623 bytecodes/sec; 14120068 sends/sec' 0 tinyBenchmarks ==> '440240756 bytecodes/sec; 14163414 sends/sec' 0 tinyBenchmarks ==> '442141623 bytecodes/sec; 14823236 sends/sec'
Time millisecondsToRun: testBlock ==> 4698 Time millisecondsToRun: testBlock ==> 4671 Time millisecondsToRun: testBlock ==> 4668 Time millisecondsToRun: testBlock ==> 4612 Time millisecondsToRun: testBlock ==> 4657
After applying the fixes:
(LargeNegativeIntegerTest selector: #testMinimumNegativeIntegerArithmetic) run ==> 1 run, 1 passes, 0 expected failures, 0 failures, 0 errors, 0 unexpected passes
Time millisecondsToRun: [(1 to: 1000000) do: [:e | e + -9223372036854775808 // -1]] ==> 1616 Time millisecondsToRun: [(1 to: 1000000) do: [:e | e + -9223372036854775808 // -1]] ==> 1581 Time millisecondsToRun: [(1 to: 1000000) do: [:e | e + -9223372036854775808 // -1]] ==> 1577
0 tinyBenchmarks ==> '456735057 bytecodes/sec; 14002222 sends/sec' 0 tinyBenchmarks ==> '461677186 bytecodes/sec; 14023502 sends/sec' 0 tinyBenchmarks ==> '462929475 bytecodes/sec; 14002222 sends/sec' 0 tinyBenchmarks ==> '459192825 bytecodes/sec; 14002222 sends/sec' 0 tinyBenchmarks ==> '458781362 bytecodes/sec; 13970423 sends/sec'
Time millisecondsToRun: testBlock ==> 4680 Time millisecondsToRun: testBlock ==> 4672 Time millisecondsToRun: testBlock ==> 4722 Time millisecondsToRun: testBlock ==> 4675 Time millisecondsToRun: testBlock ==> 4679
Dave
On Thu, Aug 30, 2012 at 01:14:54AM +0200, Nicolas Cellier wrote:
See also http://code.google.com/p/cog/issues/detail?id=92 where I attached a fix for large int It's untested yet and to review carefully !
As Stefan told, there is UB-reliance in SmallInteger primitives too, but I did not fix them. We should simply compute result as signed 64 bits as proposed by Stefan (except bitShift)
Nicolas
2012/8/30 David T. Lewis lewis@mail.msen.com:
This is on Mantis at http://bugs.squeak.org/view.php?id=7705
Note last comment in related issue 6987.
This issue will crash the VM when compiled for 64-bit platforms.
Dave
On Wed, Aug 29, 2012 at 12:18:28PM +0200, Nicolas Cellier wrote:
As posted on squeak-dev
http://lists.squeakfoundation.org/pipermail/squeak-dev/2012-August/165608.ht... I found 3 bugs in LargeInteger primitives
(1<<63) negated quo: -1. (1<<63) negated / -1. (1<<63) negated * -1.
They are all related to the impossible task of taking absolute value of INT_MIN (or more exactly it's 64 bits equivalent). Currently, it takes the form (0 - INT_MIN) whose behaviour is undefined according to C standards but generally answer INT_MIN. See for example
http://stackoverflow.com/questions/2539178/why-is-abs0x80000000-0x80000000
Surprisingly this one works: (1<<63) negated // -1.
Most probably because gcc has a license to ignore undefined behaviour and perform some optimizations that don't take overflow side effects into account. For example 0 - (0 - a)/b can be simplified into a/b, UB case of INT_MIN apart...
Beside these bugs, when I read the code, I'm quite sure it's a nest of future bugs because there are many other attempts to catch overflow in post-condition (like testing that addition of two positive is negative when an underflow occurs) that technically rely on explicitely Undefined Behaviour (UB). OK, by now many Arithmetic Units do behave like exploited in these post-conditions, though it's not strictly future-proof. But we unfortunately rely on optimizing C compilers, and its behaviour is much more fragile than hardware...
I invite every VM hacker to read
http://stackoverflow.com/questions/199333/best-way-to-detect-integer-overflo... And various links like
https://www.securecoding.cert.org/confluence/display/seccode/INT32-C.+Ensure...
For example, in large integer subtract, we have a protection against (0 - INT_MIN) like: y = 0 - x; if ( y==x ) { primitiveFail(); } an optimizing compiler having a licence to ignore INT_MIN Undefined Behaviour case could mathematically solve the equation as x==0, y==0 and transform code into if( ! (y=0-x) ) { primitiveFail(); } (directly use a jz and save a comparison)
or if we have such branch c = a + b; if( a >0) { if(b > 0) { if (c < 0 ) { primitiveFail(); } } } Again, a good compiler could remove the if( c < 0) test, since it does not have to care about UB...
OK, pragmatically, most of these post-condition hacks are fast and work with some version of gcc, but think about portability (llvm ?) and future pain (you can debug such code only at asembler level).
Do it right > do it fast.
Nicolas
I'm not sure, but I would not expect any differences for jit. The changes here are in the primitives themselves, so I would not expect any impact on jitted methods.
It would be good to take a similar measurement with Cog of course. If there is any real difference in the primitive performance due to these changes, it should be easier to measure with Cog because relatively less time would be spent in the method execution as opposed to the primitives themselves.
Dave
On Sat, Sep 01, 2012 at 05:41:44PM +0200, Igor Stasenko wrote:
i wonder, if any changes are needed for jit, because i suppose some of semantics for bigints is jitted.
On 1 September 2012 16:44, Levente Uzonyi leves@elte.hu wrote:
On Sat, 1 Sep 2012, David T. Lewis wrote:
I have integrated these fixes into trunk VMMaker for testing. The new #testMinimumNegativeIntegerArithmetic unit test passes, and a VM compiled in 64-bit mode no longer crashed. I can find no significant change in performance with these changes applied. This looks really good to me, and if there are no other comments or issues I will go ahead and post the update to trunk VMMaker.
That sounds great, thanks Dave.
Here are the performance measurements that I did to compare VM performance before and after applying the changes. This is done with an interpreter VM compiled in 64 bit mode on Linux:
testBlock := [(1 to: 1000000) do: [:e | | largeNegativeInt | largeNegativeInt := -9223372036854775808 + e. largeNegativeInt >> 3. largeNegativeInt + 1. largeNegativeInt - -1. largeNegativeInt // -1. largeNegativeInt \ -1. largeNegativeInt rem: -1. largeNegativeInt quo: -1. largeNegativeInt * -1. largeNegativeInt / -1]].
What's the reason for using #to: and #do: instead of #to:do:? The latter seems to be much more appropriate in such tests, because it doesn't create any objects and message sends, so garbage collection has less effect on the measurements.
Levente
Before Nicolas' changes:
(LargeNegativeIntegerTest selector: #testMinimumNegativeIntegerArithmetic) run ==> 1 run, 0 passes, 0 expected failures, 1 failures, 0 errors, 0 unexpected passes
Time millisecondsToRun: [(1 to: 1000000) do: [:e | e + -9223372036854775808 // -1]] ==> 1565 Time millisecondsToRun: [(1 to: 1000000) do: [:e | e + -9223372036854775808 // -1]] ==> 1550 Time millisecondsToRun: [(1 to: 1000000) do: [:e | e + -9223372036854775808 // -1]] ==> 1544
0 tinyBenchmarks ==> '437981180 bytecodes/sec; 14163414 sends/sec' 0 tinyBenchmarks ==> '442141623 bytecodes/sec; 14141708 sends/sec' 0 tinyBenchmarks ==> '442141623 bytecodes/sec; 14120068 sends/sec' 0 tinyBenchmarks ==> '440240756 bytecodes/sec; 14163414 sends/sec' 0 tinyBenchmarks ==> '442141623 bytecodes/sec; 14823236 sends/sec'
Time millisecondsToRun: testBlock ==> 4698 Time millisecondsToRun: testBlock ==> 4671 Time millisecondsToRun: testBlock ==> 4668 Time millisecondsToRun: testBlock ==> 4612 Time millisecondsToRun: testBlock ==> 4657
After applying the fixes:
(LargeNegativeIntegerTest selector: #testMinimumNegativeIntegerArithmetic) run ==> 1 run, 1 passes, 0 expected failures, 0 failures, 0 errors, 0 unexpected passes
Time millisecondsToRun: [(1 to: 1000000) do: [:e | e + -9223372036854775808 // -1]] ==> 1616 Time millisecondsToRun: [(1 to: 1000000) do: [:e | e + -9223372036854775808 // -1]] ==> 1581 Time millisecondsToRun: [(1 to: 1000000) do: [:e | e + -9223372036854775808 // -1]] ==> 1577
0 tinyBenchmarks ==> '456735057 bytecodes/sec; 14002222 sends/sec' 0 tinyBenchmarks ==> '461677186 bytecodes/sec; 14023502 sends/sec' 0 tinyBenchmarks ==> '462929475 bytecodes/sec; 14002222 sends/sec' 0 tinyBenchmarks ==> '459192825 bytecodes/sec; 14002222 sends/sec' 0 tinyBenchmarks ==> '458781362 bytecodes/sec; 13970423 sends/sec'
Time millisecondsToRun: testBlock ==> 4680 Time millisecondsToRun: testBlock ==> 4672 Time millisecondsToRun: testBlock ==> 4722 Time millisecondsToRun: testBlock ==> 4675 Time millisecondsToRun: testBlock ==> 4679
Dave
On Thu, Aug 30, 2012 at 01:14:54AM +0200, Nicolas Cellier wrote:
See also http://code.google.com/p/cog/issues/detail?id=92 where I attached a fix for large int It's untested yet and to review carefully !
As Stefan told, there is UB-reliance in SmallInteger primitives too, but I did not fix them. We should simply compute result as signed 64 bits as proposed by Stefan (except bitShift)
Nicolas
2012/8/30 David T. Lewis lewis@mail.msen.com:
This is on Mantis at http://bugs.squeak.org/view.php?id=7705
Note last comment in related issue 6987.
This issue will crash the VM when compiled for 64-bit platforms.
Dave
On Wed, Aug 29, 2012 at 12:18:28PM +0200, Nicolas Cellier wrote:
As posted on squeak-dev
http://lists.squeakfoundation.org/pipermail/squeak-dev/2012-August/165608.ht... I found 3 bugs in LargeInteger primitives
(1<<63) negated quo: -1. (1<<63) negated / -1. (1<<63) negated * -1.
They are all related to the impossible task of taking absolute value of INT_MIN (or more exactly it's 64 bits equivalent). Currently, it takes the form (0 - INT_MIN) whose behaviour is undefined according to C standards but generally answer INT_MIN. See for example
http://stackoverflow.com/questions/2539178/why-is-abs0x80000000-0x80000000
Surprisingly this one works: (1<<63) negated // -1.
Most probably because gcc has a license to ignore undefined behaviour and perform some optimizations that don't take overflow side effects into account. For example 0 - (0 - a)/b can be simplified into a/b, UB case of INT_MIN apart...
Beside these bugs, when I read the code, I'm quite sure it's a nest of future bugs because there are many other attempts to catch overflow in post-condition (like testing that addition of two positive is negative when an underflow occurs) that technically rely on explicitely Undefined Behaviour (UB). OK, by now many Arithmetic Units do behave like exploited in these post-conditions, though it's not strictly future-proof. But we unfortunately rely on optimizing C compilers, and its behaviour is much more fragile than hardware...
I invite every VM hacker to read
http://stackoverflow.com/questions/199333/best-way-to-detect-integer-overflo... And various links like
https://www.securecoding.cert.org/confluence/display/seccode/INT32-C.+Ensure...
For example, in large integer subtract, we have a protection against (0 - INT_MIN) like: y = 0 - x; if ( y==x ) { primitiveFail(); } an optimizing compiler having a licence to ignore INT_MIN Undefined Behaviour case could mathematically solve the equation as x==0, y==0 and transform code into if( ! (y=0-x) ) { primitiveFail(); } (directly use a jz and save a comparison)
or if we have such branch c = a + b; if( a >0) { if(b > 0) { if (c < 0 ) { primitiveFail(); } } } Again, a good compiler could remove the if( c < 0) test, since it does not have to care about UB...
OK, pragmatically, most of these post-condition hacks are fast and work with some version of gcc, but think about portability (llvm ?) and future pain (you can debug such code only at asembler level).
Do it right > do it fast.
Nicolas
-- Best regards, Igor Stasenko.
i am busy with other stuff right now, but i will do a pass on this issue later.
On 1 September 2012 17:52, David T. Lewis lewis@mail.msen.com wrote:
I'm not sure, but I would not expect any differences for jit. The changes here are in the primitives themselves, so I would not expect any impact on jitted methods.
It would be good to take a similar measurement with Cog of course. If there is any real difference in the primitive performance due to these changes, it should be easier to measure with Cog because relatively less time would be spent in the method execution as opposed to the primitives themselves.
Dave
On Sat, Sep 01, 2012 at 05:41:44PM +0200, Igor Stasenko wrote:
i wonder, if any changes are needed for jit, because i suppose some of semantics for bigints is jitted.
On 1 September 2012 16:44, Levente Uzonyi leves@elte.hu wrote:
On Sat, 1 Sep 2012, David T. Lewis wrote:
I have integrated these fixes into trunk VMMaker for testing. The new #testMinimumNegativeIntegerArithmetic unit test passes, and a VM compiled in 64-bit mode no longer crashed. I can find no significant change in performance with these changes applied. This looks really good to me, and if there are no other comments or issues I will go ahead and post the update to trunk VMMaker.
That sounds great, thanks Dave.
Here are the performance measurements that I did to compare VM performance before and after applying the changes. This is done with an interpreter VM compiled in 64 bit mode on Linux:
testBlock := [(1 to: 1000000) do: [:e | | largeNegativeInt | largeNegativeInt := -9223372036854775808 + e. largeNegativeInt >> 3. largeNegativeInt + 1. largeNegativeInt - -1. largeNegativeInt // -1. largeNegativeInt \ -1. largeNegativeInt rem: -1. largeNegativeInt quo: -1. largeNegativeInt * -1. largeNegativeInt / -1]].
What's the reason for using #to: and #do: instead of #to:do:? The latter seems to be much more appropriate in such tests, because it doesn't create any objects and message sends, so garbage collection has less effect on the measurements.
Levente
Before Nicolas' changes:
(LargeNegativeIntegerTest selector: #testMinimumNegativeIntegerArithmetic) run ==> 1 run, 0 passes, 0 expected failures, 1 failures, 0 errors, 0 unexpected passes
Time millisecondsToRun: [(1 to: 1000000) do: [:e | e + -9223372036854775808 // -1]] ==> 1565 Time millisecondsToRun: [(1 to: 1000000) do: [:e | e + -9223372036854775808 // -1]] ==> 1550 Time millisecondsToRun: [(1 to: 1000000) do: [:e | e + -9223372036854775808 // -1]] ==> 1544
0 tinyBenchmarks ==> '437981180 bytecodes/sec; 14163414 sends/sec' 0 tinyBenchmarks ==> '442141623 bytecodes/sec; 14141708 sends/sec' 0 tinyBenchmarks ==> '442141623 bytecodes/sec; 14120068 sends/sec' 0 tinyBenchmarks ==> '440240756 bytecodes/sec; 14163414 sends/sec' 0 tinyBenchmarks ==> '442141623 bytecodes/sec; 14823236 sends/sec'
Time millisecondsToRun: testBlock ==> 4698 Time millisecondsToRun: testBlock ==> 4671 Time millisecondsToRun: testBlock ==> 4668 Time millisecondsToRun: testBlock ==> 4612 Time millisecondsToRun: testBlock ==> 4657
After applying the fixes:
(LargeNegativeIntegerTest selector: #testMinimumNegativeIntegerArithmetic) run ==> 1 run, 1 passes, 0 expected failures, 0 failures, 0 errors, 0 unexpected passes
Time millisecondsToRun: [(1 to: 1000000) do: [:e | e + -9223372036854775808 // -1]] ==> 1616 Time millisecondsToRun: [(1 to: 1000000) do: [:e | e + -9223372036854775808 // -1]] ==> 1581 Time millisecondsToRun: [(1 to: 1000000) do: [:e | e + -9223372036854775808 // -1]] ==> 1577
0 tinyBenchmarks ==> '456735057 bytecodes/sec; 14002222 sends/sec' 0 tinyBenchmarks ==> '461677186 bytecodes/sec; 14023502 sends/sec' 0 tinyBenchmarks ==> '462929475 bytecodes/sec; 14002222 sends/sec' 0 tinyBenchmarks ==> '459192825 bytecodes/sec; 14002222 sends/sec' 0 tinyBenchmarks ==> '458781362 bytecodes/sec; 13970423 sends/sec'
Time millisecondsToRun: testBlock ==> 4680 Time millisecondsToRun: testBlock ==> 4672 Time millisecondsToRun: testBlock ==> 4722 Time millisecondsToRun: testBlock ==> 4675 Time millisecondsToRun: testBlock ==> 4679
Dave
On Thu, Aug 30, 2012 at 01:14:54AM +0200, Nicolas Cellier wrote:
See also http://code.google.com/p/cog/issues/detail?id=92 where I attached a fix for large int It's untested yet and to review carefully !
As Stefan told, there is UB-reliance in SmallInteger primitives too, but I did not fix them. We should simply compute result as signed 64 bits as proposed by Stefan (except bitShift)
Nicolas
2012/8/30 David T. Lewis lewis@mail.msen.com:
This is on Mantis at http://bugs.squeak.org/view.php?id=7705
Note last comment in related issue 6987.
This issue will crash the VM when compiled for 64-bit platforms.
Dave
On Wed, Aug 29, 2012 at 12:18:28PM +0200, Nicolas Cellier wrote: > > > As posted on squeak-dev > > http://lists.squeakfoundation.org/pipermail/squeak-dev/2012-August/165608.ht... > I found 3 bugs in LargeInteger primitives > > (1<<63) negated quo: -1. > (1<<63) negated / -1. > (1<<63) negated * -1. > > They are all related to the impossible task of taking absolute value > of INT_MIN (or more exactly it's 64 bits equivalent). > Currently, it takes the form (0 - INT_MIN) whose behaviour is > undefined according to C standards but generally answer INT_MIN. > See for example > > http://stackoverflow.com/questions/2539178/why-is-abs0x80000000-0x80000000 > > Surprisingly this one works: > (1<<63) negated // -1. > > Most probably because gcc has a license to ignore undefined behaviour > and perform some optimizations that don't take overflow side effects > into account. > For example 0 - (0 - a)/b can be simplified into a/b, UB case of > INT_MIN apart... > > --------------------------------- > > Beside these bugs, when I read the code, I'm quite sure it's a nest of > future bugs because there are many other attempts to catch overflow in > post-condition (like testing that addition of two positive is negative > when an underflow occurs) that technically rely on explicitely > Undefined Behaviour (UB). > OK, by now many Arithmetic Units do behave like exploited in these > post-conditions, though it's not strictly future-proof. > But we unfortunately rely on optimizing C compilers, and its behaviour > is much more fragile than hardware... > > I invite every VM hacker to read > > http://stackoverflow.com/questions/199333/best-way-to-detect-integer-overflo... > And various links like > > https://www.securecoding.cert.org/confluence/display/seccode/INT32-C.+Ensure... > > For example, in large integer subtract, we have a protection against > (0 - INT_MIN) like: > y = 0 - x; > if ( y==x ) { primitiveFail(); } > an optimizing compiler having a licence to ignore INT_MIN Undefined > Behaviour case could mathematically solve the equation as x==0, y==0 > and transform code into > if( ! (y=0-x) ) { primitiveFail(); } > (directly use a jz and save a comparison) > > or if we have such branch > c = a + b; > if( a >0) { > if(b > 0) { > if (c < 0 ) { primitiveFail(); } > } > } > Again, a good compiler could remove the if( c < 0) test, since it does > not have to care about UB... > > OK, pragmatically, most of these post-condition hacks are fast and > work with some version of gcc, but think about portability (llvm ?) > and future pain (you can debug such code only at asembler level). > > Do it right > do it fast. > > Nicolas
-- Best regards, Igor Stasenko.
On Sat, Sep 01, 2012 at 04:44:21PM +0200, Levente Uzonyi wrote:
On Sat, 1 Sep 2012, David T. Lewis wrote:
I have integrated these fixes into trunk VMMaker for testing. The new #testMinimumNegativeIntegerArithmetic unit test passes, and a VM compiled in 64-bit mode no longer crashed. I can find no significant change in performance with these changes applied. This looks really good to me, and if there are no other comments or issues I will go ahead and post the update to trunk VMMaker.
That sounds great, thanks Dave.
Here are the performance measurements that I did to compare VM performance before and after applying the changes. This is done with an interpreter VM compiled in 64 bit mode on Linux:
testBlock := [(1 to: 1000000) do: [:e | | largeNegativeInt | largeNegativeInt := -9223372036854775808 + e. largeNegativeInt >> 3. largeNegativeInt + 1. largeNegativeInt - -1. largeNegativeInt // -1. largeNegativeInt \ -1. largeNegativeInt rem: -1. largeNegativeInt quo: -1. largeNegativeInt * -1. largeNegativeInt / -1]].
What's the reason for using #to: and #do: instead of #to:do:?
Ignorance on my part.
The latter seems to be much more appropriate in such tests, because it doesn't create any objects and message sends, so garbage collection has less effect on the measurements.
Repeating the measurements using #to:do: below. After applying Nicolas' fixes, there does appear to be a slight improvement in large integer performance as measured by the testBlock, and a slight decrease in bytecodes/sec.
I cannot explain either of these performance variations. I don't see any reason that the changes to the primitives should affect bytecode performance, but I'm attaching a diff of the generated interp.c files before and after the changes in case anyone wants to have a look.
Test results are:
testBlock := [1 to: 1000000 do: [:e | | largeNegativeInt | largeNegativeInt := -9223372036854775808 + e. largeNegativeInt >> 3. largeNegativeInt + 1. largeNegativeInt - -1. largeNegativeInt // -1. largeNegativeInt \ -1. largeNegativeInt rem: -1. largeNegativeInt quo: -1. largeNegativeInt * -1. largeNegativeInt / -1]].
Before Nicolas' changes:
(LargeNegativeIntegerTest selector: #testMinimumNegativeIntegerArithmetic) run ==> 1 run, 0 passes, 0 expected failures, 1 failures, 0 errors, 0 unexpected passes
Time millisecondsToRun: testBlock ==> 4597 Time millisecondsToRun: testBlock ==> 4592 Time millisecondsToRun: testBlock ==> 4602 Time millisecondsToRun: testBlock ==> 4605 Time millisecondsToRun: testBlock ==> 4627
0 tinyBenchmarks ==> '424192212 bytecodes/sec; 14451784 sends/sec' 0 tinyBenchmarks ==> '424543946 bytecodes/sec; 15114601 sends/sec' 0 tinyBenchmarks ==> '421746293 bytecodes/sec; 14451784 sends/sec' 0 tinyBenchmarks ==> '423490488 bytecodes/sec; 15114601 sends/sec' 0 tinyBenchmarks ==> '423490488 bytecodes/sec; 14463110 sends/sec'
After, with fixes:
(LargeNegativeIntegerTest selector: #testMinimumNegativeIntegerArithmetic) run ==> 1 run, 1 passes, 0 expected failures, 0 failures, 0 errors, 0 unexpected passes
Time millisecondsToRun: testBlock ==> 4416 Time millisecondsToRun: testBlock ==> 4432 Time millisecondsToRun: testBlock ==> 4410 Time millisecondsToRun: testBlock ==> 4400 Time millisecondsToRun: testBlock ==> 4440
0 tinyBenchmarks ==> '438731790 bytecodes/sec; 15028443 sends/sec' 0 tinyBenchmarks ==> '442906574 bytecodes/sec; 14775763 sends/sec' 0 tinyBenchmarks ==> '440619621 bytecodes/sec; 14752141 sends/sec' 0 tinyBenchmarks ==> '436115843 bytecodes/sec; 14775763 sends/sec' 0 tinyBenchmarks ==> '442141623 bytecodes/sec; 14763943 sends/sec'
Dave
On Sat, 1 Sep 2012, David T. Lewis wrote:
Can't quote your mail, sorry. But according to your numbers, there's improvement in both the testBlock and the sends/sec.
Levente
On Sat, Sep 01, 2012 at 09:44:29PM +0200, Levente Uzonyi wrote:
On Sat, 1 Sep 2012, David T. Lewis wrote:
Can't quote your mail, sorry. But according to your numbers, there's improvement in both the testBlock and the sends/sec.
D'oh! Brain fart. So yes, it's good news all around.
I still cannot explain *why* the performance is different (better), but I'm happy to give credit to Nicolas for the improvements regardless :)
Dave
vm-dev@lists.squeakfoundation.org