3 Bugs in LargeInteger primitives

List overview All Threads
Download

newer

older

[Cog] issue tracker -> vm-dev...

VM Maker: VMMaker-dtl.286.mcz

Nicolas Cellier

29 Aug 2012 29 Aug '12

10:18 a.m.

As posted on squeak-dev http://lists.squeakfoundation.org/pipermail/squeak-dev/2012-August/165608.ht... I found 3 bugs in LargeInteger primitives

(1<<63) negated quo: -1. (1<<63) negated / -1. (1<<63) negated * -1.

They are all related to the impossible task of taking absolute value of INT_MIN (or more exactly it's 64 bits equivalent). Currently, it takes the form (0 - INT_MIN) whose behaviour is undefined according to C standards but generally answer INT_MIN. See for example http://stackoverflow.com/questions/2539178/why-is-abs0x80000000-0x80000000

Surprisingly this one works: (1<<63) negated // -1.

Most probably because gcc has a license to ignore undefined behaviour and perform some optimizations that don't take overflow side effects into account. For example 0 - (0 - a)/b can be simplified into a/b, UB case of INT_MIN apart...

---------------------------------

Beside these bugs, when I read the code, I'm quite sure it's a nest of future bugs because there are many other attempts to catch overflow in post-condition (like testing that addition of two positive is negative when an underflow occurs) that technically rely on explicitely Undefined Behaviour (UB). OK, by now many Arithmetic Units do behave like exploited in these post-conditions, though it's not strictly future-proof. But we unfortunately rely on optimizing C compilers, and its behaviour is much more fragile than hardware...

I invite every VM hacker to read http://stackoverflow.com/questions/199333/best-way-to-detect-integer-overflo... And various links like https://www.securecoding.cert.org/confluence/display/seccode/INT32-C.+Ensure...

For example, in large integer subtract, we have a protection against (0 - INT_MIN) like: y = 0 - x; if ( y==x ) { primitiveFail(); } an optimizing compiler having a licence to ignore INT_MIN Undefined Behaviour case could mathematically solve the equation as x==0, y==0 and transform code into if( ! (y=0-x) ) { primitiveFail(); } (directly use a jz and save a comparison)

or if we have such branch c = a + b; if( a >0) { if(b > 0) { if (c < 0 ) { primitiveFail(); } } } Again, a good compiler could remove the if( c < 0) test, since it does not have to care about UB...

OK, pragmatically, most of these post-condition hacks are fast and work with some version of gcc, but think about portability (llvm ?) and future pain (you can debug such code only at asembler level).

Do it right > do it fast.

Nicolas

Show replies by date

Stefan Marr

29 Aug 29 Aug

11:24 a.m.

Hi Nicolas:

On 29 Aug 2012, at 12:18, Nicolas Cellier wrote:

...

Beside these bugs, when I read the code, I'm quite sure it's a nest of future bugs because there are many other attempts to catch overflow in post-condition (like testing that addition of two positive is negative when an underflow occurs) that technically rely on explicitely Undefined Behaviour (UB).

I guess http://forum.world.st/Is-bytecodePrimMultiply-correct-td3869580.html is related too. I am not sure whether that got changed in the VMs, but sounds very much like the same kind of problem. (undefined behavior and overflows)

Since C is undefined in that regard, what are the options? Hand-crafted assembly for all relevant platforms? Are there libraries that abstract from these things?

I think Clang has a compiler switch to warn at compile-time, or trigger a runtime warning/error for these issues with undefined behavior. That might help for a thorough sweep through the code.

Best regards Stefan

-- Stefan Marr Software Languages Lab Vrije Universiteit Brussel Pleinlaan 2 / B-1050 Brussels / Belgium http://soft.vub.ac.be/~smarr Phone: +32 2 629 2974 Fax: +32 2 629 3525

Nicolas Cellier

11:50 a.m.

Originally, C was very close to machine instructions and was conceived as a generic assembler. But it's more and more distant and becoming abstract.

Unfortunately, the abstract arithmetic model is completely broken with UB: it's un-reliable. So what does this kind of abstraction serves? Really, it makes me wonder...

If portable C code becomes both long and inefficient, I guess that cost of maintaining assembler for some part that we know are broken is an option indeed.

I also see that Andreas had a hard time with slang, some intermediate operations being cast to uint32 instead of int64, he finally had to use many cCode: hacks... Going both thru slang and C intermediate sounds like too much work and too few safety for implementation basic arithmetic (obviously we didn't and we can't easily model broken C behaviour in Slang, it's too complex !).

Nicolas

2012/8/29 Stefan Marr smalltalk@stefan-marr.de:

...

Hi Nicolas:

On 29 Aug 2012, at 12:18, Nicolas Cellier wrote:

...
Beside these bugs, when I read the code, I'm quite sure it's a nest of future bugs because there are many other attempts to catch overflow in post-condition (like testing that addition of two positive is negative when an underflow occurs) that technically rely on explicitely Undefined Behaviour (UB).

I guess http://forum.world.st/Is-bytecodePrimMultiply-correct-td3869580.html is related too. I am not sure whether that got changed in the VMs, but sounds very much like the same kind of problem. (undefined behavior and overflows)

Since C is undefined in that regard, what are the options? Hand-crafted assembly for all relevant platforms? Are there libraries that abstract from these things?

I think Clang has a compiler switch to warn at compile-time, or trigger a runtime warning/error for these issues with undefined behavior. That might help for a thorough sweep through the code.

Best regards Stefan

-- Stefan Marr Software Languages Lab Vrije Universiteit Brussel Pleinlaan 2 / B-1050 Brussels / Belgium http://soft.vub.ac.be/~smarr Phone: +32 2 629 2974 Fax: +32 2 629 3525

Nicolas Cellier

12:01 p.m.

For the (1<<63) negated bug itself, one very simple solution would be to just refuse it as a valid int64... We spent many effort to handle it in #signed64BitValueOf: , and our rewards are many bug popping out in the primitives where it is used. Regarding efficiency, we will have to protect code with inefficient-UB-broken defensive if or very-inefficient-portable-C, so the best choice is to just filter it out right at the beginning... I also wonder if handling a sign-magnitude wouldn't just be easier in that case (except maybe for + and -).

Of course the other UB are remaining, but one thing at a time.

Nicolas

2012/8/29 Nicolas Cellier nicolas.cellier.aka.nice@gmail.com:

...

Originally, C was very close to machine instructions and was conceived as a generic assembler. But it's more and more distant and becoming abstract.

Unfortunately, the abstract arithmetic model is completely broken with UB: it's un-reliable. So what does this kind of abstraction serves? Really, it makes me wonder...

If portable C code becomes both long and inefficient, I guess that cost of maintaining assembler for some part that we know are broken is an option indeed.

I also see that Andreas had a hard time with slang, some intermediate operations being cast to uint32 instead of int64, he finally had to use many cCode: hacks... Going both thru slang and C intermediate sounds like too much work and too few safety for implementation basic arithmetic (obviously we didn't and we can't easily model broken C behaviour in Slang, it's too complex !).

Nicolas

2012/8/29 Stefan Marr smalltalk@stefan-marr.de:

...
Hi Nicolas:

On 29 Aug 2012, at 12:18, Nicolas Cellier wrote:

...
Beside these bugs, when I read the code, I'm quite sure it's a nest of future bugs because there are many other attempts to catch overflow in post-condition (like testing that addition of two positive is negative when an underflow occurs) that technically rely on explicitely Undefined Behaviour (UB).

I guess http://forum.world.st/Is-bytecodePrimMultiply-correct-td3869580.html is related too. I am not sure whether that got changed in the VMs, but sounds very much like the same kind of problem. (undefined behavior and overflows)

Since C is undefined in that regard, what are the options? Hand-crafted assembly for all relevant platforms? Are there libraries that abstract from these things?

I think Clang has a compiler switch to warn at compile-time, or trigger a runtime warning/error for these issues with undefined behavior. That might help for a thorough sweep through the code.

Best regards Stefan

-- Stefan Marr Software Languages Lab Vrije Universiteit Brussel Pleinlaan 2 / B-1050 Brussels / Belgium http://soft.vub.ac.be/~smarr Phone: +32 2 629 2974 Fax: +32 2 629 3525

Nicolas Cellier

4:38 p.m.

I checked, unsigned int overflow behaviour is a well defined standard in both C/C++ and result in a modulo 2^ (sizeof( uint_type ) *CHAR_BIT) So I'm now convinced that sign/magnitude decomposition is the way to go. Anyway, you can currently observe sign dissertation to check overflow in post condition, so we already pay the same price as sign/magnitude solution, except that code is currently relying on broken C signed arithmetic model.

I may post corrected primitives for basic arithmetic ops when I have time,.. But I won't have any frustration if a true VM hacker or someone more available than me could do it, I don't even know our own little name for an unsigned int 64, usqLong ?

Nicolas

2012/8/29 Nicolas Cellier nicolas.cellier.aka.nice@gmail.com:

...

For the (1<<63) negated bug itself, one very simple solution would be to just refuse it as a valid int64... We spent many effort to handle it in #signed64BitValueOf: , and our rewards are many bug popping out in the primitives where it is used. Regarding efficiency, we will have to protect code with inefficient-UB-broken defensive if or very-inefficient-portable-C, so the best choice is to just filter it out right at the beginning... I also wonder if handling a sign-magnitude wouldn't just be easier in that case (except maybe for + and -).

Of course the other UB are remaining, but one thing at a time.

Nicolas

2012/8/29 Nicolas Cellier nicolas.cellier.aka.nice@gmail.com:

...
Originally, C was very close to machine instructions and was conceived as a generic assembler. But it's more and more distant and becoming abstract.

Unfortunately, the abstract arithmetic model is completely broken with UB: it's un-reliable. So what does this kind of abstraction serves? Really, it makes me wonder...

If portable C code becomes both long and inefficient, I guess that cost of maintaining assembler for some part that we know are broken is an option indeed.

I also see that Andreas had a hard time with slang, some intermediate operations being cast to uint32 instead of int64, he finally had to use many cCode: hacks... Going both thru slang and C intermediate sounds like too much work and too few safety for implementation basic arithmetic (obviously we didn't and we can't easily model broken C behaviour in Slang, it's too complex !).

Nicolas

2012/8/29 Stefan Marr smalltalk@stefan-marr.de:

...
Hi Nicolas:

On 29 Aug 2012, at 12:18, Nicolas Cellier wrote:

...
Beside these bugs, when I read the code, I'm quite sure it's a nest of future bugs because there are many other attempts to catch overflow in post-condition (like testing that addition of two positive is negative when an underflow occurs) that technically rely on explicitely Undefined Behaviour (UB).

I guess http://forum.world.st/Is-bytecodePrimMultiply-correct-td3869580.html is related too. I am not sure whether that got changed in the VMs, but sounds very much like the same kind of problem. (undefined behavior and overflows)

Since C is undefined in that regard, what are the options? Hand-crafted assembly for all relevant platforms? Are there libraries that abstract from these things?

I think Clang has a compiler switch to warn at compile-time, or trigger a runtime warning/error for these issues with undefined behavior. That might help for a thorough sweep through the code.

Best regards Stefan

-- Stefan Marr Software Languages Lab Vrije Universiteit Brussel Pleinlaan 2 / B-1050 Brussels / Belgium http://soft.vub.ac.be/~smarr Phone: +32 2 629 2974 Fax: +32 2 629 3525

Nicolas Cellier

9:09 p.m.

Here is an exemple of what I propose:

primitiveAddLargeIntegers "Primitive arithmetic operations for large integers in 64 bit range" | a b result oopResult aIsNegative bIsNegative resultIsNegative oopArg oopRcvr | <export: true> <var: 'a' type: 'usqLong'> <var: 'b' type: 'usqLong'> <var: 'result' type: 'usqLong'> <var: 'oopArg' type: 'oop'> <var: 'oopRcvr' type: 'oop'> <var: 'oopResult' type: 'oop'> <var: 'aIsNegative' type: 'usqInt'> <var: 'bIsNegative' type: 'usqInt'> <var: 'resultIsNegative' type: 'usqInt'>

oopArg := self stackValue: 0. oopRcvr := self stackValue: 1. aIsNegative := self isNegativeIntegerValueOf: oopRcvr. bIsNegative := self isNegativeIntegerValueOf: oopArg. a := self magnitude64BitValueOf: oopRcvr. b := self magnitude64BitValueOf: oopArg. successFlag ifFalse:[^nil]. (aIsNegative == bIsNegative) ifTrue: ["Protect against overflow" a > (16rFFFFFFFFFFFFFFFF - b) ifTrue: [self primitiveFail. ^nil]. result := a + b. resultIsNegative := aIsNegative] ifFalse: [(a >= b) ifTrue: [result := a - b. resultIsNegative := aIsNegative] ifFalse: [result := b - a. resultIsNegative := bIsNegative]]. oopResult := self magnitude64BitIntegerFor: result neg: resultIsNegative. successFlag ifTrue:[self pop: 2 thenPush: oopResult].

IMHO, code is not much worse than previous one:

primitiveAddLargeIntegers "Primitive arithmetic operations for large integers in 64 bit range" | integerRcvr integerArg result oopResult | <export: true> <var: 'integerRcvr' type: 'sqLong'> <var: 'integerArg' type: 'sqLong'> <var: 'result' type: 'sqLong'>

integerArg := self signed64BitValueOf: (self stackValue: 0). integerRcvr := self signed64BitValueOf: (self stackValue: 1). successFlag ifFalse:[^nil].

"Compute the preliminary result (which may overflow)" result := integerRcvr + integerArg.

"Now check overflow conditions. First is whether rcvr and arg are of the same sign. If they are we need to check for overflow more carefully." (integerRcvr bitXor: integerArg) < 0 ifFalse:[ "Second is whether rcvr and result are of the same sign. If not, we have an overflow." (integerRcvr bitXor: result) < 0 ifTrue:[self primitiveFail]]. successFlag ifFalse:[^nil].

oopResult := self signed64BitIntegerFor: result. successFlag ifTrue:[self pop: 2 thenPush: oopResult].

And conversion with sign/magnitude is simpler...

isNegativeIntegerValueOf: oop "Answer 1 if integer object is negative, 0 otherwise. Fail if object pointed by oop i not an integer." | largeClass smallInt | <returnTypeC: #usqInt> <var: #smallInt type: #sqInt> (self isIntegerObject: oop) ifTrue: [smallInt := self integerValueOf: oop. ^smallInt < 0]. largeClass := self fetchClassOfNonInt: oop. largeClass = self classLargePositiveInteger ifTrue: [^0]. largeClass = self classLargePositiveInteger ifTrue: [^0]. ^self primitiveFail magnitude64BitValueOf: oop "Convert the given object into an integer value. The object may be either a ST integer or a eight-byte LargeInteger." | sz value largeClass szsqLong smallIntValue | <inline: false> <returnTypeC: #usqLong> <var: #value type: #usqLong> <var: #smallIntValue type: #sqInt> (self isIntegerObject: oop) ifTrue: [smallIntValue := self integerValueOf: oop. smallIntValue < 0 ifTrue: [smallIntValue := 0 - smallIntValue]. ^self cCoerce: smallIntValue to: #usqLong]. largeClass := self fetchClassOfNonInt: oop. largeClass = self classLargePositiveInteger ifFalse:[largeClass = self classLargeNegativeInteger ifFalse:[^self primitiveFail]]. szsqLong := self sizeof: #usqLong asSymbol. sz := self lengthOf: oop. sz > szsqLong ifTrue: [^ self primitiveFail]. value := 0. 0 to: sz - 1 do: [:i | value := value + ((self cCoerce: (self fetchByte: i ofObject: oop) to: #usqLong) << (i*8))]. ^value

magnitude64BitIntegerFor: integerValue neg: isNegative "Return a Large Integer object for the given integer magnitude and sign" | newLargeInteger largeClass intValue highWord sz isSmall smallVal | <inline: false> <var: 'integerValue' type: 'usqLong'> <var: 'isNegative' type: 'usqInt'> <var: 'highWord' type: 'usqInt'> <var: 'isSmall' type: 'usqInt'> <var: 'smallVal' type: 'sqInt'>

isSmall := isNegative ifTrue: [integerValue <= 16r40000000] ifFalse: [integerValue < 16r40000000]. isSmall ifTrue: [smallVal := self cCoerce: integerValue to: #sqInt. isNegative ifTrue: [smallVal := 0 - smallVal]. ^self integerObjectOf: smallVal]. isNegative < 0 ifTrue:[ largeClass := self classLargeNegativeInteger] ifFalse:[ largeClass := self classLargePositiveInteger]. highWord := self cCode: 'magnitude >> 32'. "shift is coerced to usqInt otherwise" highWord = 0 ifTrue: [sz := 4] ifFalse:[ sz := 5. (highWord := highWord >> 8) = 0 ifFalse:[sz := sz + 1]. (highWord := highWord >> 8) = 0 ifFalse:[sz := sz + 1]. (highWord := highWord >> 8) = 0 ifFalse:[sz := sz + 1]. ]. newLargeInteger := self instantiateClass: largeClass indexableSize: sz. 0 to: sz-1 do: [:i | intValue := self cCode: '(magnitude >> (i * 8)) & 255'. self storeByte: i ofObject: newLargeInteger withValue: intValue]. ^ newLargeInteger

Nicolas

2012/8/29 Nicolas Cellier nicolas.cellier.aka.nice@gmail.com:

...

I checked, unsigned int overflow behaviour is a well defined standard in both C/C++ and result in a modulo 2^ (sizeof( uint_type ) *CHAR_BIT) So I'm now convinced that sign/magnitude decomposition is the way to go. Anyway, you can currently observe sign dissertation to check overflow in post condition, so we already pay the same price as sign/magnitude solution, except that code is currently relying on broken C signed arithmetic model.

I may post corrected primitives for basic arithmetic ops when I have time,.. But I won't have any frustration if a true VM hacker or someone more available than me could do it, I don't even know our own little name for an unsigned int 64, usqLong ?

Nicolas

2012/8/29 Nicolas Cellier nicolas.cellier.aka.nice@gmail.com:

...
For the (1<<63) negated bug itself, one very simple solution would be to just refuse it as a valid int64... We spent many effort to handle it in #signed64BitValueOf: , and our rewards are many bug popping out in the primitives where it is used. Regarding efficiency, we will have to protect code with inefficient-UB-broken defensive if or very-inefficient-portable-C, so the best choice is to just filter it out right at the beginning... I also wonder if handling a sign-magnitude wouldn't just be easier in that case (except maybe for + and -).

Of course the other UB are remaining, but one thing at a time.

Nicolas

2012/8/29 Nicolas Cellier nicolas.cellier.aka.nice@gmail.com:

...
Originally, C was very close to machine instructions and was conceived as a generic assembler. But it's more and more distant and becoming abstract.

Unfortunately, the abstract arithmetic model is completely broken with UB: it's un-reliable. So what does this kind of abstraction serves? Really, it makes me wonder...

If portable C code becomes both long and inefficient, I guess that cost of maintaining assembler for some part that we know are broken is an option indeed.

I also see that Andreas had a hard time with slang, some intermediate operations being cast to uint32 instead of int64, he finally had to use many cCode: hacks... Going both thru slang and C intermediate sounds like too much work and too few safety for implementation basic arithmetic (obviously we didn't and we can't easily model broken C behaviour in Slang, it's too complex !).

Nicolas

2012/8/29 Stefan Marr smalltalk@stefan-marr.de:

...
Hi Nicolas:

On 29 Aug 2012, at 12:18, Nicolas Cellier wrote:

...
Beside these bugs, when I read the code, I'm quite sure it's a nest of future bugs because there are many other attempts to catch overflow in post-condition (like testing that addition of two positive is negative when an underflow occurs) that technically rely on explicitely Undefined Behaviour (UB).

I guess http://forum.world.st/Is-bytecodePrimMultiply-correct-td3869580.html is related too. I am not sure whether that got changed in the VMs, but sounds very much like the same kind of problem. (undefined behavior and overflows)

Since C is undefined in that regard, what are the options? Hand-crafted assembly for all relevant platforms? Are there libraries that abstract from these things?

I think Clang has a compiler switch to warn at compile-time, or trigger a runtime warning/error for these issues with undefined behavior. That might help for a thorough sweep through the code.

Best regards Stefan

-- Stefan Marr Software Languages Lab Vrije Universiteit Brussel Pleinlaan 2 / B-1050 Brussels / Belgium http://soft.vub.ac.be/~smarr Phone: +32 2 629 2974 Fax: +32 2 629 3525

David T. Lewis

30 Aug 30 Aug

1:31 a.m.

On Wed, Aug 29, 2012 at 01:24:39PM +0200, Stefan Marr wrote:

...

Hi Nicolas:

On 29 Aug 2012, at 12:18, Nicolas Cellier wrote:

...
Beside these bugs, when I read the code, I'm quite sure it's a nest of future bugs because there are many other attempts to catch overflow in post-condition (like testing that addition of two positive is negative when an underflow occurs) that technically rely on explicitely Undefined Behaviour (UB).

See below. Tests such as this are essential, and they they do *not* rely on undefined behavior if the C variables are properly declared.

...

I guess http://forum.world.st/Is-bytecodePrimMultiply-correct-td3869580.html is related too. I am not sure whether that got changed in the VMs, but sounds very much like the same kind of problem. (undefined behavior and overflows)

Since C is undefined in that regard, what are the options? Hand-crafted assembly for all relevant platforms? Are there libraries that abstract from these things?

A good general solution is to perform the arithmetic using variables declared as, or cast to, unsigned. The ambiguity in C language pertains only to signed twos complement arithmetic, so if the operations are performed on twos complement values that are declared unsigned, then no compiler optimization is possible and the results are unambiguous regardless of compiler behavior. Results of the unsigned operations may be tested for overflow, then cast back to signed integer if the result is intended to be interpreted as a signed integer.

Dave

Nicolas Cellier

8:03 a.m.

2012/8/30 David T. Lewis lewis@mail.msen.com:

...

On Wed, Aug 29, 2012 at 01:24:39PM +0200, Stefan Marr wrote:

...
Hi Nicolas:

On 29 Aug 2012, at 12:18, Nicolas Cellier wrote:

...
Beside these bugs, when I read the code, I'm quite sure it's a nest of future bugs because there are many other attempts to catch overflow in post-condition (like testing that addition of two positive is negative when an underflow occurs) that technically rely on explicitely Undefined Behaviour (UB).

See below. Tests such as this are essential, and they they do *not* rely on undefined behavior if the C variables are properly declared.

...
I guess http://forum.world.st/Is-bytecodePrimMultiply-correct-td3869580.html is related too. I am not sure whether that got changed in the VMs, but sounds very much like the same kind of problem. (undefined behavior and overflows)

Since C is undefined in that regard, what are the options? Hand-crafted assembly for all relevant platforms? Are there libraries that abstract from these things?

A good general solution is to perform the arithmetic using variables declared as, or cast to, unsigned. The ambiguity in C language pertains only to signed twos complement arithmetic, so if the operations are performed on twos complement values that are declared unsigned, then no compiler optimization is possible and the results are unambiguous regardless of compiler behavior. Results of the unsigned operations may be tested for overflow, then cast back to signed integer if the result is intended to be interpreted as a signed integer.

Dave

For LargeInt, I prefer proper sign/unsigned magnitude handling to signed arithmetic hacks for several reasons:

With signed arithmetic: 1) in conversion to/from LargeInt we have to discuss sign anyway and we have that magnitude fits into INT_MIN to INT_MAX 2) in overflow post-condition test, we have to discuss sign again - for example, prim(Add/Subtract/Div/Quo/Mod)LargeIntegers 3) we still need defensive protection against (0 - INT_MIN) 4) all casts rely on 2-complement representation which is implementation defined (though universal nowadays)

with sign/magnitude: 1) conversion from/to is simplified 2) overflow are easy to test in pre-condition like a > (UINT_MAX - b) for addition, a > (UINT_MAX/b) for *, and this is absolutely safe even with a bogus compiler, and as efficient as broken post-condition tests 3) no protection needed against INT_MIN 4) no reliance on 2-complement at all 5) primitives work 1 bit further 6) code is simpler (less if)

The only thing missing in my proposition (see attachment in http://code.google.com/p/cog/issues/detail?id=92) is a function retrieving sign and magnitude in a single function call (it's easy in C, but I don't know slang enough, how to return a struct or pass a pointer is beyond my knowledge).

In a word, sign/magnitude is easier/cleaner and not less performant (once single function call is resolved)

For SmallInt, computing result in signed 64 bits as proposed by Stefan is the simplest thing we can do IMHO. I'm quite sure it won't ruin efficiency.

Nicolas

Nicolas Cellier

8:24 a.m.

Also note that I introduced one future-flaw by hardcoding a 64 bits constant...

1) 16rFFFFFFFFFFFFFFFF should be replace with a proper macro (USQLONG_MAX or something) in pre-condition tests

I did not correct the other flaws already spreaded in code: - SmallInteger maxVal/linVal are hardcoded everywhere And we have these implicit requirements: - SmallInteger maxVal/minVal has to fit in a sqInt - (0 - SmallInteger minVal) has to fit in a sqInt too (currently possible, it has only 31 bits)

2012/8/30 Nicolas Cellier nicolas.cellier.aka.nice@gmail.com:

...

2012/8/30 David T. Lewis lewis@mail.msen.com:

...
On Wed, Aug 29, 2012 at 01:24:39PM +0200, Stefan Marr wrote:

...
Hi Nicolas:

On 29 Aug 2012, at 12:18, Nicolas Cellier wrote:

...
Beside these bugs, when I read the code, I'm quite sure it's a nest of future bugs because there are many other attempts to catch overflow in post-condition (like testing that addition of two positive is negative when an underflow occurs) that technically rely on explicitely Undefined Behaviour (UB).

See below. Tests such as this are essential, and they they do *not* rely on undefined behavior if the C variables are properly declared.

...
I guess http://forum.world.st/Is-bytecodePrimMultiply-correct-td3869580.html is related too. I am not sure whether that got changed in the VMs, but sounds very much like the same kind of problem. (undefined behavior and overflows)

Since C is undefined in that regard, what are the options? Hand-crafted assembly for all relevant platforms? Are there libraries that abstract from these things?

A good general solution is to perform the arithmetic using variables declared as, or cast to, unsigned. The ambiguity in C language pertains only to signed twos complement arithmetic, so if the operations are performed on twos complement values that are declared unsigned, then no compiler optimization is possible and the results are unambiguous regardless of compiler behavior. Results of the unsigned operations may be tested for overflow, then cast back to signed integer if the result is intended to be interpreted as a signed integer.

Dave

For LargeInt, I prefer proper sign/unsigned magnitude handling to signed arithmetic hacks for several reasons:

With signed arithmetic:

in conversion to/from LargeInt we have to discuss sign anyway and

we have that magnitude fits into INT_MIN to INT_MAX 2) in overflow post-condition test, we have to discuss sign again - for example, prim(Add/Subtract/Div/Quo/Mod)LargeIntegers 3) we still need defensive protection against (0 - INT_MIN) 4) all casts rely on 2-complement representation which is implementation defined (though universal nowadays)

with sign/magnitude:

conversion from/to is simplified

overflow are easy to test in pre-condition like a > (UINT_MAX - b)

for addition, a > (UINT_MAX/b) for *, and this is absolutely safe even with a bogus compiler, and as efficient as broken post-condition tests 3) no protection needed against INT_MIN 4) no reliance on 2-complement at all 5) primitives work 1 bit further 6) code is simpler (less if)

The only thing missing in my proposition (see attachment in http://code.google.com/p/cog/issues/detail?id=92) is a function retrieving sign and magnitude in a single function call (it's easy in C, but I don't know slang enough, how to return a struct or pass a pointer is beyond my knowledge).

In a word, sign/magnitude is easier/cleaner and not less performant (once single function call is resolved)

For SmallInt, computing result in signed 64 bits as proposed by Stefan is the simplest thing we can do IMHO. I'm quite sure it won't ruin efficiency.

Nicolas

Nicolas Cellier

10:31 a.m.

Oops, there is another one hardcoded constant in bitShift (was already there)

shift >= 64 should better be shift >= sizeof(usqLong)*CHAR_BIT...

Nicolas

2012/8/30 Nicolas Cellier nicolas.cellier.aka.nice@gmail.com:

...

Also note that I introduced one future-flaw by hardcoding a 64 bits constant...

16rFFFFFFFFFFFFFFFF should be replace with a proper macro

(USQLONG_MAX or something) in pre-condition tests

I did not correct the other flaws already spreaded in code:

SmallInteger maxVal/linVal are hardcoded everywhere

And we have these implicit requirements:

SmallInteger maxVal/minVal has to fit in a sqInt

(0 - SmallInteger minVal) has to fit in a sqInt too (currently

possible, it has only 31 bits)

2012/8/30 Nicolas Cellier nicolas.cellier.aka.nice@gmail.com:

...
2012/8/30 David T. Lewis lewis@mail.msen.com:

...
On Wed, Aug 29, 2012 at 01:24:39PM +0200, Stefan Marr wrote:

...
Hi Nicolas:

On 29 Aug 2012, at 12:18, Nicolas Cellier wrote:

...
Beside these bugs, when I read the code, I'm quite sure it's a nest of future bugs because there are many other attempts to catch overflow in post-condition (like testing that addition of two positive is negative when an underflow occurs) that technically rely on explicitely Undefined Behaviour (UB).

See below. Tests such as this are essential, and they they do *not* rely on undefined behavior if the C variables are properly declared.

...
I guess http://forum.world.st/Is-bytecodePrimMultiply-correct-td3869580.html is related too. I am not sure whether that got changed in the VMs, but sounds very much like the same kind of problem. (undefined behavior and overflows)

Since C is undefined in that regard, what are the options? Hand-crafted assembly for all relevant platforms? Are there libraries that abstract from these things?

A good general solution is to perform the arithmetic using variables declared as, or cast to, unsigned. The ambiguity in C language pertains only to signed twos complement arithmetic, so if the operations are performed on twos complement values that are declared unsigned, then no compiler optimization is possible and the results are unambiguous regardless of compiler behavior. Results of the unsigned operations may be tested for overflow, then cast back to signed integer if the result is intended to be interpreted as a signed integer.

Dave

For LargeInt, I prefer proper sign/unsigned magnitude handling to signed arithmetic hacks for several reasons:

With signed arithmetic:

in conversion to/from LargeInt we have to discuss sign anyway and

we have that magnitude fits into INT_MIN to INT_MAX 2) in overflow post-condition test, we have to discuss sign again - for example, prim(Add/Subtract/Div/Quo/Mod)LargeIntegers 3) we still need defensive protection against (0 - INT_MIN) 4) all casts rely on 2-complement representation which is implementation defined (though universal nowadays)

with sign/magnitude:

conversion from/to is simplified

overflow are easy to test in pre-condition like a > (UINT_MAX - b)

for addition, a > (UINT_MAX/b) for *, and this is absolutely safe even with a bogus compiler, and as efficient as broken post-condition tests 3) no protection needed against INT_MIN 4) no reliance on 2-complement at all 5) primitives work 1 bit further 6) code is simpler (less if)

The only thing missing in my proposition (see attachment in http://code.google.com/p/cog/issues/detail?id=92) is a function retrieving sign and magnitude in a single function call (it's easy in C, but I don't know slang enough, how to return a struct or pass a pointer is beyond my knowledge).

In a word, sign/magnitude is easier/cleaner and not less performant (once single function call is resolved)

For SmallInt, computing result in signed 64 bits as proposed by Stefan is the simplest thing we can do IMHO. I'm quite sure it won't ruin efficiency.

Nicolas

David T. Lewis

29 Aug 29 Aug

11:04 p.m.

This is on Mantis at http://bugs.squeak.org/view.php?id=7705

Note last comment in related issue 6987.

This issue will crash the VM when compiled for 64-bit platforms.

Dave

On Wed, Aug 29, 2012 at 12:18:28PM +0200, Nicolas Cellier wrote:

...

As posted on squeak-dev http://lists.squeakfoundation.org/pipermail/squeak-dev/2012-August/165608.ht... I found 3 bugs in LargeInteger primitives

(1<<63) negated quo: -1. (1<<63) negated / -1. (1<<63) negated * -1.

They are all related to the impossible task of taking absolute value of INT_MIN (or more exactly it's 64 bits equivalent). Currently, it takes the form (0 - INT_MIN) whose behaviour is undefined according to C standards but generally answer INT_MIN. See for example http://stackoverflow.com/questions/2539178/why-is-abs0x80000000-0x80000000

Surprisingly this one works: (1<<63) negated // -1.

Most probably because gcc has a license to ignore undefined behaviour and perform some optimizations that don't take overflow side effects into account. For example 0 - (0 - a)/b can be simplified into a/b, UB case of INT_MIN apart...

Beside these bugs, when I read the code, I'm quite sure it's a nest of future bugs because there are many other attempts to catch overflow in post-condition (like testing that addition of two positive is negative when an underflow occurs) that technically rely on explicitely Undefined Behaviour (UB). OK, by now many Arithmetic Units do behave like exploited in these post-conditions, though it's not strictly future-proof. But we unfortunately rely on optimizing C compilers, and its behaviour is much more fragile than hardware...

I invite every VM hacker to read http://stackoverflow.com/questions/199333/best-way-to-detect-integer-overflo... And various links like https://www.securecoding.cert.org/confluence/display/seccode/INT32-C.+Ensure...

For example, in large integer subtract, we have a protection against (0 - INT_MIN) like: y = 0 - x; if ( y==x ) { primitiveFail(); } an optimizing compiler having a licence to ignore INT_MIN Undefined Behaviour case could mathematically solve the equation as x==0, y==0 and transform code into if( ! (y=0-x) ) { primitiveFail(); } (directly use a jz and save a comparison)

or if we have such branch c = a + b; if( a >0) { if(b > 0) { if (c < 0 ) { primitiveFail(); } } } Again, a good compiler could remove the if( c < 0) test, since it does not have to care about UB...

OK, pragmatically, most of these post-condition hacks are fast and work with some version of gcc, but think about portability (llvm ?) and future pain (you can debug such code only at asembler level).

Do it right > do it fast.

Nicolas

Nicolas Cellier

11:14 p.m.

See also http://code.google.com/p/cog/issues/detail?id=92 where I attached a fix for large int It's untested yet and to review carefully !

As Stefan told, there is UB-reliance in SmallInteger primitives too, but I did not fix them. We should simply compute result as signed 64 bits as proposed by Stefan (except bitShift)

Nicolas

2012/8/30 David T. Lewis lewis@mail.msen.com:

...

This is on Mantis at http://bugs.squeak.org/view.php?id=7705

Note last comment in related issue 6987.

This issue will crash the VM when compiled for 64-bit platforms.

Dave

On Wed, Aug 29, 2012 at 12:18:28PM +0200, Nicolas Cellier wrote:

...
As posted on squeak-dev http://lists.squeakfoundation.org/pipermail/squeak-dev/2012-August/165608.ht... I found 3 bugs in LargeInteger primitives

(1<<63) negated quo: -1. (1<<63) negated / -1. (1<<63) negated * -1.

They are all related to the impossible task of taking absolute value of INT_MIN (or more exactly it's 64 bits equivalent). Currently, it takes the form (0 - INT_MIN) whose behaviour is undefined according to C standards but generally answer INT_MIN. See for example http://stackoverflow.com/questions/2539178/why-is-abs0x80000000-0x80000000

Surprisingly this one works: (1<<63) negated // -1.

Most probably because gcc has a license to ignore undefined behaviour and perform some optimizations that don't take overflow side effects into account. For example 0 - (0 - a)/b can be simplified into a/b, UB case of INT_MIN apart...

Beside these bugs, when I read the code, I'm quite sure it's a nest of future bugs because there are many other attempts to catch overflow in post-condition (like testing that addition of two positive is negative when an underflow occurs) that technically rely on explicitely Undefined Behaviour (UB). OK, by now many Arithmetic Units do behave like exploited in these post-conditions, though it's not strictly future-proof. But we unfortunately rely on optimizing C compilers, and its behaviour is much more fragile than hardware...

I invite every VM hacker to read http://stackoverflow.com/questions/199333/best-way-to-detect-integer-overflo... And various links like https://www.securecoding.cert.org/confluence/display/seccode/INT32-C.+Ensure...

For example, in large integer subtract, we have a protection against (0 - INT_MIN) like: y = 0 - x; if ( y==x ) { primitiveFail(); } an optimizing compiler having a licence to ignore INT_MIN Undefined Behaviour case could mathematically solve the equation as x==0, y==0 and transform code into if( ! (y=0-x) ) { primitiveFail(); } (directly use a jz and save a comparison)

or if we have such branch c = a + b; if( a >0) { if(b > 0) { if (c < 0 ) { primitiveFail(); } } } Again, a good compiler could remove the if( c < 0) test, since it does not have to care about UB...

OK, pragmatically, most of these post-condition hacks are fast and work with some version of gcc, but think about portability (llvm ?) and future pain (you can debug such code only at asembler level).

Do it right > do it fast.

Nicolas

Stefan Marr

30 Aug 30 Aug

9:31 a.m.

Hi:

On 30 Aug 2012, at 01:14, Nicolas Cellier wrote:

...

See also http://code.google.com/p/cog/issues/detail?id=92 where I attached a fix for large int It's untested yet and to review carefully !

As Stefan told, there is UB-reliance in SmallInteger primitives too, but I did not fix them. We should simply compute result as signed 64 bits as proposed by Stefan (except bitShift)

This might be the simplest solution, but at least on the RoarVM I measured a significant performance impact on tight integer loops. It's 20% according to my measurements.

Might be something necessary to be considered.

Best regards Stefan

-- Stefan Marr Software Languages Lab Vrije Universiteit Brussel Pleinlaan 2 / B-1050 Brussels / Belgium http://soft.vub.ac.be/~smarr Phone: +32 2 629 2974 Fax: +32 2 629 3525

Nicolas Cellier

8:54 p.m.

2012/8/30 Stefan Marr smalltalk@stefan-marr.de:

...

Hi:

On 30 Aug 2012, at 01:14, Nicolas Cellier wrote:

...
See also http://code.google.com/p/cog/issues/detail?id=92 where I attached a fix for large int It's untested yet and to review carefully !

As Stefan told, there is UB-reliance in SmallInteger primitives too, but I did not fix them. We should simply compute result as signed 64 bits as proposed by Stefan (except bitShift)

This might be the simplest solution, but at least on the RoarVM I measured a significant performance impact on tight integer loops. It's 20% according to my measurements.

Might be something necessary to be considered.

Best regards Stefan

Interesting. My fix for LargeInt seem to give same or better timing than old bogus ones despite double function call for getting sign and magnitude. Note that I updated http://code.google.com/p/cog/issues/detail?id=92 with a corrected version. 1st one bogusly tried to use oop as a type, I obviously don't program VM every day ;)

Nicolas

...

-- Stefan Marr Software Languages Lab Vrije Universiteit Brussel Pleinlaan 2 / B-1050 Brussels / Belgium http://soft.vub.ac.be/~smarr Phone: +32 2 629 2974 Fax: +32 2 629 3525

Nicolas Cellier

1 Sep 1 Sep

8:15 p.m.

Hi Stefan,

I just tried on a stack VM (MacOSX core 2...), and I get different measurements, at most 3% penalty, not 20% If I rewrite bytecodePrimMultiply like this :

bytecodePrimMultiply | rcvr arg result | <var: #result type: 'sqLong'> rcvr := self internalStackValue: 1. arg := self internalStackValue: 0. (self areIntegers: rcvr and: arg) ifTrue: [rcvr := objectMemory integerValueOf: rcvr. arg := objectMemory integerValueOf: arg. result := rcvr. result := result * arg. (result >= 16r-40000000 and: [result <= 16r3FFFFFFF]) ifTrue: [self internalPop: 2 thenPush: (objectMemory integerObjectOf: result). ^self fetchNextBytecode "success"]] ifFalse: [...

Then I get mini bench timing:

ORIGINAL: [33*35] bench '8,470,000 per second.' '8,670,000 per second.'

MODIFIED: '8,410,000 per second.' '8,370,000 per second.'

Though, I didn't used SSE or any 64 bits friendly instructions:

line 6501 result = result * arg; .loc 1 6501 0 movl -1404(%ebp), %eax movl %eax, %edx sarl $31, %edx movl -2068(%ebp), %ecx imull %eax, %ecx movl -2072(%ebp), %ebx imull %edx, %ebx addl %ebx, %ecx mull -2072(%ebp) addl %edx, %ecx movl %ecx, %edx movl %eax, -2072(%ebp) movl %edx, -2068(%ebp) movl %eax, -2072(%ebp) movl %edx, -2068(%ebp)

So it certainly is un-optimal, but I'm sure we wouldn't notice any difference on a macro benchmark.

Also, we could let the primitive use create a #signed64BitIntegerFor: result instead of falling back to normal send if we really want to favour performance over clean separation (after all, the primitive already know about Float, why not about LargeInteger...)

Nicolas

2012/8/30 Stefan Marr smalltalk@stefan-marr.de:

...

Hi:

On 30 Aug 2012, at 01:14, Nicolas Cellier wrote:

...
See also http://code.google.com/p/cog/issues/detail?id=92 where I attached a fix for large int It's untested yet and to review carefully !

As Stefan told, there is UB-reliance in SmallInteger primitives too, but I did not fix them. We should simply compute result as signed 64 bits as proposed by Stefan (except bitShift)

This might be the simplest solution, but at least on the RoarVM I measured a significant performance impact on tight integer loops. It's 20% according to my measurements.

Might be something necessary to be considered.

Best regards Stefan

-- Stefan Marr Software Languages Lab Vrije Universiteit Brussel Pleinlaan 2 / B-1050 Brussels / Belgium http://soft.vub.ac.be/~smarr Phone: +32 2 629 2974 Fax: +32 2 629 3525

Stefan Marr

8:49 p.m.

Hi Nicolas:

On 01 Sep 2012, at 22:15, Nicolas Cellier wrote:

...

I just tried on a stack VM (MacOSX core 2...), and I get different measurements, at most 3% penalty, not 20%

Sorry, the 20% claim is based on wrong assumptions... In short, I don't know what the penalty is on microbenchmarks.

I was only checking the benchmark results here: http://soft.vub.ac.be/~ppp/codespeed/changes/?tre=5&rev=5e18a83ed285c900... The corresponding patch is here: https://github.com/smarr/RoarVM/commit/5e18a83ed285c900aacd4190f91df6627d9cc...

As you can see, the IntLoop benchmark is taking a pretty strong hit. I was not cross-checking what the benchmark actually does, unfortunately. I was just assuming it does multiplication, which would explain the hit, but it does only do subtraction.

So, actually, I don't know. The benchmark is not measuring that primitive, and the other benchmarks are looking ok (taking measurement errors into account).

Still wonder what's wrong with the IntLoop benchmark thought. 20% for perhaps a new offset in the binary for related code, sounds like a lot.

Best regards Stefan

-- Stefan Marr Software Languages Lab Vrije Universiteit Brussel Pleinlaan 2 / B-1050 Brussels / Belgium http://soft.vub.ac.be/~smarr Phone: +32 2 629 2974 Fax: +32 2 629 3525

Nicolas Cellier

9:54 p.m.

2012/9/1 Stefan Marr smalltalk@stefan-marr.de:

...

Hi Nicolas:

On 01 Sep 2012, at 22:15, Nicolas Cellier wrote:

...
I just tried on a stack VM (MacOSX core 2...), and I get different measurements, at most 3% penalty, not 20%

Sorry, the 20% claim is based on wrong assumptions... In short, I don't know what the penalty is on microbenchmarks.

I was only checking the benchmark results here: http://soft.vub.ac.be/~ppp/codespeed/changes/?tre=5&rev=5e18a83ed285c900... The corresponding patch is here: https://github.com/smarr/RoarVM/commit/5e18a83ed285c900aacd4190f91df6627d9cc...

As you can see, the IntLoop benchmark is taking a pretty strong hit. I was not cross-checking what the benchmark actually does, unfortunately. I was just assuming it does multiplication, which would explain the hit, but it does only do subtraction.

So, actually, I don't know. The benchmark is not measuring that primitive, and the other benchmarks are looking ok (taking measurement errors into account).

Still wonder what's wrong with the IntLoop benchmark thought. 20% for perhaps a new offset in the binary for related code, sounds like a lot.

Best regards Stefan

-- Stefan Marr Software Languages Lab Vrije Universiteit Brussel Pleinlaan 2 / B-1050 Brussels / Belgium http://soft.vub.ac.be/~smarr Phone: +32 2 629 2974 Fax: +32 2 629 3525

Levente Uzonyi

9:25 p.m.

On Sat, 1 Sep 2012, Nicolas Cellier wrote:

...

Hi Stefan,

I just tried on a stack VM (MacOSX core 2...), and I get different measurements, at most 3% penalty, not 20% If I rewrite bytecodePrimMultiply like this :

bytecodePrimMultiply | rcvr arg result | <var: #result type: 'sqLong'> rcvr := self internalStackValue: 1. arg := self internalStackValue: 0. (self areIntegers: rcvr and: arg) ifTrue: [rcvr := objectMemory integerValueOf: rcvr. arg := objectMemory integerValueOf: arg. result := rcvr. result := result * arg. (result >= 16r-40000000 and: [result <= 16r3FFFFFFF]) ifTrue: [self internalPop: 2 thenPush: (objectMemory integerObjectOf: result). ^self fetchNextBytecode "success"]] ifFalse: [...

Then I get mini bench timing:

ORIGINAL: [33*35] bench '8,470,000 per second.' '8,670,000 per second.'

MODIFIED: '8,410,000 per second.' '8,370,000 per second.'

IMHO #bench has too high overhead for accurately measuring such simple operations. With a CogVM I get this:

[ 33 * 35 ] bench '40,000,000 per second.'. [] bench '47,300,000 per second.'

Levente

...

Though, I didn't used SSE or any 64 bits friendly instructions:

line 6501 result = result * arg; .loc 1 6501 0 movl -1404(%ebp), %eax movl %eax, %edx sarl $31, %edx movl -2068(%ebp), %ecx imull %eax, %ecx movl -2072(%ebp), %ebx imull %edx, %ebx addl %ebx, %ecx mull -2072(%ebp) addl %edx, %ecx movl %ecx, %edx movl %eax, -2072(%ebp) movl %edx, -2068(%ebp) movl %eax, -2072(%ebp) movl %edx, -2068(%ebp)

So it certainly is un-optimal, but I'm sure we wouldn't notice any difference on a macro benchmark.

Also, we could let the primitive use create a #signed64BitIntegerFor: result instead of falling back to normal send if we really want to favour performance over clean separation (after all, the primitive already know about Float, why not about LargeInteger...)

Nicolas

2012/8/30 Stefan Marr smalltalk@stefan-marr.de:

...
Hi:

On 30 Aug 2012, at 01:14, Nicolas Cellier wrote:

...
See also http://code.google.com/p/cog/issues/detail?id=92 where I attached a fix for large int It's untested yet and to review carefully !

As Stefan told, there is UB-reliance in SmallInteger primitives too, but I did not fix them. We should simply compute result as signed 64 bits as proposed by Stefan (except bitShift)

This might be the simplest solution, but at least on the RoarVM I measured a significant performance impact on tight integer loops. It's 20% according to my measurements.

Might be something necessary to be considered.

Best regards Stefan

-- Stefan Marr Software Languages Lab Vrije Universiteit Brussel Pleinlaan 2 / B-1050 Brussels / Belgium http://soft.vub.ac.be/~smarr Phone: +32 2 629 2974 Fax: +32 2 629 3525

Nicolas Cellier

9:55 p.m.

Yes benchmarking is an art, this was a gross aproximation

2012/9/1 Levente Uzonyi leves@elte.hu:

...

On Sat, 1 Sep 2012, Nicolas Cellier wrote:

...
Hi Stefan,

I just tried on a stack VM (MacOSX core 2...), and I get different measurements, at most 3% penalty, not 20% If I rewrite bytecodePrimMultiply like this :

bytecodePrimMultiply | rcvr arg result | <var: #result type: 'sqLong'> rcvr := self internalStackValue: 1. arg := self internalStackValue: 0. (self areIntegers: rcvr and: arg) ifTrue: [rcvr := objectMemory integerValueOf: rcvr. arg := objectMemory integerValueOf: arg. result := rcvr. result := result * arg. (result >= 16r-40000000 and: [result <= 16r3FFFFFFF]) ifTrue: [self internalPop: 2 thenPush: (objectMemory integerObjectOf: result). ^self fetchNextBytecode "success"]] ifFalse: [...

Then I get mini bench timing:

ORIGINAL: [33*35] bench '8,470,000 per second.' '8,670,000 per second.'

MODIFIED: '8,410,000 per second.' '8,370,000 per second.'

IMHO #bench has too high overhead for accurately measuring such simple operations. With a CogVM I get this:

[ 33 * 35 ] bench '40,000,000 per second.'. [] bench '47,300,000 per second.'

Levente

...
Though, I didn't used SSE or any 64 bits friendly instructions:

line 6501 result = result * arg; .loc 1 6501 0 movl -1404(%ebp), %eax movl %eax, %edx sarl $31, %edx movl -2068(%ebp), %ecx imull %eax, %ecx movl -2072(%ebp), %ebx imull %edx, %ebx addl %ebx, %ecx mull -2072(%ebp) addl %edx, %ecx movl %ecx, %edx movl %eax, -2072(%ebp) movl %edx, -2068(%ebp) movl %eax, -2072(%ebp) movl %edx, -2068(%ebp)

So it certainly is un-optimal, but I'm sure we wouldn't notice any difference on a macro benchmark.

Also, we could let the primitive use create a #signed64BitIntegerFor: result instead of falling back to normal send if we really want to favour performance over clean separation (after all, the primitive already know about Float, why not about LargeInteger...)

Nicolas

2012/8/30 Stefan Marr smalltalk@stefan-marr.de:

...
Hi:

On 30 Aug 2012, at 01:14, Nicolas Cellier wrote:

...
See also http://code.google.com/p/cog/issues/detail?id=92 where I attached a fix for large int It's untested yet and to review carefully !

As Stefan told, there is UB-reliance in SmallInteger primitives too, but I did not fix them. We should simply compute result as signed 64 bits as proposed by Stefan (except bitShift)

This might be the simplest solution, but at least on the RoarVM I measured a significant performance impact on tight integer loops. It's 20% according to my measurements.

Might be something necessary to be considered.

Best regards Stefan

-- Stefan Marr Software Languages Lab Vrije Universiteit Brussel Pleinlaan 2 / B-1050 Brussels / Belgium http://soft.vub.ac.be/~smarr Phone: +32 2 629 2974 Fax: +32 2 629 3525

David T. Lewis

2:14 p.m.

I have integrated these fixes into trunk VMMaker for testing. The new #testMinimumNegativeIntegerArithmetic unit test passes, and a VM compiled in 64-bit mode no longer crashed. I can find no significant change in performance with these changes applied. This looks really good to me, and if there are no other comments or issues I will go ahead and post the update to trunk VMMaker.

Here are the performance measurements that I did to compare VM performance before and after applying the changes. This is done with an interpreter VM compiled in 64 bit mode on Linux:

testBlock := [(1 to: 1000000) do: [:e | | largeNegativeInt | largeNegativeInt := -9223372036854775808 + e. largeNegativeInt >> 3. largeNegativeInt + 1. largeNegativeInt - -1. largeNegativeInt // -1. largeNegativeInt \ -1. largeNegativeInt rem: -1. largeNegativeInt quo: -1. largeNegativeInt * -1. largeNegativeInt / -1]].

Before Nicolas' changes:

(LargeNegativeIntegerTest selector: #testMinimumNegativeIntegerArithmetic) run ==> 1 run, 0 passes, 0 expected failures, 1 failures, 0 errors, 0 unexpected passes

Time millisecondsToRun: [(1 to: 1000000) do: [:e | e + -9223372036854775808 // -1]] ==> 1565 Time millisecondsToRun: [(1 to: 1000000) do: [:e | e + -9223372036854775808 // -1]] ==> 1550 Time millisecondsToRun: [(1 to: 1000000) do: [:e | e + -9223372036854775808 // -1]] ==> 1544

0 tinyBenchmarks ==> '437981180 bytecodes/sec; 14163414 sends/sec' 0 tinyBenchmarks ==> '442141623 bytecodes/sec; 14141708 sends/sec' 0 tinyBenchmarks ==> '442141623 bytecodes/sec; 14120068 sends/sec' 0 tinyBenchmarks ==> '440240756 bytecodes/sec; 14163414 sends/sec' 0 tinyBenchmarks ==> '442141623 bytecodes/sec; 14823236 sends/sec'

Time millisecondsToRun: testBlock ==> 4698 Time millisecondsToRun: testBlock ==> 4671 Time millisecondsToRun: testBlock ==> 4668 Time millisecondsToRun: testBlock ==> 4612 Time millisecondsToRun: testBlock ==> 4657

After applying the fixes:

(LargeNegativeIntegerTest selector: #testMinimumNegativeIntegerArithmetic) run ==> 1 run, 1 passes, 0 expected failures, 0 failures, 0 errors, 0 unexpected passes

Time millisecondsToRun: [(1 to: 1000000) do: [:e | e + -9223372036854775808 // -1]] ==> 1616 Time millisecondsToRun: [(1 to: 1000000) do: [:e | e + -9223372036854775808 // -1]] ==> 1581 Time millisecondsToRun: [(1 to: 1000000) do: [:e | e + -9223372036854775808 // -1]] ==> 1577

0 tinyBenchmarks ==> '456735057 bytecodes/sec; 14002222 sends/sec' 0 tinyBenchmarks ==> '461677186 bytecodes/sec; 14023502 sends/sec' 0 tinyBenchmarks ==> '462929475 bytecodes/sec; 14002222 sends/sec' 0 tinyBenchmarks ==> '459192825 bytecodes/sec; 14002222 sends/sec' 0 tinyBenchmarks ==> '458781362 bytecodes/sec; 13970423 sends/sec'

Time millisecondsToRun: testBlock ==> 4680 Time millisecondsToRun: testBlock ==> 4672 Time millisecondsToRun: testBlock ==> 4722 Time millisecondsToRun: testBlock ==> 4675 Time millisecondsToRun: testBlock ==> 4679

Dave

On Thu, Aug 30, 2012 at 01:14:54AM +0200, Nicolas Cellier wrote:

...

See also http://code.google.com/p/cog/issues/detail?id=92 where I attached a fix for large int It's untested yet and to review carefully !

As Stefan told, there is UB-reliance in SmallInteger primitives too, but I did not fix them. We should simply compute result as signed 64 bits as proposed by Stefan (except bitShift)

Nicolas

2012/8/30 David T. Lewis lewis@mail.msen.com:

...
This is on Mantis at http://bugs.squeak.org/view.php?id=7705

Note last comment in related issue 6987.

This issue will crash the VM when compiled for 64-bit platforms.

Dave

On Wed, Aug 29, 2012 at 12:18:28PM +0200, Nicolas Cellier wrote:

...
As posted on squeak-dev http://lists.squeakfoundation.org/pipermail/squeak-dev/2012-August/165608.ht... I found 3 bugs in LargeInteger primitives

(1<<63) negated quo: -1. (1<<63) negated / -1. (1<<63) negated * -1.

They are all related to the impossible task of taking absolute value of INT_MIN (or more exactly it's 64 bits equivalent). Currently, it takes the form (0 - INT_MIN) whose behaviour is undefined according to C standards but generally answer INT_MIN. See for example http://stackoverflow.com/questions/2539178/why-is-abs0x80000000-0x80000000

Surprisingly this one works: (1<<63) negated // -1.

Most probably because gcc has a license to ignore undefined behaviour and perform some optimizations that don't take overflow side effects into account. For example 0 - (0 - a)/b can be simplified into a/b, UB case of INT_MIN apart...

Beside these bugs, when I read the code, I'm quite sure it's a nest of future bugs because there are many other attempts to catch overflow in post-condition (like testing that addition of two positive is negative when an underflow occurs) that technically rely on explicitely Undefined Behaviour (UB). OK, by now many Arithmetic Units do behave like exploited in these post-conditions, though it's not strictly future-proof. But we unfortunately rely on optimizing C compilers, and its behaviour is much more fragile than hardware...

I invite every VM hacker to read http://stackoverflow.com/questions/199333/best-way-to-detect-integer-overflo... And various links like https://www.securecoding.cert.org/confluence/display/seccode/INT32-C.+Ensure...

For example, in large integer subtract, we have a protection against (0 - INT_MIN) like: y = 0 - x; if ( y==x ) { primitiveFail(); } an optimizing compiler having a licence to ignore INT_MIN Undefined Behaviour case could mathematically solve the equation as x==0, y==0 and transform code into if( ! (y=0-x) ) { primitiveFail(); } (directly use a jz and save a comparison)

or if we have such branch c = a + b; if( a >0) { if(b > 0) { if (c < 0 ) { primitiveFail(); } } } Again, a good compiler could remove the if( c < 0) test, since it does not have to care about UB...

OK, pragmatically, most of these post-condition hacks are fast and work with some version of gcc, but think about portability (llvm ?) and future pain (you can debug such code only at asembler level).

Do it right > do it fast.

Nicolas

Levente Uzonyi

2:44 p.m.

On Sat, 1 Sep 2012, David T. Lewis wrote:

...

I have integrated these fixes into trunk VMMaker for testing. The new #testMinimumNegativeIntegerArithmetic unit test passes, and a VM compiled in 64-bit mode no longer crashed. I can find no significant change in performance with these changes applied. This looks really good to me, and if there are no other comments or issues I will go ahead and post the update to trunk VMMaker.

That sounds great, thanks Dave.

...

Here are the performance measurements that I did to compare VM performance before and after applying the changes. This is done with an interpreter VM compiled in 64 bit mode on Linux:

testBlock := [(1 to: 1000000) do: [:e | | largeNegativeInt | largeNegativeInt := -9223372036854775808 + e. largeNegativeInt >> 3. largeNegativeInt + 1. largeNegativeInt - -1. largeNegativeInt // -1. largeNegativeInt \ -1. largeNegativeInt rem: -1. largeNegativeInt quo: -1. largeNegativeInt * -1. largeNegativeInt / -1]].

What's the reason for using #to: and #do: instead of #to:do:? The latter seems to be much more appropriate in such tests, because it doesn't create any objects and message sends, so garbage collection has less effect on the measurements.

Levente

...

Before Nicolas' changes:

(LargeNegativeIntegerTest selector: #testMinimumNegativeIntegerArithmetic) run ==> 1 run, 0 passes, 0 expected failures, 1 failures, 0 errors, 0 unexpected passes

Time millisecondsToRun: [(1 to: 1000000) do: [:e | e + -9223372036854775808 // -1]] ==> 1565 Time millisecondsToRun: [(1 to: 1000000) do: [:e | e + -9223372036854775808 // -1]] ==> 1550 Time millisecondsToRun: [(1 to: 1000000) do: [:e | e + -9223372036854775808 // -1]] ==> 1544

0 tinyBenchmarks ==> '437981180 bytecodes/sec; 14163414 sends/sec' 0 tinyBenchmarks ==> '442141623 bytecodes/sec; 14141708 sends/sec' 0 tinyBenchmarks ==> '442141623 bytecodes/sec; 14120068 sends/sec' 0 tinyBenchmarks ==> '440240756 bytecodes/sec; 14163414 sends/sec' 0 tinyBenchmarks ==> '442141623 bytecodes/sec; 14823236 sends/sec'

Time millisecondsToRun: testBlock ==> 4698 Time millisecondsToRun: testBlock ==> 4671 Time millisecondsToRun: testBlock ==> 4668 Time millisecondsToRun: testBlock ==> 4612 Time millisecondsToRun: testBlock ==> 4657

After applying the fixes:

(LargeNegativeIntegerTest selector: #testMinimumNegativeIntegerArithmetic) run ==> 1 run, 1 passes, 0 expected failures, 0 failures, 0 errors, 0 unexpected passes

Time millisecondsToRun: [(1 to: 1000000) do: [:e | e + -9223372036854775808 // -1]] ==> 1616 Time millisecondsToRun: [(1 to: 1000000) do: [:e | e + -9223372036854775808 // -1]] ==> 1581 Time millisecondsToRun: [(1 to: 1000000) do: [:e | e + -9223372036854775808 // -1]] ==> 1577

0 tinyBenchmarks ==> '456735057 bytecodes/sec; 14002222 sends/sec' 0 tinyBenchmarks ==> '461677186 bytecodes/sec; 14023502 sends/sec' 0 tinyBenchmarks ==> '462929475 bytecodes/sec; 14002222 sends/sec' 0 tinyBenchmarks ==> '459192825 bytecodes/sec; 14002222 sends/sec' 0 tinyBenchmarks ==> '458781362 bytecodes/sec; 13970423 sends/sec'

Time millisecondsToRun: testBlock ==> 4680 Time millisecondsToRun: testBlock ==> 4672 Time millisecondsToRun: testBlock ==> 4722 Time millisecondsToRun: testBlock ==> 4675 Time millisecondsToRun: testBlock ==> 4679

Dave

On Thu, Aug 30, 2012 at 01:14:54AM +0200, Nicolas Cellier wrote:

...
See also http://code.google.com/p/cog/issues/detail?id=92 where I attached a fix for large int It's untested yet and to review carefully !

As Stefan told, there is UB-reliance in SmallInteger primitives too, but I did not fix them. We should simply compute result as signed 64 bits as proposed by Stefan (except bitShift)

Nicolas

2012/8/30 David T. Lewis lewis@mail.msen.com:

...
This is on Mantis at http://bugs.squeak.org/view.php?id=7705

Note last comment in related issue 6987.

This issue will crash the VM when compiled for 64-bit platforms.

Dave

On Wed, Aug 29, 2012 at 12:18:28PM +0200, Nicolas Cellier wrote:

...
As posted on squeak-dev http://lists.squeakfoundation.org/pipermail/squeak-dev/2012-August/165608.ht... I found 3 bugs in LargeInteger primitives

(1<<63) negated quo: -1. (1<<63) negated / -1. (1<<63) negated * -1.

They are all related to the impossible task of taking absolute value of INT_MIN (or more exactly it's 64 bits equivalent). Currently, it takes the form (0 - INT_MIN) whose behaviour is undefined according to C standards but generally answer INT_MIN. See for example http://stackoverflow.com/questions/2539178/why-is-abs0x80000000-0x80000000

Surprisingly this one works: (1<<63) negated // -1.

Most probably because gcc has a license to ignore undefined behaviour and perform some optimizations that don't take overflow side effects into account. For example 0 - (0 - a)/b can be simplified into a/b, UB case of INT_MIN apart...

Beside these bugs, when I read the code, I'm quite sure it's a nest of future bugs because there are many other attempts to catch overflow in post-condition (like testing that addition of two positive is negative when an underflow occurs) that technically rely on explicitely Undefined Behaviour (UB). OK, by now many Arithmetic Units do behave like exploited in these post-conditions, though it's not strictly future-proof. But we unfortunately rely on optimizing C compilers, and its behaviour is much more fragile than hardware...

I invite every VM hacker to read http://stackoverflow.com/questions/199333/best-way-to-detect-integer-overflo... And various links like https://www.securecoding.cert.org/confluence/display/seccode/INT32-C.+Ensure...

For example, in large integer subtract, we have a protection against (0 - INT_MIN) like: y = 0 - x; if ( y==x ) { primitiveFail(); } an optimizing compiler having a licence to ignore INT_MIN Undefined Behaviour case could mathematically solve the equation as x==0, y==0 and transform code into if( ! (y=0-x) ) { primitiveFail(); } (directly use a jz and save a comparison)

or if we have such branch c = a + b; if( a >0) { if(b > 0) { if (c < 0 ) { primitiveFail(); } } } Again, a good compiler could remove the if( c < 0) test, since it does not have to care about UB...

OK, pragmatically, most of these post-condition hacks are fast and work with some version of gcc, but think about portability (llvm ?) and future pain (you can debug such code only at asembler level).

Do it right > do it fast.

Nicolas

Igor Stasenko

3:41 p.m.

i wonder, if any changes are needed for jit, because i suppose some of semantics for bigints is jitted.

On 1 September 2012 16:44, Levente Uzonyi leves@elte.hu wrote:

...

On Sat, 1 Sep 2012, David T. Lewis wrote:

...
I have integrated these fixes into trunk VMMaker for testing. The new #testMinimumNegativeIntegerArithmetic unit test passes, and a VM compiled in 64-bit mode no longer crashed. I can find no significant change in performance with these changes applied. This looks really good to me, and if there are no other comments or issues I will go ahead and post the update to trunk VMMaker.

That sounds great, thanks Dave.

...
Here are the performance measurements that I did to compare VM performance before and after applying the changes. This is done with an interpreter VM compiled in 64 bit mode on Linux:

testBlock := [(1 to: 1000000) do: [:e | | largeNegativeInt | largeNegativeInt := -9223372036854775808 + e. largeNegativeInt >> 3. largeNegativeInt + 1. largeNegativeInt - -1. largeNegativeInt // -1. largeNegativeInt \ -1. largeNegativeInt rem: -1. largeNegativeInt quo: -1. largeNegativeInt * -1. largeNegativeInt / -1]].

What's the reason for using #to: and #do: instead of #to:do:? The latter seems to be much more appropriate in such tests, because it doesn't create any objects and message sends, so garbage collection has less effect on the measurements.

Levente

...
Before Nicolas' changes:

(LargeNegativeIntegerTest selector: #testMinimumNegativeIntegerArithmetic) run ==> 1 run, 0 passes, 0 expected failures, 1 failures, 0 errors, 0 unexpected passes

Time millisecondsToRun: [(1 to: 1000000) do: [:e | e + -9223372036854775808 // -1]] ==> 1565 Time millisecondsToRun: [(1 to: 1000000) do: [:e | e + -9223372036854775808 // -1]] ==> 1550 Time millisecondsToRun: [(1 to: 1000000) do: [:e | e + -9223372036854775808 // -1]] ==> 1544

0 tinyBenchmarks ==> '437981180 bytecodes/sec; 14163414 sends/sec' 0 tinyBenchmarks ==> '442141623 bytecodes/sec; 14141708 sends/sec' 0 tinyBenchmarks ==> '442141623 bytecodes/sec; 14120068 sends/sec' 0 tinyBenchmarks ==> '440240756 bytecodes/sec; 14163414 sends/sec' 0 tinyBenchmarks ==> '442141623 bytecodes/sec; 14823236 sends/sec'

Time millisecondsToRun: testBlock ==> 4698 Time millisecondsToRun: testBlock ==> 4671 Time millisecondsToRun: testBlock ==> 4668 Time millisecondsToRun: testBlock ==> 4612 Time millisecondsToRun: testBlock ==> 4657

After applying the fixes:

(LargeNegativeIntegerTest selector: #testMinimumNegativeIntegerArithmetic) run ==> 1 run, 1 passes, 0 expected failures, 0 failures, 0 errors, 0 unexpected passes

Time millisecondsToRun: [(1 to: 1000000) do: [:e | e + -9223372036854775808 // -1]] ==> 1616 Time millisecondsToRun: [(1 to: 1000000) do: [:e | e + -9223372036854775808 // -1]] ==> 1581 Time millisecondsToRun: [(1 to: 1000000) do: [:e | e + -9223372036854775808 // -1]] ==> 1577

0 tinyBenchmarks ==> '456735057 bytecodes/sec; 14002222 sends/sec' 0 tinyBenchmarks ==> '461677186 bytecodes/sec; 14023502 sends/sec' 0 tinyBenchmarks ==> '462929475 bytecodes/sec; 14002222 sends/sec' 0 tinyBenchmarks ==> '459192825 bytecodes/sec; 14002222 sends/sec' 0 tinyBenchmarks ==> '458781362 bytecodes/sec; 13970423 sends/sec'

Time millisecondsToRun: testBlock ==> 4680 Time millisecondsToRun: testBlock ==> 4672 Time millisecondsToRun: testBlock ==> 4722 Time millisecondsToRun: testBlock ==> 4675 Time millisecondsToRun: testBlock ==> 4679

Dave

On Thu, Aug 30, 2012 at 01:14:54AM +0200, Nicolas Cellier wrote:

...
See also http://code.google.com/p/cog/issues/detail?id=92 where I attached a fix for large int It's untested yet and to review carefully !

As Stefan told, there is UB-reliance in SmallInteger primitives too, but I did not fix them. We should simply compute result as signed 64 bits as proposed by Stefan (except bitShift)

Nicolas

2012/8/30 David T. Lewis lewis@mail.msen.com:

...
This is on Mantis at http://bugs.squeak.org/view.php?id=7705

Note last comment in related issue 6987.

This issue will crash the VM when compiled for 64-bit platforms.

Dave

On Wed, Aug 29, 2012 at 12:18:28PM +0200, Nicolas Cellier wrote:

...
As posted on squeak-dev

http://lists.squeakfoundation.org/pipermail/squeak-dev/2012-August/165608.ht... I found 3 bugs in LargeInteger primitives

(1<<63) negated quo: -1. (1<<63) negated / -1. (1<<63) negated * -1.

They are all related to the impossible task of taking absolute value of INT_MIN (or more exactly it's 64 bits equivalent). Currently, it takes the form (0 - INT_MIN) whose behaviour is undefined according to C standards but generally answer INT_MIN. See for example

http://stackoverflow.com/questions/2539178/why-is-abs0x80000000-0x80000000

Surprisingly this one works: (1<<63) negated // -1.

Most probably because gcc has a license to ignore undefined behaviour and perform some optimizations that don't take overflow side effects into account. For example 0 - (0 - a)/b can be simplified into a/b, UB case of INT_MIN apart...

Beside these bugs, when I read the code, I'm quite sure it's a nest of future bugs because there are many other attempts to catch overflow in post-condition (like testing that addition of two positive is negative when an underflow occurs) that technically rely on explicitely Undefined Behaviour (UB). OK, by now many Arithmetic Units do behave like exploited in these post-conditions, though it's not strictly future-proof. But we unfortunately rely on optimizing C compilers, and its behaviour is much more fragile than hardware...

I invite every VM hacker to read

http://stackoverflow.com/questions/199333/best-way-to-detect-integer-overflo... And various links like

https://www.securecoding.cert.org/confluence/display/seccode/INT32-C.+Ensure...

For example, in large integer subtract, we have a protection against (0 - INT_MIN) like: y = 0 - x; if ( y==x ) { primitiveFail(); } an optimizing compiler having a licence to ignore INT_MIN Undefined Behaviour case could mathematically solve the equation as x==0, y==0 and transform code into if( ! (y=0-x) ) { primitiveFail(); } (directly use a jz and save a comparison)

or if we have such branch c = a + b; if( a >0) { if(b > 0) { if (c < 0 ) { primitiveFail(); } } } Again, a good compiler could remove the if( c < 0) test, since it does not have to care about UB...

OK, pragmatically, most of these post-condition hacks are fast and work with some version of gcc, but think about portability (llvm ?) and future pain (you can debug such code only at asembler level).

Do it right > do it fast.

Nicolas

-- Best regards, Igor Stasenko.

David T. Lewis

3:52 p.m.

I'm not sure, but I would not expect any differences for jit. The changes here are in the primitives themselves, so I would not expect any impact on jitted methods.

It would be good to take a similar measurement with Cog of course. If there is any real difference in the primitive performance due to these changes, it should be easier to measure with Cog because relatively less time would be spent in the method execution as opposed to the primitives themselves.

Dave

On Sat, Sep 01, 2012 at 05:41:44PM +0200, Igor Stasenko wrote:

...

i wonder, if any changes are needed for jit, because i suppose some of semantics for bigints is jitted.

On 1 September 2012 16:44, Levente Uzonyi leves@elte.hu wrote:

...
On Sat, 1 Sep 2012, David T. Lewis wrote:

...
I have integrated these fixes into trunk VMMaker for testing. The new #testMinimumNegativeIntegerArithmetic unit test passes, and a VM compiled in 64-bit mode no longer crashed. I can find no significant change in performance with these changes applied. This looks really good to me, and if there are no other comments or issues I will go ahead and post the update to trunk VMMaker.

That sounds great, thanks Dave.

...
Here are the performance measurements that I did to compare VM performance before and after applying the changes. This is done with an interpreter VM compiled in 64 bit mode on Linux:

testBlock := [(1 to: 1000000) do: [:e | | largeNegativeInt | largeNegativeInt := -9223372036854775808 + e. largeNegativeInt >> 3. largeNegativeInt + 1. largeNegativeInt - -1. largeNegativeInt // -1. largeNegativeInt \ -1. largeNegativeInt rem: -1. largeNegativeInt quo: -1. largeNegativeInt * -1. largeNegativeInt / -1]].

What's the reason for using #to: and #do: instead of #to:do:? The latter seems to be much more appropriate in such tests, because it doesn't create any objects and message sends, so garbage collection has less effect on the measurements.

Levente

...
Before Nicolas' changes:

(LargeNegativeIntegerTest selector: #testMinimumNegativeIntegerArithmetic) run ==> 1 run, 0 passes, 0 expected failures, 1 failures, 0 errors, 0 unexpected passes

Time millisecondsToRun: [(1 to: 1000000) do: [:e | e + -9223372036854775808 // -1]] ==> 1565 Time millisecondsToRun: [(1 to: 1000000) do: [:e | e + -9223372036854775808 // -1]] ==> 1550 Time millisecondsToRun: [(1 to: 1000000) do: [:e | e + -9223372036854775808 // -1]] ==> 1544

0 tinyBenchmarks ==> '437981180 bytecodes/sec; 14163414 sends/sec' 0 tinyBenchmarks ==> '442141623 bytecodes/sec; 14141708 sends/sec' 0 tinyBenchmarks ==> '442141623 bytecodes/sec; 14120068 sends/sec' 0 tinyBenchmarks ==> '440240756 bytecodes/sec; 14163414 sends/sec' 0 tinyBenchmarks ==> '442141623 bytecodes/sec; 14823236 sends/sec'

Time millisecondsToRun: testBlock ==> 4698 Time millisecondsToRun: testBlock ==> 4671 Time millisecondsToRun: testBlock ==> 4668 Time millisecondsToRun: testBlock ==> 4612 Time millisecondsToRun: testBlock ==> 4657

After applying the fixes:

(LargeNegativeIntegerTest selector: #testMinimumNegativeIntegerArithmetic) run ==> 1 run, 1 passes, 0 expected failures, 0 failures, 0 errors, 0 unexpected passes

Time millisecondsToRun: [(1 to: 1000000) do: [:e | e + -9223372036854775808 // -1]] ==> 1616 Time millisecondsToRun: [(1 to: 1000000) do: [:e | e + -9223372036854775808 // -1]] ==> 1581 Time millisecondsToRun: [(1 to: 1000000) do: [:e | e + -9223372036854775808 // -1]] ==> 1577

0 tinyBenchmarks ==> '456735057 bytecodes/sec; 14002222 sends/sec' 0 tinyBenchmarks ==> '461677186 bytecodes/sec; 14023502 sends/sec' 0 tinyBenchmarks ==> '462929475 bytecodes/sec; 14002222 sends/sec' 0 tinyBenchmarks ==> '459192825 bytecodes/sec; 14002222 sends/sec' 0 tinyBenchmarks ==> '458781362 bytecodes/sec; 13970423 sends/sec'

Time millisecondsToRun: testBlock ==> 4680 Time millisecondsToRun: testBlock ==> 4672 Time millisecondsToRun: testBlock ==> 4722 Time millisecondsToRun: testBlock ==> 4675 Time millisecondsToRun: testBlock ==> 4679

Dave

On Thu, Aug 30, 2012 at 01:14:54AM +0200, Nicolas Cellier wrote:

...
See also http://code.google.com/p/cog/issues/detail?id=92 where I attached a fix for large int It's untested yet and to review carefully !

As Stefan told, there is UB-reliance in SmallInteger primitives too, but I did not fix them. We should simply compute result as signed 64 bits as proposed by Stefan (except bitShift)

Nicolas

2012/8/30 David T. Lewis lewis@mail.msen.com:

...
This is on Mantis at http://bugs.squeak.org/view.php?id=7705

Note last comment in related issue 6987.

This issue will crash the VM when compiled for 64-bit platforms.

Dave

On Wed, Aug 29, 2012 at 12:18:28PM +0200, Nicolas Cellier wrote:

...
As posted on squeak-dev

http://lists.squeakfoundation.org/pipermail/squeak-dev/2012-August/165608.ht... I found 3 bugs in LargeInteger primitives

(1<<63) negated quo: -1. (1<<63) negated / -1. (1<<63) negated * -1.

They are all related to the impossible task of taking absolute value of INT_MIN (or more exactly it's 64 bits equivalent). Currently, it takes the form (0 - INT_MIN) whose behaviour is undefined according to C standards but generally answer INT_MIN. See for example

http://stackoverflow.com/questions/2539178/why-is-abs0x80000000-0x80000000

Surprisingly this one works: (1<<63) negated // -1.

Most probably because gcc has a license to ignore undefined behaviour and perform some optimizations that don't take overflow side effects into account. For example 0 - (0 - a)/b can be simplified into a/b, UB case of INT_MIN apart...

Beside these bugs, when I read the code, I'm quite sure it's a nest of future bugs because there are many other attempts to catch overflow in post-condition (like testing that addition of two positive is negative when an underflow occurs) that technically rely on explicitely Undefined Behaviour (UB). OK, by now many Arithmetic Units do behave like exploited in these post-conditions, though it's not strictly future-proof. But we unfortunately rely on optimizing C compilers, and its behaviour is much more fragile than hardware...

I invite every VM hacker to read

http://stackoverflow.com/questions/199333/best-way-to-detect-integer-overflo... And various links like

https://www.securecoding.cert.org/confluence/display/seccode/INT32-C.+Ensure...

For example, in large integer subtract, we have a protection against (0 - INT_MIN) like: y = 0 - x; if ( y==x ) { primitiveFail(); } an optimizing compiler having a licence to ignore INT_MIN Undefined Behaviour case could mathematically solve the equation as x==0, y==0 and transform code into if( ! (y=0-x) ) { primitiveFail(); } (directly use a jz and save a comparison)

or if we have such branch c = a + b; if( a >0) { if(b > 0) { if (c < 0 ) { primitiveFail(); } } } Again, a good compiler could remove the if( c < 0) test, since it does not have to care about UB...

OK, pragmatically, most of these post-condition hacks are fast and work with some version of gcc, but think about portability (llvm ?) and future pain (you can debug such code only at asembler level).

Do it right > do it fast.

Nicolas

-- Best regards, Igor Stasenko.

Igor Stasenko

4:36 p.m.

i am busy with other stuff right now, but i will do a pass on this issue later.

On 1 September 2012 17:52, David T. Lewis lewis@mail.msen.com wrote:

...

I'm not sure, but I would not expect any differences for jit. The changes here are in the primitives themselves, so I would not expect any impact on jitted methods.

It would be good to take a similar measurement with Cog of course. If there is any real difference in the primitive performance due to these changes, it should be easier to measure with Cog because relatively less time would be spent in the method execution as opposed to the primitives themselves.

Dave

On Sat, Sep 01, 2012 at 05:41:44PM +0200, Igor Stasenko wrote:

...
i wonder, if any changes are needed for jit, because i suppose some of semantics for bigints is jitted.

On 1 September 2012 16:44, Levente Uzonyi leves@elte.hu wrote:

...
On Sat, 1 Sep 2012, David T. Lewis wrote:

...
I have integrated these fixes into trunk VMMaker for testing. The new #testMinimumNegativeIntegerArithmetic unit test passes, and a VM compiled in 64-bit mode no longer crashed. I can find no significant change in performance with these changes applied. This looks really good to me, and if there are no other comments or issues I will go ahead and post the update to trunk VMMaker.

That sounds great, thanks Dave.

...
Here are the performance measurements that I did to compare VM performance before and after applying the changes. This is done with an interpreter VM compiled in 64 bit mode on Linux:

testBlock := [(1 to: 1000000) do: [:e | | largeNegativeInt | largeNegativeInt := -9223372036854775808 + e. largeNegativeInt >> 3. largeNegativeInt + 1. largeNegativeInt - -1. largeNegativeInt // -1. largeNegativeInt \ -1. largeNegativeInt rem: -1. largeNegativeInt quo: -1. largeNegativeInt * -1. largeNegativeInt / -1]].

What's the reason for using #to: and #do: instead of #to:do:? The latter seems to be much more appropriate in such tests, because it doesn't create any objects and message sends, so garbage collection has less effect on the measurements.

Levente

...
Before Nicolas' changes:

(LargeNegativeIntegerTest selector: #testMinimumNegativeIntegerArithmetic) run ==> 1 run, 0 passes, 0 expected failures, 1 failures, 0 errors, 0 unexpected passes

Time millisecondsToRun: [(1 to: 1000000) do: [:e | e + -9223372036854775808 // -1]] ==> 1565 Time millisecondsToRun: [(1 to: 1000000) do: [:e | e + -9223372036854775808 // -1]] ==> 1550 Time millisecondsToRun: [(1 to: 1000000) do: [:e | e + -9223372036854775808 // -1]] ==> 1544

0 tinyBenchmarks ==> '437981180 bytecodes/sec; 14163414 sends/sec' 0 tinyBenchmarks ==> '442141623 bytecodes/sec; 14141708 sends/sec' 0 tinyBenchmarks ==> '442141623 bytecodes/sec; 14120068 sends/sec' 0 tinyBenchmarks ==> '440240756 bytecodes/sec; 14163414 sends/sec' 0 tinyBenchmarks ==> '442141623 bytecodes/sec; 14823236 sends/sec'

Time millisecondsToRun: testBlock ==> 4698 Time millisecondsToRun: testBlock ==> 4671 Time millisecondsToRun: testBlock ==> 4668 Time millisecondsToRun: testBlock ==> 4612 Time millisecondsToRun: testBlock ==> 4657

After applying the fixes:

(LargeNegativeIntegerTest selector: #testMinimumNegativeIntegerArithmetic) run ==> 1 run, 1 passes, 0 expected failures, 0 failures, 0 errors, 0 unexpected passes

Time millisecondsToRun: [(1 to: 1000000) do: [:e | e + -9223372036854775808 // -1]] ==> 1616 Time millisecondsToRun: [(1 to: 1000000) do: [:e | e + -9223372036854775808 // -1]] ==> 1581 Time millisecondsToRun: [(1 to: 1000000) do: [:e | e + -9223372036854775808 // -1]] ==> 1577

0 tinyBenchmarks ==> '456735057 bytecodes/sec; 14002222 sends/sec' 0 tinyBenchmarks ==> '461677186 bytecodes/sec; 14023502 sends/sec' 0 tinyBenchmarks ==> '462929475 bytecodes/sec; 14002222 sends/sec' 0 tinyBenchmarks ==> '459192825 bytecodes/sec; 14002222 sends/sec' 0 tinyBenchmarks ==> '458781362 bytecodes/sec; 13970423 sends/sec'

Time millisecondsToRun: testBlock ==> 4680 Time millisecondsToRun: testBlock ==> 4672 Time millisecondsToRun: testBlock ==> 4722 Time millisecondsToRun: testBlock ==> 4675 Time millisecondsToRun: testBlock ==> 4679

Dave

On Thu, Aug 30, 2012 at 01:14:54AM +0200, Nicolas Cellier wrote:

...
See also http://code.google.com/p/cog/issues/detail?id=92 where I attached a fix for large int It's untested yet and to review carefully !

As Stefan told, there is UB-reliance in SmallInteger primitives too, but I did not fix them. We should simply compute result as signed 64 bits as proposed by Stefan (except bitShift)

Nicolas

2012/8/30 David T. Lewis lewis@mail.msen.com:

...
This is on Mantis at http://bugs.squeak.org/view.php?id=7705

Note last comment in related issue 6987.

This issue will crash the VM when compiled for 64-bit platforms.

Dave

On Wed, Aug 29, 2012 at 12:18:28PM +0200, Nicolas Cellier wrote: > > > As posted on squeak-dev > > http://lists.squeakfoundation.org/pipermail/squeak-dev/2012-August/165608.ht... > I found 3 bugs in LargeInteger primitives > > (1<<63) negated quo: -1. > (1<<63) negated / -1. > (1<<63) negated * -1. > > They are all related to the impossible task of taking absolute value > of INT_MIN (or more exactly it's 64 bits equivalent). > Currently, it takes the form (0 - INT_MIN) whose behaviour is > undefined according to C standards but generally answer INT_MIN. > See for example > > http://stackoverflow.com/questions/2539178/why-is-abs0x80000000-0x80000000 > > Surprisingly this one works: > (1<<63) negated // -1. > > Most probably because gcc has a license to ignore undefined behaviour > and perform some optimizations that don't take overflow side effects > into account. > For example 0 - (0 - a)/b can be simplified into a/b, UB case of > INT_MIN apart... > > --------------------------------- > > Beside these bugs, when I read the code, I'm quite sure it's a nest of > future bugs because there are many other attempts to catch overflow in > post-condition (like testing that addition of two positive is negative > when an underflow occurs) that technically rely on explicitely > Undefined Behaviour (UB). > OK, by now many Arithmetic Units do behave like exploited in these > post-conditions, though it's not strictly future-proof. > But we unfortunately rely on optimizing C compilers, and its behaviour > is much more fragile than hardware... > > I invite every VM hacker to read > > http://stackoverflow.com/questions/199333/best-way-to-detect-integer-overflo... > And various links like > > https://www.securecoding.cert.org/confluence/display/seccode/INT32-C.+Ensure... > > For example, in large integer subtract, we have a protection against > (0 - INT_MIN) like: > y = 0 - x; > if ( y==x ) { primitiveFail(); } > an optimizing compiler having a licence to ignore INT_MIN Undefined > Behaviour case could mathematically solve the equation as x==0, y==0 > and transform code into > if( ! (y=0-x) ) { primitiveFail(); } > (directly use a jz and save a comparison) > > or if we have such branch > c = a + b; > if( a >0) { > if(b > 0) { > if (c < 0 ) { primitiveFail(); } > } > } > Again, a good compiler could remove the if( c < 0) test, since it does > not have to care about UB... > > OK, pragmatically, most of these post-condition hacks are fast and > work with some version of gcc, but think about portability (llvm ?) > and future pain (you can debug such code only at asembler level). > > Do it right > do it fast. > > Nicolas

-- Best regards, Igor Stasenko.

-- Best regards, Igor Stasenko.

David T. Lewis

3:46 p.m.

On Sat, Sep 01, 2012 at 04:44:21PM +0200, Levente Uzonyi wrote:

...

On Sat, 1 Sep 2012, David T. Lewis wrote:

...
I have integrated these fixes into trunk VMMaker for testing. The new #testMinimumNegativeIntegerArithmetic unit test passes, and a VM compiled in 64-bit mode no longer crashed. I can find no significant change in performance with these changes applied. This looks really good to me, and if there are no other comments or issues I will go ahead and post the update to trunk VMMaker.

That sounds great, thanks Dave.

...
Here are the performance measurements that I did to compare VM performance before and after applying the changes. This is done with an interpreter VM compiled in 64 bit mode on Linux:

testBlock := [(1 to: 1000000) do: [:e | | largeNegativeInt | largeNegativeInt := -9223372036854775808 + e. largeNegativeInt >> 3. largeNegativeInt + 1. largeNegativeInt - -1. largeNegativeInt // -1. largeNegativeInt \ -1. largeNegativeInt rem: -1. largeNegativeInt quo: -1. largeNegativeInt * -1. largeNegativeInt / -1]].

What's the reason for using #to: and #do: instead of #to:do:?

Ignorance on my part.

...

The latter seems to be much more appropriate in such tests, because it doesn't create any objects and message sends, so garbage collection has less effect on the measurements.

Repeating the measurements using #to:do: below. After applying Nicolas' fixes, there does appear to be a slight improvement in large integer performance as measured by the testBlock, and a slight decrease in bytecodes/sec.

I cannot explain either of these performance variations. I don't see any reason that the changes to the primitives should affect bytecode performance, but I'm attaching a diff of the generated interp.c files before and after the changes in case anyone wants to have a look.

Test results are:

testBlock := [1 to: 1000000 do: [:e | | largeNegativeInt | largeNegativeInt := -9223372036854775808 + e. largeNegativeInt >> 3. largeNegativeInt + 1. largeNegativeInt - -1. largeNegativeInt // -1. largeNegativeInt \ -1. largeNegativeInt rem: -1. largeNegativeInt quo: -1. largeNegativeInt * -1. largeNegativeInt / -1]].

Before Nicolas' changes:

(LargeNegativeIntegerTest selector: #testMinimumNegativeIntegerArithmetic) run ==> 1 run, 0 passes, 0 expected failures, 1 failures, 0 errors, 0 unexpected passes

Time millisecondsToRun: testBlock ==> 4597 Time millisecondsToRun: testBlock ==> 4592 Time millisecondsToRun: testBlock ==> 4602 Time millisecondsToRun: testBlock ==> 4605 Time millisecondsToRun: testBlock ==> 4627

0 tinyBenchmarks ==> '424192212 bytecodes/sec; 14451784 sends/sec' 0 tinyBenchmarks ==> '424543946 bytecodes/sec; 15114601 sends/sec' 0 tinyBenchmarks ==> '421746293 bytecodes/sec; 14451784 sends/sec' 0 tinyBenchmarks ==> '423490488 bytecodes/sec; 15114601 sends/sec' 0 tinyBenchmarks ==> '423490488 bytecodes/sec; 14463110 sends/sec'

After, with fixes:

(LargeNegativeIntegerTest selector: #testMinimumNegativeIntegerArithmetic) run ==> 1 run, 1 passes, 0 expected failures, 0 failures, 0 errors, 0 unexpected passes

Time millisecondsToRun: testBlock ==> 4416 Time millisecondsToRun: testBlock ==> 4432 Time millisecondsToRun: testBlock ==> 4410 Time millisecondsToRun: testBlock ==> 4400 Time millisecondsToRun: testBlock ==> 4440

0 tinyBenchmarks ==> '438731790 bytecodes/sec; 15028443 sends/sec' 0 tinyBenchmarks ==> '442906574 bytecodes/sec; 14775763 sends/sec' 0 tinyBenchmarks ==> '440619621 bytecodes/sec; 14752141 sends/sec' 0 tinyBenchmarks ==> '436115843 bytecodes/sec; 14775763 sends/sec' 0 tinyBenchmarks ==> '442141623 bytecodes/sec; 14763943 sends/sec'

Dave

Levente Uzonyi

7:44 p.m.

On Sat, 1 Sep 2012, David T. Lewis wrote:

Can't quote your mail, sorry. But according to your numbers, there's improvement in both the testBlock and the sends/sec.

Levente

David T. Lewis

8:51 p.m.

On Sat, Sep 01, 2012 at 09:44:29PM +0200, Levente Uzonyi wrote:

...

On Sat, 1 Sep 2012, David T. Lewis wrote:

Can't quote your mail, sorry. But according to your numbers, there's improvement in both the testBlock and the sends/sec.

D'oh! Brain fart. So yes, it's good news all around.

I still cannot explain *why* the performance is different (better), but I'm happy to give credit to Nicolas for the improvements regardless :)

Dave

4257

Age (days ago)

4260

Last active (days ago)

vm-dev@lists.squeakfoundation.org

26 comments

5 participants

tags (0)

participants (5)

David T. Lewis
Igor Stasenko
Levente Uzonyi
Nicolas Cellier
Stefan Marr