[Vm-dev] 3 Bugs in LargeInteger primitives

Wed Aug 29 21:09:21 UTC 2012

Here is an exemple of what I propose:

primitiveAddLargeIntegers
	"Primitive arithmetic operations for large integers in 64 bit range"
	| a b result oopResult aIsNegative bIsNegative resultIsNegative
oopArg oopRcvr |
	<export: true>
	<var: 'a' type: 'usqLong'>
	<var: 'b' type: 'usqLong'>
	<var: 'result' type: 'usqLong'>
	<var: 'oopArg' type: 'oop'>
	<var: 'oopRcvr' type: 'oop'>
	<var: 'oopResult' type: 'oop'>
	<var: 'aIsNegative' type: 'usqInt'>
	<var: 'bIsNegative' type: 'usqInt'>
	<var: 'resultIsNegative' type: 'usqInt'>

	oopArg := self stackValue: 0.
	oopRcvr := self stackValue: 1.
	aIsNegative := self isNegativeIntegerValueOf: oopRcvr.
	bIsNegative := self isNegativeIntegerValueOf: oopArg.
	a := self magnitude64BitValueOf: oopRcvr.
	b := self magnitude64BitValueOf: oopArg.
	successFlag ifFalse:[^nil].
	(aIsNegative == bIsNegative)
		ifTrue:
			["Protect against overflow"
			a > (16rFFFFFFFFFFFFFFFF - b) ifTrue: [self primitiveFail. ^nil].
			result := a + b.
			resultIsNegative := aIsNegative]
		ifFalse:
			[(a >= b)
				ifTrue:
					[result := a - b.
					resultIsNegative := aIsNegative]
				ifFalse:
					[result := b - a.
					resultIsNegative := bIsNegative]].
	oopResult := self magnitude64BitIntegerFor: result neg: resultIsNegative.
	successFlag ifTrue:[self pop: 2 thenPush: oopResult].

IMHO, code is not much worse than previous one:

primitiveAddLargeIntegers
	"Primitive arithmetic operations for large integers in 64 bit range"
	| integerRcvr integerArg result oopResult |
	<export: true>
	<var: 'integerRcvr' type: 'sqLong'>
	<var: 'integerArg' type: 'sqLong'>
	<var: 'result' type: 'sqLong'>

	integerArg := self signed64BitValueOf: (self stackValue: 0).
	integerRcvr := self signed64BitValueOf: (self stackValue: 1).
	successFlag ifFalse:[^nil].

	"Compute the preliminary result (which may overflow)"
	result := integerRcvr + integerArg.

	"Now check overflow conditions. First is whether rcvr and arg are of
the same sign.
	If they are we need to check for overflow more carefully."
	(integerRcvr bitXor: integerArg) < 0 ifFalse:[
		"Second is whether rcvr and result are of the same sign. If not, we
have an overflow."
		(integerRcvr bitXor: result) < 0 ifTrue:[self primitiveFail]].

	successFlag ifFalse:[^nil].

	oopResult := self signed64BitIntegerFor: result.
	successFlag ifTrue:[self pop: 2 thenPush: oopResult].

And conversion with sign/magnitude is simpler...

isNegativeIntegerValueOf: oop
	"Answer 1 if integer object is negative, 0 otherwise.
	Fail if object pointed by oop i not an integer."
	| largeClass smallInt |
	<returnTypeC: #usqInt>
	<var: #smallInt type: #sqInt>
	(self isIntegerObject: oop)
		ifTrue:
			[smallInt := self integerValueOf: oop.
			^smallInt < 0].
	largeClass := self fetchClassOfNonInt: oop.
	largeClass = self classLargePositiveInteger ifTrue: [^0].
	largeClass = self classLargePositiveInteger ifTrue: [^0].
	^self primitiveFail

magnitude64BitValueOf: oop
	"Convert the given object into an integer value.
	The object may be either a ST integer or a eight-byte LargeInteger."
	| sz value largeClass szsqLong smallIntValue |
	<inline: false>
	<returnTypeC: #usqLong>
	<var: #value type: #usqLong>
	<var: #smallIntValue type: #sqInt>
	(self isIntegerObject: oop)
		ifTrue:
			[smallIntValue := self integerValueOf: oop.
			smallIntValue < 0 ifTrue: [smallIntValue := 0 - smallIntValue].
			^self cCoerce: smallIntValue to: #usqLong].
	largeClass := self fetchClassOfNonInt: oop.
	largeClass = self classLargePositiveInteger
		ifFalse:[largeClass = self classLargeNegativeInteger
				ifFalse:[^self primitiveFail]].
	szsqLong := self sizeof: #usqLong asSymbol.
	sz := self lengthOf: oop.
	sz > szsqLong
		ifTrue: [^ self primitiveFail].
	value := 0.
	0 to: sz - 1 do: [:i |
		value := value + ((self cCoerce: (self fetchByte: i ofObject: oop)
to: #usqLong) <<  (i*8))].
	^value

magnitude64BitIntegerFor: integerValue neg: isNegative
	"Return a Large Integer object for the given integer magnitude and sign"
	| newLargeInteger largeClass intValue highWord sz isSmall smallVal |
	<inline: false>
	<var: 'integerValue' type: 'usqLong'>
	<var: 'isNegative' type: 'usqInt'>
	<var: 'highWord' type: 'usqInt'>
	<var: 'isSmall' type: 'usqInt'>
	<var: 'smallVal' type: 'sqInt'>

	isSmall := isNegative
		ifTrue: [integerValue <= 16r40000000]
		ifFalse: [integerValue < 16r40000000].
	isSmall
		ifTrue:
			[smallVal := self cCoerce: integerValue to: #sqInt.
			isNegative	ifTrue: [smallVal := 0 - smallVal].
			^self integerObjectOf: smallVal].
	isNegative < 0
		ifTrue:[	largeClass := self classLargeNegativeInteger]
		ifFalse:[	largeClass := self classLargePositiveInteger].
	highWord := self cCode: 'magnitude >> 32'. "shift is coerced to
usqInt otherwise"
	highWord = 0
		ifTrue: [sz := 4]
		ifFalse:[
			sz := 5.
			(highWord := highWord >> 8) = 0 ifFalse:[sz := sz + 1].
			(highWord := highWord >> 8) = 0 ifFalse:[sz := sz + 1].
			(highWord := highWord >> 8) = 0 ifFalse:[sz := sz + 1].
		].
	newLargeInteger := self instantiateClass: largeClass indexableSize:  sz.
	0 to: sz-1 do: [:i |
		intValue := self cCode: '(magnitude >> (i * 8)) & 255'.
		self storeByte: i ofObject: newLargeInteger withValue: intValue].
	^ newLargeInteger

Nicolas

2012/8/29 Nicolas Cellier <nicolas.cellier.aka.nice at gmail.com>:
> I checked, unsigned int overflow behaviour is a well defined standard
> in both C/C++ and result in a modulo 2^ (sizeof( uint_type )
> *CHAR_BIT)
> So I'm now convinced that sign/magnitude decomposition is the way to go.
> Anyway, you can currently observe sign dissertation to check overflow
> in post condition, so we already pay the same price as sign/magnitude
> solution, except that code is currently relying on broken C signed
> arithmetic model.
>
> I may post corrected primitives for basic arithmetic ops when I have time,..
> But I won't have any frustration if a true VM hacker or someone more
> available than me could do it, I don't even know our own little name
> for an unsigned int 64, usqLong ?
>
> Nicolas
>
> 2012/8/29 Nicolas Cellier <nicolas.cellier.aka.nice at gmail.com>:
>> For the (1<<63) negated bug itself, one very simple solution would be
>> to just refuse it as a valid int64...
>> We spent many effort to handle it in #signed64BitValueOf: , and our
>> rewards are many bug popping out in the primitives where it is used.
>> Regarding efficiency, we will have to protect code with
>> inefficient-UB-broken defensive if or very-inefficient-portable-C, so
>> the best choice is to just filter it out right at the beginning...
>> I also wonder if handling a sign-magnitude wouldn't just be easier in
>> that case (except maybe for + and -).
>>
>> Of course the other UB are remaining, but one thing at a time.
>>
>> Nicolas
>>
>> 2012/8/29 Nicolas Cellier <nicolas.cellier.aka.nice at gmail.com>:
>>> Originally, C was very close to machine instructions and was conceived
>>> as a generic assembler.
>>> But it's more and more distant and becoming abstract.
>>>
>>> Unfortunately, the abstract arithmetic model is completely broken with
>>> UB: it's un-reliable.
>>> So what does this kind of abstraction serves?
>>> Really, it makes me wonder...
>>>
>>> If portable C code becomes both long and inefficient, I guess that
>>> cost of maintaining assembler for some part that we know are broken is
>>> an option indeed.
>>>
>>> I also see that Andreas had a hard time with slang, some intermediate
>>> operations being cast to uint32 instead of int64, he finally had to
>>> use many cCode: hacks... Going both thru slang and C intermediate
>>> sounds like too much work and too few safety for implementation basic
>>> arithmetic (obviously we didn't and we can't easily model broken C
>>> behaviour in Slang, it's too complex !).
>>>
>>> Nicolas
>>>
>>> 2012/8/29 Stefan Marr <smalltalk at stefan-marr.de>:
>>>>
>>>> Hi Nicolas:
>>>>
>>>> On 29 Aug 2012, at 12:18, Nicolas Cellier wrote:
>>>>
>>>>>
>>>>> Beside these bugs, when I read the code, I'm quite sure it's a nest of
>>>>> future bugs because there are many other attempts to catch overflow in
>>>>> post-condition (like testing that addition of two positive is negative
>>>>> when an underflow occurs) that technically rely on explicitely
>>>>> Undefined Behaviour (UB).
>>>>
>>>> I guess http://forum.world.st/Is-bytecodePrimMultiply-correct-td3869580.html
>>>> is related too.
>>>> I am not sure whether that got changed in the VMs, but sounds very much like the same kind of problem. (undefined behavior and overflows)
>>>>
>>>> Since C is undefined in that regard, what are the options?
>>>> Hand-crafted assembly for all relevant platforms?
>>>> Are there libraries that abstract from these things?
>>>>
>>>> I think Clang has a compiler switch to warn at compile-time, or trigger a runtime warning/error for these issues with undefined behavior. That might help for a thorough sweep through the code.
>>>>
>>>> Best regards
>>>> Stefan
>>>>
>>>>
>>>> --
>>>> Stefan Marr
>>>> Software Languages Lab
>>>> Vrije Universiteit Brussel
>>>> Pleinlaan 2 / B-1050 Brussels / Belgium
>>>> http://soft.vub.ac.be/~smarr
>>>> Phone: +32 2 629 2974
>>>> Fax:   +32 2 629 3525
>>>>