[Vm-dev] 3 Bugs in LargeInteger primitives

Thu Aug 30 10:31:19 UTC 2012

Oops, there is another one hardcoded constant in bitShift (was already there)

shift >= 64 should better be shift >= sizeof(usqLong)*CHAR_BIT...

Nicolas

2012/8/30 Nicolas Cellier <nicolas.cellier.aka.nice at gmail.com>:
> Also note that I introduced one future-flaw by hardcoding a 64 bits constant...
>
> 1) 16rFFFFFFFFFFFFFFFF should be replace with a proper macro
> (USQLONG_MAX or something) in pre-condition tests
>
> I did not correct the other flaws already spreaded in code:
> - SmallInteger maxVal/linVal are hardcoded everywhere
> And we have these implicit requirements:
> - SmallInteger maxVal/minVal has to fit in a sqInt
> - (0 - SmallInteger minVal) has to fit in a sqInt too (currently
> possible, it has only 31 bits)
>
> 2012/8/30 Nicolas Cellier <nicolas.cellier.aka.nice at gmail.com>:
>> 2012/8/30 David T. Lewis <lewis at mail.msen.com>:
>>>
>>> On Wed, Aug 29, 2012 at 01:24:39PM +0200, Stefan Marr wrote:
>>>>
>>>> Hi Nicolas:
>>>>
>>>> On 29 Aug 2012, at 12:18, Nicolas Cellier wrote:
>>>>
>>>> >
>>>> > Beside these bugs, when I read the code, I'm quite sure it's a nest of
>>>> > future bugs because there are many other attempts to catch overflow in
>>>> > post-condition (like testing that addition of two positive is negative
>>>> > when an underflow occurs) that technically rely on explicitely
>>>> > Undefined Behaviour (UB).
>>>
>>> See below. Tests such as this are essential, and they they do *not* rely on
>>> undefined behavior if the C variables are properly declared.
>>>
>>>>
>>>> I guess http://forum.world.st/Is-bytecodePrimMultiply-correct-td3869580.html
>>>> is related too.
>>>> I am not sure whether that got changed in the VMs, but sounds very much like the same kind of problem. (undefined behavior and overflows)
>>>>
>>>> Since C is undefined in that regard, what are the options?
>>>> Hand-crafted assembly for all relevant platforms?
>>>> Are there libraries that abstract from these things?
>>>
>>> A good general solution is to perform the arithmetic using variables declared
>>> as, or cast to, unsigned. The ambiguity in C language pertains only to signed
>>> twos complement arithmetic, so if the operations are performed on twos complement
>>> values that are declared unsigned, then no compiler optimization is possible and
>>> the results are unambiguous regardless of compiler behavior. Results of the
>>> unsigned operations may be tested for overflow, then cast back to signed integer
>>> if the result is intended to be interpreted as a signed integer.
>>>
>>> Dave
>>>
>>
>> For LargeInt, I prefer proper sign/unsigned magnitude handling to
>> signed arithmetic hacks for several reasons:
>>
>> With signed arithmetic:
>> 1) in conversion to/from LargeInt we have to discuss sign anyway and
>> we have that magnitude fits into INT_MIN to INT_MAX
>> 2) in overflow post-condition test, we have to discuss sign again -
>> for example, prim(Add/Subtract/Div/Quo/Mod)LargeIntegers
>> 3) we still need defensive protection against (0 - INT_MIN)
>> 4) all casts rely on 2-complement representation which is
>> implementation defined (though universal nowadays)
>>
>> with sign/magnitude:
>> 1) conversion from/to is simplified
>> 2) overflow are easy to test in pre-condition like a > (UINT_MAX - b)
>> for addition, a > (UINT_MAX/b) for *, and this is absolutely safe even
>> with a bogus compiler, and as efficient as broken post-condition tests
>> 3) no protection needed against INT_MIN
>> 4) no reliance on 2-complement at all
>> 5) primitives work 1 bit further
>> 6) code is simpler (less if)
>>
>> The only thing missing in my proposition (see attachment in
>> http://code.google.com/p/cog/issues/detail?id=92) is a function
>> retrieving sign and magnitude in a single function call (it's easy in
>> C, but I don't know slang enough, how to return a struct or pass a
>> pointer is beyond my knowledge).
>>
>> In a word, sign/magnitude is easier/cleaner and not less performant
>> (once single function call is resolved)
>>
>> For SmallInt, computing result in signed 64 bits as proposed by Stefan
>> is the simplest thing we can do IMHO.
>> I'm quite sure it won't ruin efficiency.
>>
>> Nicolas