[Vm-dev] 3 Bugs in LargeInteger primitives

Thu Aug 30 08:03:42 UTC 2012

2012/8/30 David T. Lewis <lewis at mail.msen.com>:
>
> On Wed, Aug 29, 2012 at 01:24:39PM +0200, Stefan Marr wrote:
>>
>> Hi Nicolas:
>>
>> On 29 Aug 2012, at 12:18, Nicolas Cellier wrote:
>>
>> >
>> > Beside these bugs, when I read the code, I'm quite sure it's a nest of
>> > future bugs because there are many other attempts to catch overflow in
>> > post-condition (like testing that addition of two positive is negative
>> > when an underflow occurs) that technically rely on explicitely
>> > Undefined Behaviour (UB).
>
> See below. Tests such as this are essential, and they they do *not* rely on
> undefined behavior if the C variables are properly declared.
>
>>
>> I guess http://forum.world.st/Is-bytecodePrimMultiply-correct-td3869580.html
>> is related too.
>> I am not sure whether that got changed in the VMs, but sounds very much like the same kind of problem. (undefined behavior and overflows)
>>
>> Since C is undefined in that regard, what are the options?
>> Hand-crafted assembly for all relevant platforms?
>> Are there libraries that abstract from these things?
>
> A good general solution is to perform the arithmetic using variables declared
> as, or cast to, unsigned. The ambiguity in C language pertains only to signed
> twos complement arithmetic, so if the operations are performed on twos complement
> values that are declared unsigned, then no compiler optimization is possible and
> the results are unambiguous regardless of compiler behavior. Results of the
> unsigned operations may be tested for overflow, then cast back to signed integer
> if the result is intended to be interpreted as a signed integer.
>
> Dave
>

For LargeInt, I prefer proper sign/unsigned magnitude handling to
signed arithmetic hacks for several reasons:

With signed arithmetic:
1) in conversion to/from LargeInt we have to discuss sign anyway and
we have that magnitude fits into INT_MIN to INT_MAX
2) in overflow post-condition test, we have to discuss sign again -
for example, prim(Add/Subtract/Div/Quo/Mod)LargeIntegers
3) we still need defensive protection against (0 - INT_MIN)
4) all casts rely on 2-complement representation which is
implementation defined (though universal nowadays)

with sign/magnitude:
1) conversion from/to is simplified
2) overflow are easy to test in pre-condition like a > (UINT_MAX - b)
for addition, a > (UINT_MAX/b) for *, and this is absolutely safe even
with a bogus compiler, and as efficient as broken post-condition tests
3) no protection needed against INT_MIN
4) no reliance on 2-complement at all
5) primitives work 1 bit further
6) code is simpler (less if)

The only thing missing in my proposition (see attachment in
http://code.google.com/p/cog/issues/detail?id=92) is a function
retrieving sign and magnitude in a single function call (it's easy in
C, but I don't know slang enough, how to return a struct or pass a
pointer is beyond my knowledge).

In a word, sign/magnitude is easier/cleaner and not less performant
(once single function call is resolved)

For SmallInt, computing result in signed 64 bits as proposed by Stefan
is the simplest thing we can do IMHO.
I'm quite sure it won't ruin efficiency.

Nicolas