[Vm-dev] Issues with the clock on V3

Nicolas Cellier nicolas.cellier.aka.nice at gmail.com
Sat Sep 3 09:02:45 UTC 2016


I'm going to show how I try to solve such problem.
It's going to be a bit long, sorry, but there might be something didactic
in it since you ask for a clue...

Starting with Integer Sunit test, I detect that one example of strange
behavior is:
    (1 << 63) negated + 1.
It answers a smallInteger (1).
It uses primitive 21, that is primitiveAddLargeIntegers

This one does not print correctly, but produces a correct result:
    1 + (1 << 63) negated.
It uses primitiveDigitAdd from LargeIntegers, so at least, we have one path
working...

primitiveAddLargeIntegers tries using native 64bits arithmetic before
engaging LargeIntegers module digit-based arithmetic.
It has already been modified long time ago to avoid undefined behavior
related to checking overflow in post condition, so the slang code looks
correct at upper level. Anyway, since both integers in our failing case
have different sign we should not pass thru the branch checking for
overflow.
It might be a problem of UB with used methods or with inlining and type
inference.

So one has to focus on generated C code of primitiveAddLargeIntegers.
it's not a problem with inlining, because no method are inlined here, and
the C code looks equally good to slang.

So it remains to check isNegativeIntegerValueOf magnitude64BitValueOf and
magnitude64BitIntegerForneg.
There is some dead code generated for checking class of oop, for example:
                ok = ClassLargeNegativeIntegerCompactIndex == ccIndex;
                goto l1;

                ok = classOop == (fetchPointerofObject(ccIndex - 1,
longAt((GIV(specialObjectsOop) + BaseHeaderSize) +
(((int)((usqInt)(CompactClasses) << (shiftForWord())))))));
        l1:     /* end isClassOfNonImm:equalTo:compactClassIndex: */;
The line after goto l1 is unreacheable, but that should not be the problem,
it' juts because we exploited the compact class index.

One line is worth checking in magnitude64BitValueOf:
    value = SQ_SWAP_8_BYTES_IF_BIGENDIAN((fetchLong64ofObject(0, oop)));

Then inspecting code of fetchLong64ofObject we see an awfull thing:
    return
        (((unsigned int) (long32At((oop + BaseHeaderSize) +
(((sqInt)((usqInt)(longIndex) << 3))))))) + (((sqInt)((usqInt)((((unsigned
long long)(long32At((oop + BaseHeaderSize) + ((((sqInt)((usqInt)(longIndex)
<< 3))) + 4)))))) << 32)))

That's the problem. We are converting to unsigned long long (64) before
shifting by 32 bit, that's correct, but just before the shift, we convert
back to usqInt (32 bits long) and thus get completely undefined behaviour!

Let's see the warnings in generated LOGF compilation report:
../../stacksrc/vm/gcc3x-interp.c:27429:226: warning: shift count >= width
of type [-Wshift-count-overflow]           (((unsigned int) (long32At((oop
+ BaseHeaderSize) + (((sqInt)((usqInt)(longIndex) << 3))))))) +
(((sqInt)((usqInt)((((unsigned long long)(long32At((oop + BaseHeaderSize) +
((((sqInt)((usqInt)(longIndex) << 3))) + 4)))))) << 32)))

Problem confirmed (Phew, I've deciphered correctly the parentheses and type
cast precedence).
Let's see the slang code of fetchLong64: longIndex ofObject: oop
    <returnTypeC: #sqLong>
    ^self cppIf: BytesPerWord = 8
        ifTrue: [self long64At: oop + self baseHeaderSize + (longIndex <<
3)]
        ifFalse:
            ["BEWARE OF SIGN EXTENSION ON LEAST SIGNIFICAND LIMB
            'unsigned int) CAST IS THERE TO PREVENT SUCH ERROR"
            self cppIf: VMBIGENDIAN
                ifTrue: [((self long32At: oop + self baseHeaderSize +
(longIndex << 3)) asUnsignedLongLong << 32)
                    + (self cCoerceSimple: (self long32At: oop + self
baseHeaderSize + (longIndex << 3 + 4)) to: #'unsigned int')]
                ifFalse: [(self cCoerceSimple: (self long32At: oop + self
baseHeaderSize + (longIndex << 3)) to: #'unsigned int')
                    + ((self long32At: oop + self baseHeaderSize +
(longIndex << 3 + 4)) asUnsignedLongLong << 32)]]

Till there, that sounds correct. So there's a problem in generating the
shift << 32 with undue usqInt conversion.
This generation takes place in CCodeGenerator generateShiftLeft: msgNode
on: aStream indent: level.
I've added a protection to avoid undefined behavior: I first convert to
unsigned before shifting.
But the type is elready unsigned long long, so that would mean that it's
the type inference of asUnsignedLongLong that failed.

asUnsignedLongLong is generated directly via
CCodeGenerator>>generateAsUnsignedLongLong:on:indent: (see
initializeCTranslationDictionary)
This mechanism does bypass type inference rules, so there should be a
corresponding type inference rule in
CCodeGenerator>>returnTypeForSend:in:ifNil:
ALAS, there is none!!!

So my change of << generation for avoiding one problem with UB (a real
problem: that prevents compiling win64 version correctly) did break another
part.

This shows how brittle the C Code generation is.
Now that we use type inference rules, EVERY selector generated thru
initializeCTranslationDictionary should have a type rule counterpart in
returnTypeForSend:in:ifNil:

The problem does not appear in SPUR, because memory is eight byte aligned,
so we do not have to decompose the fetch into 2 32-bits words.
I carried this analysis in LLP64_v2 branch, but I think it's the same in
cog.
I'll commit a patch.

Thanks for reporting!



2016-09-03 9:57 GMT+02:00 Nicolas Cellier <
nicolas.cellier.aka.nice at gmail.com>:

> Interestingly, when I recompile on MacOSX, the bug happens on fast version
> of squeak.stack.v3 but not on debug version!!!
> That means that it's vain to track the problem with VM simulator.
> It's also vain to try debugging at C level...
>
> What remains is:
> Use lldb the unstripped fast version, put a break point in some arithmetic
> primitive then stepi into assembly.
> Or carefully inspect the compiler warnings (with ./mvm -- WARNINGS='-Wall
> -Wextra') and carefully read generated C code to track for a possible
> undefined behavior striking.
> Or compare generated C code with generated C assembly to understand what's
> going on.
> Or instrument the generated code with undefined behavior guards (I don't
> remember the clang incantation for that, but it's possible).
>
> 2016-09-02 22:09 GMT+02:00 Nicolas Cellier <nicolas.cellier.aka.nice@
> gmail.com>:
>
>> Hi Laura,
>> it sounds like some LargeIntegers arithmetic is broken in v3...
>> The clock symptoms is just a side effect.
>>
>> 2016-09-02 21:23 GMT+02:00 Laura Perez Cerrato <
>> lauraperezcerrato at gmail.com>:
>>
>>>
>>> Hello everyone,
>>>
>>> Lately, while working on the VM, I noticed that a few primitives related
>>> with the clock on V3 were returning incorrect values. Cloning the
>>> repository and building a Squeak Cog V3 VM produces a VM in which
>>> DateAndTime class>>#now answers a value closer than a day to the epoch in
>>> both Squeak 4.5 and the lastest version of Cuis. In the lastest version of
>>> Cuis, Time class>>#primLocalMicrosecondClock, Time
>>> class>>#primLocalSecondClock and Time class>>#primHighResClock return odd
>>> values. This problem is seen exclusively on V3 VMs as far as I can tell,
>>> independently of the interpreter.
>>>
>>> I wasn't capable of pinning down the reason for this strange behaviour.
>>> If anyone could give me a hand with that, I'd appreaciate it.
>>>
>>> Cheers!
>>>
>>> -Laura Perez Cerrato
>>>
>>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.squeakfoundation.org/pipermail/vm-dev/attachments/20160903/df21ef71/attachment.htm


More information about the Vm-dev mailing list