[Vm-dev] Issues with the clock on V3

David T. Lewis lewis at mail.msen.com
Sat Sep 3 13:58:46 UTC 2016


Thanks for posting your analysis and debugging notes on this, it is good
to be able to see both the problem and how you located it.

Dave


On Sat, Sep 03, 2016 at 11:02:45AM +0200, Nicolas Cellier wrote:
>  
> I'm going to show how I try to solve such problem.
> It's going to be a bit long, sorry, but there might be something didactic
> in it since you ask for a clue...
> 
> Starting with Integer Sunit test, I detect that one example of strange
> behavior is:
>     (1 << 63) negated + 1.
> It answers a smallInteger (1).
> It uses primitive 21, that is primitiveAddLargeIntegers
> 
> This one does not print correctly, but produces a correct result:
>     1 + (1 << 63) negated.
> It uses primitiveDigitAdd from LargeIntegers, so at least, we have one path
> working...
> 
> primitiveAddLargeIntegers tries using native 64bits arithmetic before
> engaging LargeIntegers module digit-based arithmetic.
> It has already been modified long time ago to avoid undefined behavior
> related to checking overflow in post condition, so the slang code looks
> correct at upper level. Anyway, since both integers in our failing case
> have different sign we should not pass thru the branch checking for
> overflow.
> It might be a problem of UB with used methods or with inlining and type
> inference.
> 
> So one has to focus on generated C code of primitiveAddLargeIntegers.
> it's not a problem with inlining, because no method are inlined here, and
> the C code looks equally good to slang.
> 
> So it remains to check isNegativeIntegerValueOf magnitude64BitValueOf and
> magnitude64BitIntegerForneg.
> There is some dead code generated for checking class of oop, for example:
>                 ok = ClassLargeNegativeIntegerCompactIndex == ccIndex;
>                 goto l1;
> 
>                 ok = classOop == (fetchPointerofObject(ccIndex - 1,
> longAt((GIV(specialObjectsOop) + BaseHeaderSize) +
> (((int)((usqInt)(CompactClasses) << (shiftForWord())))))));
>         l1:     /* end isClassOfNonImm:equalTo:compactClassIndex: */;
> The line after goto l1 is unreacheable, but that should not be the problem,
> it' juts because we exploited the compact class index.
> 
> One line is worth checking in magnitude64BitValueOf:
>     value = SQ_SWAP_8_BYTES_IF_BIGENDIAN((fetchLong64ofObject(0, oop)));
> 
> Then inspecting code of fetchLong64ofObject we see an awfull thing:
>     return
>         (((unsigned int) (long32At((oop + BaseHeaderSize) +
> (((sqInt)((usqInt)(longIndex) << 3))))))) + (((sqInt)((usqInt)((((unsigned
> long long)(long32At((oop + BaseHeaderSize) + ((((sqInt)((usqInt)(longIndex)
> << 3))) + 4)))))) << 32)))
> 
> That's the problem. We are converting to unsigned long long (64) before
> shifting by 32 bit, that's correct, but just before the shift, we convert
> back to usqInt (32 bits long) and thus get completely undefined behaviour!
> 
> Let's see the warnings in generated LOGF compilation report:
> ../../stacksrc/vm/gcc3x-interp.c:27429:226: warning: shift count >= width
> of type [-Wshift-count-overflow]           (((unsigned int) (long32At((oop
> + BaseHeaderSize) + (((sqInt)((usqInt)(longIndex) << 3))))))) +
> (((sqInt)((usqInt)((((unsigned long long)(long32At((oop + BaseHeaderSize) +
> ((((sqInt)((usqInt)(longIndex) << 3))) + 4)))))) << 32)))
> 
> Problem confirmed (Phew, I've deciphered correctly the parentheses and type
> cast precedence).
> Let's see the slang code of fetchLong64: longIndex ofObject: oop
>     <returnTypeC: #sqLong>
>     ^self cppIf: BytesPerWord = 8
>         ifTrue: [self long64At: oop + self baseHeaderSize + (longIndex <<
> 3)]
>         ifFalse:
>             ["BEWARE OF SIGN EXTENSION ON LEAST SIGNIFICAND LIMB
>             'unsigned int) CAST IS THERE TO PREVENT SUCH ERROR"
>             self cppIf: VMBIGENDIAN
>                 ifTrue: [((self long32At: oop + self baseHeaderSize +
> (longIndex << 3)) asUnsignedLongLong << 32)
>                     + (self cCoerceSimple: (self long32At: oop + self
> baseHeaderSize + (longIndex << 3 + 4)) to: #'unsigned int')]
>                 ifFalse: [(self cCoerceSimple: (self long32At: oop + self
> baseHeaderSize + (longIndex << 3)) to: #'unsigned int')
>                     + ((self long32At: oop + self baseHeaderSize +
> (longIndex << 3 + 4)) asUnsignedLongLong << 32)]]
> 
> Till there, that sounds correct. So there's a problem in generating the
> shift << 32 with undue usqInt conversion.
> This generation takes place in CCodeGenerator generateShiftLeft: msgNode
> on: aStream indent: level.
> I've added a protection to avoid undefined behavior: I first convert to
> unsigned before shifting.
> But the type is elready unsigned long long, so that would mean that it's
> the type inference of asUnsignedLongLong that failed.
> 
> asUnsignedLongLong is generated directly via
> CCodeGenerator>>generateAsUnsignedLongLong:on:indent: (see
> initializeCTranslationDictionary)
> This mechanism does bypass type inference rules, so there should be a
> corresponding type inference rule in
> CCodeGenerator>>returnTypeForSend:in:ifNil:
> ALAS, there is none!!!
> 
> So my change of << generation for avoiding one problem with UB (a real
> problem: that prevents compiling win64 version correctly) did break another
> part.
> 
> This shows how brittle the C Code generation is.
> Now that we use type inference rules, EVERY selector generated thru
> initializeCTranslationDictionary should have a type rule counterpart in
> returnTypeForSend:in:ifNil:
> 
> The problem does not appear in SPUR, because memory is eight byte aligned,
> so we do not have to decompose the fetch into 2 32-bits words.
> I carried this analysis in LLP64_v2 branch, but I think it's the same in
> cog.
> I'll commit a patch.
> 
> Thanks for reporting!
> 
> 
> 
> 2016-09-03 9:57 GMT+02:00 Nicolas Cellier <
> nicolas.cellier.aka.nice at gmail.com>:
> 
> > Interestingly, when I recompile on MacOSX, the bug happens on fast version
> > of squeak.stack.v3 but not on debug version!!!
> > That means that it's vain to track the problem with VM simulator.
> > It's also vain to try debugging at C level...
> >
> > What remains is:
> > Use lldb the unstripped fast version, put a break point in some arithmetic
> > primitive then stepi into assembly.
> > Or carefully inspect the compiler warnings (with ./mvm -- WARNINGS='-Wall
> > -Wextra') and carefully read generated C code to track for a possible
> > undefined behavior striking.
> > Or compare generated C code with generated C assembly to understand what's
> > going on.
> > Or instrument the generated code with undefined behavior guards (I don't
> > remember the clang incantation for that, but it's possible).
> >
> > 2016-09-02 22:09 GMT+02:00 Nicolas Cellier <nicolas.cellier.aka.nice@
> > gmail.com>:
> >
> >> Hi Laura,
> >> it sounds like some LargeIntegers arithmetic is broken in v3...
> >> The clock symptoms is just a side effect.
> >>
> >> 2016-09-02 21:23 GMT+02:00 Laura Perez Cerrato <
> >> lauraperezcerrato at gmail.com>:
> >>
> >>>
> >>> Hello everyone,
> >>>
> >>> Lately, while working on the VM, I noticed that a few primitives related
> >>> with the clock on V3 were returning incorrect values. Cloning the
> >>> repository and building a Squeak Cog V3 VM produces a VM in which
> >>> DateAndTime class>>#now answers a value closer than a day to the epoch in
> >>> both Squeak 4.5 and the lastest version of Cuis. In the lastest version of
> >>> Cuis, Time class>>#primLocalMicrosecondClock, Time
> >>> class>>#primLocalSecondClock and Time class>>#primHighResClock return odd
> >>> values. This problem is seen exclusively on V3 VMs as far as I can tell,
> >>> independently of the interpreter.
> >>>
> >>> I wasn't capable of pinning down the reason for this strange behaviour.
> >>> If anyone could give me a hand with that, I'd appreaciate it.
> >>>
> >>> Cheers!
> >>>
> >>> -Laura Perez Cerrato
> >>>
> >>>
> >>
> >



More information about the Vm-dev mailing list