[Vm-dev] Re: vm problem on cog an stack v3

David T. Lewis lewis at mail.msen.com
Tue Mar 29 23:19:35 UTC 2016


On Tue, Mar 29, 2016 at 11:36:57PM +0200, Nicolas Cellier wrote:
>  
> 2016-03-27 15:39 GMT+02:00 Nicolas Cellier <
> nicolas.cellier.aka.nice at gmail.com>:
> 
> > To be more specific,
> >
> > The first thing to do would be to confirm that the KernelNumber tests fail
> > with a squeak.stack.v3 VM compiled with head revision of COG VM branch,
> > whatever OS.
> >
> > Then knowing from which SVN version of COG VM branch the KernelNumber the
> > tests start failing would be nice.
> >
> > The bisect job is to:
> > - iterate on version number (whatever strategy, bsearch or something)
> > - checkout VM sources
> > - compile the build.favouriteOS/squeak.stack.v3
> > - run a v3 image with the generated VM and launch KernelNumber tests
> >
> > Really a job for a jenkins bot, travis bot or whatever...
> > The next good thing would be to give a little love to build.squeak.org or
> > anyother similar solution.
> > I only see red disks on this site...
> >
> >
> Follow up: I did bissect manually:
> 3648 (VMMaker.oscog-eem.nice.1729) OK
> 3649 (VMMaker.oscog-eem.1740) Not OK
> 
> So something went wrong for V3 between these two versions.
> At the same time, it works for spur.
> 
> Spur objects are 8 bytes aligned, v3 objects are 4 bytes aligned...
> So fetching 64 bits from V3 MUST be decomposed into 2 32 bits fetch to
> avoid a segfault related to misalign fetch.
> 
> OK, let's look how it is performed:
> 
> fetchLong64: longIndex ofObject: oop
>     <returnTypeC: #sqLong>
>     ^self cppIf: BytesPerWord = 8
>         ifTrue: [self long64At: oop + self baseHeaderSize + (longIndex <<
> 3)]
>         ifFalse:
>             [self cppIf: VMBIGENDIAN
>                 ifTrue: [((self long32At: oop + self baseHeaderSize +
> (longIndex << 3)) asUnsignedLongLong << 32)
>                     + (self long32At: oop + self baseHeaderSize +
> (longIndex << 3 + 4))]
>                 ifFalse: [(self long32At: oop + self baseHeaderSize +
> (longIndex << 3))
>                     + ((self long32At: oop + self baseHeaderSize +
> (longIndex << 3 + 4)) asUnsignedLongLong << 32)]]
> 
> AH AH! Operation is:
>     low + (((unsigned) high) << 32)
> 
> With low declared as SIGNED (long32At is signed GRRR).
> What if bit 32 of low is 1? then the compiler will perform:
> 
> (unsigned long) low + ...
> 
> AND THIS WILL DO A SIGN EXTENSION...
> 
> It's not that my code was false...
> It's just that it uncovered a bug in fetchLong64:ofObject:

Similar issues must exist in other places. The trunk VMM (aka interpreter)
that I tested has no #fetchLong64:ofObject: but it had the same failure
symptoms that you saw with StackInterpreter/Cog/Spur.

> 
> That cost me many hours, but that enforce my conviction about signedness...
> I would much much much prefer to call unsignedLong32At because it's what I
> mean 9 times out of 10: get the magnitude...
> 

+1

Debugging problems like this is hard. Declaring variables properly is easy.

Dave


> 2016-03-27 0:40 GMT+01:00 Nicolas Cellier <
> > nicolas.cellier.aka.nice at gmail.com>:
> >
> >> Hi,
> >> before I continue, i've noticed that the large integer multiply seems
> >> broken on v3 object memory (cog & stack)
> >> Note that this does not happen on Spur.
> >>
> >> This is independent of my recent changes of LargeIntegers plugin as it
> >> happens BEFORE these changes and is related to primitive 29 rather than to
> >> the plugin...
> >> Here are the symptoms:
> >>
> >> halfPower := 10000.
> >> s := 111111111111111.
> >> head := s quo: halfPower.
> >> tail := s - (head * halfPower).
> >> {
> >>    head as: ByteArray.
> >>    (1 to: halfPower digitLength) collect: [:i | halfPower digitAt: i] as:
> >> ByteArray.
> >>    (head*halfPower) as: ByteArray.
> >> }.
> >>
> >> the correct result is:
> >>  #(#[199 25 70 150 2] #[16 39] #[112 237 78 18 14 101])
> >>
> >> the wrong result I obtained with SVN revision 3651 compiled by myself is:
> >>  #(#[199 25 70 150 2] #[16 39] #[112 237 78 18 254 61])
> >>
> >> The most significant bits (above 32) are wrong...
> >> The pattern I obtain is (with most significant bit put back left)
> >>
> >> 2r00111101 << 8 +  2r11111110   "wrong result"
> >> 2r01100101 << 8 + 2r00001110  "Correct result"
> >>
> >> I completely fail to infer what's going on from this pattern...
> >>
> >> This is on MacOSX clang --version
> >> Apple LLVM version 7.3.0 (clang-703.0.29)
> >> Target: x86_64-apple-darwin15.4.0
> >>
> >> This goes thru primitiveMultiplyLargeIntegers (29)
> >> oopResult := self magnitude64BitIntegerFor: result neg: aIsNegative ~=
> >> bIsNegative.
> >> -> sz > 4
> >>                 ifTrue: [objectMemory storeLong64: 0 ofObject:
> >> newLargeInteger withValue: magnitude]
> >> (which I changed recently)
> >>
> >> then:
> >> storeLong64: longIndex ofObject: oop withValue: value
> >>     <var: #value type: #sqLong>
> >>     self flag: #endianness.
> >>     self long32At: oop + self baseHeaderSize + (longIndex << 3) put:
> >> (self cCode: [value] inSmalltalk: [value bitAnd: 16rFFFFFFFF]);
> >>         long32At: oop + self baseHeaderSize + (longIndex << 3) + 4 put:
> >> (value >> 32).
> >>     ^value
> >>
> >> I don't see anything wrong with this code...
> >> Well, using a shift on signed value is not that good, but it works for at
> >> least 3 reasons:
> >> - we throw the signBit extension away
> >> - slang inlining misses the signedness difference, and the generated C
> >> code is correct.
> >> - Anyway, in our case, the sign bit was 0...
> >>
> >> Previous implementation in magnitude64BitIntegerFor:neg: was:
> >> sz > 4 ifTrue:
> >>                 [objectMemory storeLong32: 1 ofObject: newLargeInteger
> >> withValue: magnitude >> 32].
> >>             objectMemory
> >>                 storeLong32: 0
> >>                 ofObject: newLargeInteger
> >>                 withValue: (self cCode: [magnitude] inSmalltalk:
> >> [magnitude bitAnd: 16rFFFFFFFF])
> >>
> >> Not much different, except that the high 32 bits and low 32 bits are
> >> written in a different order...
> >>
> >> If I had a server I'd like to bisect
> >> - from which version does this happens
> >> - for which OS
> >> - for which compiler
> >>
> >> Without such information, I think I'll have to debug it either thru
> >> simulator or directly in gdb, but I feel like I'm really losing my time :(
> >>
> >> And I've got a 2nd problem like this one...
> >>
> >>
> >>
> >>
> >>
> >



More information about the Vm-dev mailing list