[Vm-dev] Re: vm problem on cog an stack v3

David T. Lewis lewis at mail.msen.com
Wed Mar 30 12:11:34 UTC 2016


On Tue, Mar 29, 2016 at 11:09:49PM -0700, Eliot Miranda wrote:
>  
> Hi Nicolas,
> 
> On Tue, Mar 29, 2016 at 2:36 PM, Nicolas Cellier <
> nicolas.cellier.aka.nice at gmail.com> wrote:
> 
> >
> >
> >
> > 2016-03-27 15:39 GMT+02:00 Nicolas Cellier <
> > nicolas.cellier.aka.nice at gmail.com>:
> >
> >> To be more specific,
> >>
> >> The first thing to do would be to confirm that the KernelNumber tests
> >> fail with a squeak.stack.v3 VM compiled with head revision of COG VM
> >> branch, whatever OS.
> >>
> >> Then knowing from which SVN version of COG VM branch the KernelNumber the
> >> tests start failing would be nice.
> >>
> >> The bisect job is to:
> >> - iterate on version number (whatever strategy, bsearch or something)
> >> - checkout VM sources
> >> - compile the build.favouriteOS/squeak.stack.v3
> >> - run a v3 image with the generated VM and launch KernelNumber tests
> >>
> >> Really a job for a jenkins bot, travis bot or whatever...
> >> The next good thing would be to give a little love to build.squeak.org
> >> or anyother similar solution.
> >> I only see red disks on this site...
> >>
> >>
> > Follow up: I did bissect manually:
> > 3648 (VMMaker.oscog-eem.nice.1729) OK
> > 3649 (VMMaker.oscog-eem.1740) Not OK
> >
> > So something went wrong for V3 between these two versions.
> > At the same time, it works for spur.
> >
> > Spur objects are 8 bytes aligned, v3 objects are 4 bytes aligned...
> > So fetching 64 bits from V3 MUST be decomposed into 2 32 bits fetch to
> > avoid a segfault related to misalign fetch.
> >
> > OK, let's look how it is performed:
> >
> > fetchLong64: longIndex ofObject: oop
> >     <returnTypeC: #sqLong>
> >     ^self cppIf: BytesPerWord = 8
> >         ifTrue: [self long64At: oop + self baseHeaderSize + (longIndex <<
> > 3)]
> >         ifFalse:
> >             [self cppIf: VMBIGENDIAN
> >                 ifTrue: [((self long32At: oop + self baseHeaderSize +
> > (longIndex << 3)) asUnsignedLongLong << 32)
> >                     + (self long32At: oop + self baseHeaderSize +
> > (longIndex << 3 + 4))]
> >                 ifFalse: [(self long32At: oop + self baseHeaderSize +
> > (longIndex << 3))
> >                     + ((self long32At: oop + self baseHeaderSize +
> > (longIndex << 3 + 4)) asUnsignedLongLong << 32)]]
> >
> > AH AH! Operation is:
> >     low + (((unsigned) high) << 32)
> >
> > With low declared as SIGNED (long32At is signed GRRR).
> > What if bit 32 of low is 1? then the compiler will perform:
> >
> > (unsigned long) low + ...
> >
> > AND THIS WILL DO A SIGN EXTENSION...
> >
> > It's not that my code was false...
> > It's just that it uncovered a bug in fetchLong64:ofObject:
> >
> > That cost me many hours, but that enforce my conviction about signedness...
> > I would much much much prefer to call unsignedLong32At because it's what I
> > mean 9 times out of 10: get the magnitude...
> >
> 
> Let's add them.  It would be great to have them all consistent too.  But at
> least let's add unsignedLong32At:[put:].  Wait.  In the simulator
> longAt:[put:] et al are unsigned.  So we have three choices:
> 
> a) live with it
> b) make longAt:[put:] long32At:put: et al unsigned in C, and add
> signedLongAt:[put:] et al.
> c) make the simulator's longAt:[put:] long32At:put: signed and then add
> unsignedLongAt:[put:] et al
> 
> Any others?
> 
> I think a) is unwise.  The simulator and C should agree.  Nicolas, I can
> join your experience in finding that debugging these issues is extremely
> time consuming.
> 
> My preference is for b); some uses where the type should be signed will by
> located by C compiler warnings; we hope that VM tests will catch the others)
> 

Can we think about this for a couple of days? I'd like to experiment with
it a bit. My first inclination was to choose b), but in re-reading the
method comments that I put into MemoryAccess methods, I am thinking that
it may be wise to declare things explicitly as in c). I don't know the
right answer, but I think it would be good to play with the options a
bit before proceeding.

For example, to show existing practice (not necessarily good):

MemoryAccess>>shortAt: oop
	"Answer the signed short integer value at an object memory location. The
	result is a signed sqInt value. Negative values will be sign extended, such
	that if the short integer value is binary 16rFFFF, the result will be 16rFFFFFFFF
	for a 32-bit object memory, or 16rFFFFFFFFFFFFFFFF for a 64-bit object memory."

	"sqInt shortAt(sqInt oop) { return shortAtPointer(pointerForOop(oop)); }"

	<inline: true>
	^ self shortAtPointer: (self pointerForOop: oop)


Dave


> 
> > 2016-03-27 0:40 GMT+01:00 Nicolas Cellier <
> >> nicolas.cellier.aka.nice at gmail.com>:
> >>
> >>> Hi,
> >>> before I continue, i've noticed that the large integer multiply seems
> >>> broken on v3 object memory (cog & stack)
> >>> Note that this does not happen on Spur.
> >>>
> >>> This is independent of my recent changes of LargeIntegers plugin as it
> >>> happens BEFORE these changes and is related to primitive 29 rather than to
> >>> the plugin...
> >>> Here are the symptoms:
> >>>
> >>> halfPower := 10000.
> >>> s := 111111111111111.
> >>> head := s quo: halfPower.
> >>> tail := s - (head * halfPower).
> >>> {
> >>>    head as: ByteArray.
> >>>    (1 to: halfPower digitLength) collect: [:i | halfPower digitAt: i]
> >>> as: ByteArray.
> >>>    (head*halfPower) as: ByteArray.
> >>> }.
> >>>
> >>> the correct result is:
> >>>  #(#[199 25 70 150 2] #[16 39] #[112 237 78 18 14 101])
> >>>
> >>> the wrong result I obtained with SVN revision 3651 compiled by myself is:
> >>>  #(#[199 25 70 150 2] #[16 39] #[112 237 78 18 254 61])
> >>>
> >>> The most significant bits (above 32) are wrong...
> >>> The pattern I obtain is (with most significant bit put back left)
> >>>
> >>> 2r00111101 << 8 +  2r11111110   "wrong result"
> >>> 2r01100101 << 8 + 2r00001110  "Correct result"
> >>>
> >>> I completely fail to infer what's going on from this pattern...
> >>>
> >>> This is on MacOSX clang --version
> >>> Apple LLVM version 7.3.0 (clang-703.0.29)
> >>> Target: x86_64-apple-darwin15.4.0
> >>>
> >>> This goes thru primitiveMultiplyLargeIntegers (29)
> >>> oopResult := self magnitude64BitIntegerFor: result neg: aIsNegative ~=
> >>> bIsNegative.
> >>> -> sz > 4
> >>>                 ifTrue: [objectMemory storeLong64: 0 ofObject:
> >>> newLargeInteger withValue: magnitude]
> >>> (which I changed recently)
> >>>
> >>> then:
> >>> storeLong64: longIndex ofObject: oop withValue: value
> >>>     <var: #value type: #sqLong>
> >>>     self flag: #endianness.
> >>>     self long32At: oop + self baseHeaderSize + (longIndex << 3) put:
> >>> (self cCode: [value] inSmalltalk: [value bitAnd: 16rFFFFFFFF]);
> >>>         long32At: oop + self baseHeaderSize + (longIndex << 3) + 4 put:
> >>> (value >> 32).
> >>>     ^value
> >>>
> >>> I don't see anything wrong with this code...
> >>> Well, using a shift on signed value is not that good, but it works for
> >>> at least 3 reasons:
> >>> - we throw the signBit extension away
> >>> - slang inlining misses the signedness difference, and the generated C
> >>> code is correct.
> >>> - Anyway, in our case, the sign bit was 0...
> >>>
> >>> Previous implementation in magnitude64BitIntegerFor:neg: was:
> >>> sz > 4 ifTrue:
> >>>                 [objectMemory storeLong32: 1 ofObject: newLargeInteger
> >>> withValue: magnitude >> 32].
> >>>             objectMemory
> >>>                 storeLong32: 0
> >>>                 ofObject: newLargeInteger
> >>>                 withValue: (self cCode: [magnitude] inSmalltalk:
> >>> [magnitude bitAnd: 16rFFFFFFFF])
> >>>
> >>> Not much different, except that the high 32 bits and low 32 bits are
> >>> written in a different order...
> >>>
> >>> If I had a server I'd like to bisect
> >>> - from which version does this happens
> >>> - for which OS
> >>> - for which compiler
> >>>
> >>> Without such information, I think I'll have to debug it either thru
> >>> simulator or directly in gdb, but I feel like I'm really losing my time :(
> >>>
> >>> And I've got a 2nd problem like this one...
> >>>
> >>>
> >>>
> >>>
> >>>
> >>
> >
> >
> 
> 
> -- 
> _,,,^..^,,,_
> best, Eliot



More information about the Vm-dev mailing list