[Vm-dev] Re: vm problem on cog an stack v3

Eliot Miranda eliot.miranda at gmail.com
Wed Mar 30 06:09:49 UTC 2016


Hi Nicolas,

On Tue, Mar 29, 2016 at 2:36 PM, Nicolas Cellier <
nicolas.cellier.aka.nice at gmail.com> wrote:

>
>
>
> 2016-03-27 15:39 GMT+02:00 Nicolas Cellier <
> nicolas.cellier.aka.nice at gmail.com>:
>
>> To be more specific,
>>
>> The first thing to do would be to confirm that the KernelNumber tests
>> fail with a squeak.stack.v3 VM compiled with head revision of COG VM
>> branch, whatever OS.
>>
>> Then knowing from which SVN version of COG VM branch the KernelNumber the
>> tests start failing would be nice.
>>
>> The bisect job is to:
>> - iterate on version number (whatever strategy, bsearch or something)
>> - checkout VM sources
>> - compile the build.favouriteOS/squeak.stack.v3
>> - run a v3 image with the generated VM and launch KernelNumber tests
>>
>> Really a job for a jenkins bot, travis bot or whatever...
>> The next good thing would be to give a little love to build.squeak.org
>> or anyother similar solution.
>> I only see red disks on this site...
>>
>>
> Follow up: I did bissect manually:
> 3648 (VMMaker.oscog-eem.nice.1729) OK
> 3649 (VMMaker.oscog-eem.1740) Not OK
>
> So something went wrong for V3 between these two versions.
> At the same time, it works for spur.
>
> Spur objects are 8 bytes aligned, v3 objects are 4 bytes aligned...
> So fetching 64 bits from V3 MUST be decomposed into 2 32 bits fetch to
> avoid a segfault related to misalign fetch.
>
> OK, let's look how it is performed:
>
> fetchLong64: longIndex ofObject: oop
>     <returnTypeC: #sqLong>
>     ^self cppIf: BytesPerWord = 8
>         ifTrue: [self long64At: oop + self baseHeaderSize + (longIndex <<
> 3)]
>         ifFalse:
>             [self cppIf: VMBIGENDIAN
>                 ifTrue: [((self long32At: oop + self baseHeaderSize +
> (longIndex << 3)) asUnsignedLongLong << 32)
>                     + (self long32At: oop + self baseHeaderSize +
> (longIndex << 3 + 4))]
>                 ifFalse: [(self long32At: oop + self baseHeaderSize +
> (longIndex << 3))
>                     + ((self long32At: oop + self baseHeaderSize +
> (longIndex << 3 + 4)) asUnsignedLongLong << 32)]]
>
> AH AH! Operation is:
>     low + (((unsigned) high) << 32)
>
> With low declared as SIGNED (long32At is signed GRRR).
> What if bit 32 of low is 1? then the compiler will perform:
>
> (unsigned long) low + ...
>
> AND THIS WILL DO A SIGN EXTENSION...
>
> It's not that my code was false...
> It's just that it uncovered a bug in fetchLong64:ofObject:
>
> That cost me many hours, but that enforce my conviction about signedness...
> I would much much much prefer to call unsignedLong32At because it's what I
> mean 9 times out of 10: get the magnitude...
>

Let's add them.  It would be great to have them all consistent too.  But at
least let's add unsignedLong32At:[put:].  Wait.  In the simulator
longAt:[put:] et al are unsigned.  So we have three choices:

a) live with it
b) make longAt:[put:] long32At:put: et al unsigned in C, and add
signedLongAt:[put:] et al.
c) make the simulator's longAt:[put:] long32At:put: signed and then add
unsignedLongAt:[put:] et al

Any others?

I think a) is unwise.  The simulator and C should agree.  Nicolas, I can
join your experience in finding that debugging these issues is extremely
time consuming.

My preference is for b); some uses where the type should be signed will by
located by C compiler warnings; we hope that VM tests will catch the others)


> 2016-03-27 0:40 GMT+01:00 Nicolas Cellier <
>> nicolas.cellier.aka.nice at gmail.com>:
>>
>>> Hi,
>>> before I continue, i've noticed that the large integer multiply seems
>>> broken on v3 object memory (cog & stack)
>>> Note that this does not happen on Spur.
>>>
>>> This is independent of my recent changes of LargeIntegers plugin as it
>>> happens BEFORE these changes and is related to primitive 29 rather than to
>>> the plugin...
>>> Here are the symptoms:
>>>
>>> halfPower := 10000.
>>> s := 111111111111111.
>>> head := s quo: halfPower.
>>> tail := s - (head * halfPower).
>>> {
>>>    head as: ByteArray.
>>>    (1 to: halfPower digitLength) collect: [:i | halfPower digitAt: i]
>>> as: ByteArray.
>>>    (head*halfPower) as: ByteArray.
>>> }.
>>>
>>> the correct result is:
>>>  #(#[199 25 70 150 2] #[16 39] #[112 237 78 18 14 101])
>>>
>>> the wrong result I obtained with SVN revision 3651 compiled by myself is:
>>>  #(#[199 25 70 150 2] #[16 39] #[112 237 78 18 254 61])
>>>
>>> The most significant bits (above 32) are wrong...
>>> The pattern I obtain is (with most significant bit put back left)
>>>
>>> 2r00111101 << 8 +  2r11111110   "wrong result"
>>> 2r01100101 << 8 + 2r00001110  "Correct result"
>>>
>>> I completely fail to infer what's going on from this pattern...
>>>
>>> This is on MacOSX clang --version
>>> Apple LLVM version 7.3.0 (clang-703.0.29)
>>> Target: x86_64-apple-darwin15.4.0
>>>
>>> This goes thru primitiveMultiplyLargeIntegers (29)
>>> oopResult := self magnitude64BitIntegerFor: result neg: aIsNegative ~=
>>> bIsNegative.
>>> -> sz > 4
>>>                 ifTrue: [objectMemory storeLong64: 0 ofObject:
>>> newLargeInteger withValue: magnitude]
>>> (which I changed recently)
>>>
>>> then:
>>> storeLong64: longIndex ofObject: oop withValue: value
>>>     <var: #value type: #sqLong>
>>>     self flag: #endianness.
>>>     self long32At: oop + self baseHeaderSize + (longIndex << 3) put:
>>> (self cCode: [value] inSmalltalk: [value bitAnd: 16rFFFFFFFF]);
>>>         long32At: oop + self baseHeaderSize + (longIndex << 3) + 4 put:
>>> (value >> 32).
>>>     ^value
>>>
>>> I don't see anything wrong with this code...
>>> Well, using a shift on signed value is not that good, but it works for
>>> at least 3 reasons:
>>> - we throw the signBit extension away
>>> - slang inlining misses the signedness difference, and the generated C
>>> code is correct.
>>> - Anyway, in our case, the sign bit was 0...
>>>
>>> Previous implementation in magnitude64BitIntegerFor:neg: was:
>>> sz > 4 ifTrue:
>>>                 [objectMemory storeLong32: 1 ofObject: newLargeInteger
>>> withValue: magnitude >> 32].
>>>             objectMemory
>>>                 storeLong32: 0
>>>                 ofObject: newLargeInteger
>>>                 withValue: (self cCode: [magnitude] inSmalltalk:
>>> [magnitude bitAnd: 16rFFFFFFFF])
>>>
>>> Not much different, except that the high 32 bits and low 32 bits are
>>> written in a different order...
>>>
>>> If I had a server I'd like to bisect
>>> - from which version does this happens
>>> - for which OS
>>> - for which compiler
>>>
>>> Without such information, I think I'll have to debug it either thru
>>> simulator or directly in gdb, but I feel like I'm really losing my time :(
>>>
>>> And I've got a 2nd problem like this one...
>>>
>>>
>>>
>>>
>>>
>>
>
>


-- 
_,,,^..^,,,_
best, Eliot
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.squeakfoundation.org/pipermail/vm-dev/attachments/20160329/e807a561/attachment-0001.htm


More information about the Vm-dev mailing list