[Vm-dev] Re: vm problem on cog an stack v3

Nicolas Cellier nicolas.cellier.aka.nice at gmail.com
Tue Mar 29 21:36:57 UTC 2016


2016-03-27 15:39 GMT+02:00 Nicolas Cellier <
nicolas.cellier.aka.nice at gmail.com>:

> To be more specific,
>
> The first thing to do would be to confirm that the KernelNumber tests fail
> with a squeak.stack.v3 VM compiled with head revision of COG VM branch,
> whatever OS.
>
> Then knowing from which SVN version of COG VM branch the KernelNumber the
> tests start failing would be nice.
>
> The bisect job is to:
> - iterate on version number (whatever strategy, bsearch or something)
> - checkout VM sources
> - compile the build.favouriteOS/squeak.stack.v3
> - run a v3 image with the generated VM and launch KernelNumber tests
>
> Really a job for a jenkins bot, travis bot or whatever...
> The next good thing would be to give a little love to build.squeak.org or
> anyother similar solution.
> I only see red disks on this site...
>
>
Follow up: I did bissect manually:
3648 (VMMaker.oscog-eem.nice.1729) OK
3649 (VMMaker.oscog-eem.1740) Not OK

So something went wrong for V3 between these two versions.
At the same time, it works for spur.

Spur objects are 8 bytes aligned, v3 objects are 4 bytes aligned...
So fetching 64 bits from V3 MUST be decomposed into 2 32 bits fetch to
avoid a segfault related to misalign fetch.

OK, let's look how it is performed:

fetchLong64: longIndex ofObject: oop
    <returnTypeC: #sqLong>
    ^self cppIf: BytesPerWord = 8
        ifTrue: [self long64At: oop + self baseHeaderSize + (longIndex <<
3)]
        ifFalse:
            [self cppIf: VMBIGENDIAN
                ifTrue: [((self long32At: oop + self baseHeaderSize +
(longIndex << 3)) asUnsignedLongLong << 32)
                    + (self long32At: oop + self baseHeaderSize +
(longIndex << 3 + 4))]
                ifFalse: [(self long32At: oop + self baseHeaderSize +
(longIndex << 3))
                    + ((self long32At: oop + self baseHeaderSize +
(longIndex << 3 + 4)) asUnsignedLongLong << 32)]]

AH AH! Operation is:
    low + (((unsigned) high) << 32)

With low declared as SIGNED (long32At is signed GRRR).
What if bit 32 of low is 1? then the compiler will perform:

(unsigned long) low + ...

AND THIS WILL DO A SIGN EXTENSION...

It's not that my code was false...
It's just that it uncovered a bug in fetchLong64:ofObject:

That cost me many hours, but that enforce my conviction about signedness...
I would much much much prefer to call unsignedLong32At because it's what I
mean 9 times out of 10: get the magnitude...

2016-03-27 0:40 GMT+01:00 Nicolas Cellier <
> nicolas.cellier.aka.nice at gmail.com>:
>
>> Hi,
>> before I continue, i've noticed that the large integer multiply seems
>> broken on v3 object memory (cog & stack)
>> Note that this does not happen on Spur.
>>
>> This is independent of my recent changes of LargeIntegers plugin as it
>> happens BEFORE these changes and is related to primitive 29 rather than to
>> the plugin...
>> Here are the symptoms:
>>
>> halfPower := 10000.
>> s := 111111111111111.
>> head := s quo: halfPower.
>> tail := s - (head * halfPower).
>> {
>>    head as: ByteArray.
>>    (1 to: halfPower digitLength) collect: [:i | halfPower digitAt: i] as:
>> ByteArray.
>>    (head*halfPower) as: ByteArray.
>> }.
>>
>> the correct result is:
>>  #(#[199 25 70 150 2] #[16 39] #[112 237 78 18 14 101])
>>
>> the wrong result I obtained with SVN revision 3651 compiled by myself is:
>>  #(#[199 25 70 150 2] #[16 39] #[112 237 78 18 254 61])
>>
>> The most significant bits (above 32) are wrong...
>> The pattern I obtain is (with most significant bit put back left)
>>
>> 2r00111101 << 8 +  2r11111110   "wrong result"
>> 2r01100101 << 8 + 2r00001110  "Correct result"
>>
>> I completely fail to infer what's going on from this pattern...
>>
>> This is on MacOSX clang --version
>> Apple LLVM version 7.3.0 (clang-703.0.29)
>> Target: x86_64-apple-darwin15.4.0
>>
>> This goes thru primitiveMultiplyLargeIntegers (29)
>> oopResult := self magnitude64BitIntegerFor: result neg: aIsNegative ~=
>> bIsNegative.
>> -> sz > 4
>>                 ifTrue: [objectMemory storeLong64: 0 ofObject:
>> newLargeInteger withValue: magnitude]
>> (which I changed recently)
>>
>> then:
>> storeLong64: longIndex ofObject: oop withValue: value
>>     <var: #value type: #sqLong>
>>     self flag: #endianness.
>>     self long32At: oop + self baseHeaderSize + (longIndex << 3) put:
>> (self cCode: [value] inSmalltalk: [value bitAnd: 16rFFFFFFFF]);
>>         long32At: oop + self baseHeaderSize + (longIndex << 3) + 4 put:
>> (value >> 32).
>>     ^value
>>
>> I don't see anything wrong with this code...
>> Well, using a shift on signed value is not that good, but it works for at
>> least 3 reasons:
>> - we throw the signBit extension away
>> - slang inlining misses the signedness difference, and the generated C
>> code is correct.
>> - Anyway, in our case, the sign bit was 0...
>>
>> Previous implementation in magnitude64BitIntegerFor:neg: was:
>> sz > 4 ifTrue:
>>                 [objectMemory storeLong32: 1 ofObject: newLargeInteger
>> withValue: magnitude >> 32].
>>             objectMemory
>>                 storeLong32: 0
>>                 ofObject: newLargeInteger
>>                 withValue: (self cCode: [magnitude] inSmalltalk:
>> [magnitude bitAnd: 16rFFFFFFFF])
>>
>> Not much different, except that the high 32 bits and low 32 bits are
>> written in a different order...
>>
>> If I had a server I'd like to bisect
>> - from which version does this happens
>> - for which OS
>> - for which compiler
>>
>> Without such information, I think I'll have to debug it either thru
>> simulator or directly in gdb, but I feel like I'm really losing my time :(
>>
>> And I've got a 2nd problem like this one...
>>
>>
>>
>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.squeakfoundation.org/pipermail/vm-dev/attachments/20160329/fa42bec1/attachment-0001.htm


More information about the Vm-dev mailing list