[Vm-dev] Aarch64 number bug

Fri Jul 2 14:07:34 UTC 2021

Hi Tobias,

On Mon, Jun 21, 2021 at 4:20 AM Tobias Pape <Das.Linux at gmx.de> wrote:

>
> Hi
>
>
> > On 21. Jun 2021, at 12:15, Bruce O'Neel <bruce.oneel at pckswarms.ch>
> wrote:
> >
> > Hi,
> >
> > I have now gone back and checked this carefully.  The git repository at
> the commit before this one produces the right answer for 2 raisedTo: 64 on
> Aarch64.  It produces the wrong answer, 0, when this commit is included.
> >
> > So it's this commit.  I'll look at the code to see if I can see why the
> code is bad.
> >
> > BTW, how would one find the two referenced packages?
>
> https://source.squeak.org/VMMaker/ has all the packages. (or
> http://source.squeak.org/VMMaker.html)
>
>         The diff for the first one (
> https://source.squeak.org/VMMaker/VMMaker.oscog-eem.2799.diff) ist
> truncated,
>         while the diff-changeset is ok:
> http://source.squeak.org/VMMaker/changes/VMMaker.oscog-eem.2799(2798).cs
>         (you see all changed methods, but not the actual diff)
>
> But this only adds the JumpMulOverflow / JumpNoMulOverflow
> names/class-vars.
>
> The implementation diff is found at
>
> http://www.squeaksource.com/ClosedVMMaker/ClosedVMMaker-eem.98.diff
>
> I would suggest that our bug does not concern Jumps, but rather only Mul,
> so we have to start looking
> at the differences for concretizeMulOverflowRRR ...
>
> From what I see, tho, is that the MUL and the SMULH instruction order has
> been reversed.
>

Indeed.  I created a regression.  Apologies for all this effort over my
mistake.

> (Also, the arm blog uses an different interesting method of detecting
> overflow,
> by missusing the argument shitfter of CMP:
> https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/detecting-overflow-from-mul
> Although this is written for older arm, it seems to be applicable to the
> 64-variant too)
>

So here's how I decided to do it on AArch64.  Maybe you can find something
better.

AArch64 implement 64 x 64 => 128 bit multiply by providing a 64x64 => low
64 bits and a 64x64 => high 64 bits multiply instruction.  So if a
64x64=>128 multiply has not overflowed the high 64 bits will be an
extension of the sign bit of the 64x64=>low 64 instruction.  i.e. high 64
will either be all 0s or all 1s (all 1s = -1). So if one subtracts the sign
bit from the high 64 result one gets 0 if there is no overflow.  Here's the
instruction sequence:

smulh x9/RISCTemp, x4/Class, x8/Arg1
mul x4/Class, x4/Class, x8/Arg1
lsr x1, x4/Class, #63
adds x9/RISCTemp, x9/RISCTemp, x1
b.ne 0x44b8  // b.any = *@88 (to overflow)

In the JIT's assembler back end for ARMv8 this is done by two methods:

concretizeMulOverflowRRR
"ARMv8 has no multiply overflow detection.  Instead it is synthesized from
the two halves of
a 64x64=>128 bit multiply. The upper 64-bits are tested.  The sequence is
high64 := SMULH a,b
a/low64 := MUL a,b
signBit := low64/a >> 63
high64 := high64 + signBit
If high64 is zero after this sequence then the multiply has not overflowed,
since high64 is an extension of signBit if no overflow (either 0 or -1) and
both -1 + 1 = 0 and 0 + 0 = 0. Note that because we restrict ourselves to
three concrete ARMv8 instructions per abstract instruction the last
operation
of the sequence is generated in concretizeMulOverflowJump.

C6.2.196 MUL C6-1111
C6.2.242 SMULH C6-1184
C6.2.180 LSR (immediate) C6-1081 110100110 (1)"

<inline: true>
| reg1 reg2 reg3 |
reg1 := operands at: 0.
reg2 := operands at: 1.
reg3 := operands at: 2.
"RISCTempReg := high(reg1 * reg2); must precede destructive MUL"
machineCode
at: 0
put: 2r1001101101 << 22
+ (reg1 << 16)
+ (XZR << 10)
+ (reg2 << 5)
+ RISCTempReg.
"reg3 := reg1 * reg2"
machineCode
at: 1
put: 2r10011011 << 24
+ (reg1 << 16)
+ (XZR << 10)
+ (reg2 << 5)
+ reg3.
"CArg1Reg := sign(reg3)"
machineCode
at: 2
put: 2r1101001101 << 22
+ (63 << 16) "constant to shift by"
+ (63 << 10)
+ (reg3 << 5)
+ CArg1Reg. "cuz CArg0Reg == TempReg"
"RISCTempReg := RISCTempReg + CArg1Reg/sign
is in concretizeMulOverflowJump"
"cogit processor disassembleInstructionAt: 0 In: machineCode object"
"cogit processor disassembleInstructionAt: 4 In: machineCode object"
"cogit processor disassembleInstructionAt: 8 In: machineCode object"
^12

concretizeMulOverflowJump
"Sizing/generating jumps.
Jump targets can be to absolute addresses or other abstract instructions.
Generating initial trampolines instructions may have no maxSize and be to
absolute addresses.
Otherwise instructions must have a machineCodeSize which must be kept to."
<inline: false>
| offset |
offset := self computeJumpTargetOffset - 4. "-4 because the jump is from
the second word..."
  self assert: (offset ~= 0 and: [self isInImmediateBranchRange: offset]).
"See concretizeMulOverflowRRR
RISCTempReg := RISCTempReg + CArg1Reg/sign.
JumpZero/NonZero"
machineCode
at: 0
put: 2r10001011 << 24
+ (ArithmeticAddS << 29)
+ (CArg1Reg << 16)
+ (RISCTempReg << 5)
+ RISCTempReg;
at: 1
put: (self
cond: (opcode = JumpMulOverflow ifTrue: [NE] ifFalse: [EQ])
offset: offset). "B offset"
^8

Best regards
>         -Tobias
>
>
>
> > cheers
> >
> > bruce
> >
> > On 2021-06-20T22:44:36.000+02:00, Bruce O'Neel <bruce.oneel at pckswarms.ch>
> wrote:
> > Playing with Git bisect gets us to the commit below as the first bad one.
> >
> > I started the git bisect with the two commits below.  The first one is
> bad, the second is the good one.
> >
> > a783502b249c4a4fedc88b6e07837d405feab144 - zero9 - builds, zero error.
> >
> > 9f73148b8da4bc00278b83faa8da6b1c418fa54f - zero10 - builds works
> >
> > Now this is not 100% guaranteed because one of the commits that git
> bisect chose had a build error so I had to mark that one as bad.  But the
> commit found does sound possible.
> >
> > f632ee2888014ee88330ee994e13c9c609b57b5f is the first bad commit
> > commit f632ee2888014ee88330ee994e13c9c609b57b5f
> > Author: Eliot Miranda <eliot.miranda at gmail.com>
> > Date:   Wed Sep 2 10:45:27 2020 -0700
> >
> >     CogVM source as per VMMaker.oscog-eem.2799/ClosedVMMaker-eem.98
> >
> >     Ha!  I am *STUPID*.
> >     Integer overflow is not only determined by the upper 64-bits of a
> 64x64=>128 bit
> >     multiply being either all zero or all ones (0 or -1), but by the
> upper 64-bits
> >     being an extension of the most significant bit of the lower 64 bits!!
> >
> > :040000 040000 ffb8fbcd8ab5e2ca1936429b2daaade62909c178
> a5ac89d8b1d4980c5329835cdd3d4a2387b1fac5 M      nsspur64src
> > :040000 040000 e606dac14ee1db4ac59b0949bb03cd8e657d7aa7
> be666051cd1d52d0055e319432b66a0fba61063e M      spur64src
> > :040000 040000 1dea4faf99821c60e2c2461076bfe8b99d4dea9b
> afbef904e8cbc7e4b80e1606420dd6293e12f3e9 M      spurlowcode64src
> > :040000 040000 f66c681004806d5880af7c0a3115038f4f2f3361
> 0e8d76bedcf22b6ad7497ea2e3dc44f3c36c2f3a M      spursista64src
> >
> > On 2021-06-20T21:48:49.000+02:00, Craig Latta <
> craig at blackpagedigital.com> wrote:
> > Hi Ken--
> >
> > I brought up a development environment on a Raspberry Pi 4.
> >
> > > Unfortunately, my attempt to build a VM simulator on aarch64 fails due
> > > to a divide by zero bug.
> > >
> > > Perhaps because
> > > 16r8000000000000000 printStringHex. ==> '0'
> >
> > update-eem-463.mcm has a postload action which turns on Sista,
> > which recompiles the system, which tries to recompile
> > FloatArrayTest>>testFloatArrayPluginPrimitiveAtPut, which exercises this
> > very bug during scanning, when trying to create the float 1e-127.
> >
> > For now I've rewritten testFloatArrayPluginPrimitiveAtPut and
> > testFloatArrayPluginPrimitiveAt on my system to do nothing. When
> > building GdbARMv8Plugin, there's some other kind of
> > header-include-ordering mayhem, similar to the problem with Linux
> > features.h when building the VM. (At least two levels of it, between
> > bfd.h and the aarch64 simulator's config.h, and config.h and other
> > system headers.) I expect no one has successfully built GdbARMv8Plugin
> > on anything but a M1 macOS machine so far.
> >
> >
> > -C
>
>
>
>

-- 
_,,,^..^,,,_
best, Eliot
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.squeakfoundation.org/pipermail/vm-dev/attachments/20210702/ec3ff22a/attachment-0001.html>