[Vm-dev] Debugging Spur32 genPrimitiveHighBit

Eliot Miranda eliot.miranda at gmail.com
Wed Feb 19 18:37:01 UTC 2020


Hi Nicolas,

On Wed, Feb 19, 2020 at 12:10 AM Nicolas Cellier <
nicolas.cellier.aka.nice at gmail.com> wrote:

>
> I think that my confusion came from the order of specification, like this
>
> mod operand1:reg operand2:r/m
> https://www.felixcloutier.com/x86/lzcnt (same as Intel manuals)
> Capture d’écran 2020-02-19 à 02.17.06.png
> (14 Ko)
>
> <https://mail.google.com/mail/u/0?ui=2&ik=94f5e792ab&attid=0.2&permmsgid=msg-a:r6786606863442654297&view=att&disp=safe&realattid=f_k6smnaem1>
>
>
> versus mod: modReg RM: r/m RO: reg
> Intel uses such mixture of middle endian in the documentation too...
> Capture d’écran 2020-02-19 à 02.13.57.png
> (23 Ko)
>
> <https://mail.google.com/mail/u/0?ui=2&ik=94f5e792ab&attid=0.1&permmsgid=msg-a:r6786606863442654297&view=att&disp=safe&realattid=f_k6smjvcd0>
>
>
> I have also further fixed the rexw:r:x:b: for X64 encoding of R8-R15 in
> LZCNT and BSR (but did not regenerate, that can wait)
> Capture d’écran 2020-02-19 à 02.22.21.png
> (31 Ko)
>
> <https://mail.google.com/mail/u/0?ui=2&ik=94f5e792ab&attid=0.3&permmsgid=msg-a:r6786606863442654297&view=att&disp=safe&realattid=f_k6smux4k2>
>
>
Don't worry abut it.  It's a weird;y complex ISA :-)  You;'ve fixed it and
yesterday I gave it a fairly thorough test and it seems to be working fine.
We could write some proper tests in terms of right shift until zero,
comparing with highBit I suppose.


> Le mer. 19 févr. 2020 à 00:51, Nicolas Cellier <
> nicolas.cellier.aka.nice at gmail.com> a écrit :
>
>> ModR/M is a byte which encodes instruction operands
>> https://software.intel.com/sites/default/files/managed/39/c5/325462-sdm-vol-1-2abcd-3abcd.pdf
>> mod is the mod field (2 bits) RM is the r/m field (3 bits), RO is the reg
>> (or opcode) field (3 bits).
>> For LZCNT dest is in RO, and mask (source) is in r/m.
>> Same for BSR.
>> So I had it wrong... will fix ASAP
>>
>> Le mar. 18 févr. 2020 à 23:37, Nicolas Cellier <
>> nicolas.cellier.aka.nice at gmail.com> a écrit :
>>
>>> It seems that I miss-interpreted the order of mod:RM:RO:, it seems like
>>> O means output, not operand...
>>>
>>> Le mar. 18 févr. 2020 à 22:53, Nicolas Cellier <
>>> nicolas.cellier.aka.nice at gmail.com> a écrit :
>>>
>>>> Hi all,
>>>> I confirm that generated code for CLZ (LZCNT) is incorrect on IA32 arch.
>>>> The registers are swapped!
>>>>
>>>> Here is an extract:
>>>>
>>>>     0x5c26a9c: 83 e0 01        andl   $0x1, %eax
>>>>     0x5c26a9f: eb 11           jmp    0x5c26ab2
>>>>     0x5c26aa1: 90              nop
>>>>     0x5c26aa2: 90              nop
>>>>     0x5c26aa3: 90              nop
>>>>     0x5c26aa4: 89 d0           movl   %edx, %eax
>>>>     0x5c26aa6: 83 e0 03        andl   $0x3, %eax
>>>>     0x5c26aa9: 75 f1           jne    0x5c26a9c
>>>>     0x5c26aab: 8b 02           movl   (%edx), %eax
>>>>     0x5c26aad: 25 ff ff 3f 00  andl   $0x3fffff, %eax           ; imm =
>>>> 0x3FFFFF
>>>>     0x5c26ab2: 39 c8           cmpl   %ecx, %eax
>>>>     0x5c26ab4: 75 e0           jne    0x5c26a96
>>>>     0x5c26ab6: f3 0f bd d0     lzcntl %eax, %edx
>>>>     0x5c26aba: 74 0d           je     0x5c26ac9
>>>>     0x5c26abc: 35 1f 00 00 00  xorl   $0x1f, %eax
>>>>     0x5c26ac1: 89 c2           movl   %eax, %edx
>>>>     0x5c26ac3: d1 e2           shll   %edx
>>>>     0x5c26ac5: 83 c2 01        addl   $0x1, %edx
>>>>     0x5c26ac8: c3              retl
>>>>
>>>> What happens is that we count the leading zeros in $eax (TempReg) and
>>>> store the result in $edx (ReceiverResultReg) ...
>>>>
>>>>     0x5c26ab6: f3 0f bd d0     lzcntl %eax, %edx
>>>>
>>>> We want the contrary!
>>>>
>>>> $eax contains 1, presumably because we used it to check for
>>>> SmallInteger tag bit:
>>>>
>>>>     0x5c26a9c: 83 e0 01        andl   $0x1, %eax
>>>>
>>>> So we invariably get 31 leading zeroes in $edx (but we will later
>>>> overwrite the contents of $edx).
>>>>
>>>> Then we interpret $eax as the result (thus 1 leading zero), bitInvert,
>>>> and get 30 as the result for highBit, store that in $edx (shifted and
>>>> tagged), and we're done... Err!
>>>>
>>>> Obviously the code generation is wrong!
>>>> It did work when I first wrote it, and still work on x64 because we use
>>>> $eax (TempReg) as both source and dest reg...
>>>>
>>>> Though, I do not see what we are doing wrong:
>>>>
>>>> concretizeClzRR
>>>> <inline: true>
>>>> | maskReg dest  |
>>>> maskReg := operands at: 0.
>>>> dest := operands at: 1.
>>>> machineCode
>>>> at: 0 put: 16rF3;
>>>> at: 1 put: 16r0F;
>>>> at: 2 put: 16rBD;
>>>> at: 3 put: (self mod: ModReg RM: dest RO: maskReg).
>>>> ^4
>>>>
>>>> and we invoke it like that:
>>>> cogit ClzR: srcReg R: destReg.
>>>>
>>>> ClzR: reg1 R: reg2
>>>> "reg2 := reg1 countLeadingZeros"
>>>> <inline: true>
>>>> <returnTypeC: #'AbstractInstruction *'>
>>>> ^self gen: ClzRR operand: reg1 operand: reg2
>>>>
>>>> So it seems to me that all is in the correct order...
>>>> cogitIA32 likewise seems perfrectly correct:
>>>>
>>>> static sqInt
>>>> genPrimitiveHighBit(void)
>>>> {
>>>>     AbstractInstruction *anInstruction11;
>>>>     AbstractInstruction *anInstruction2;
>>>>     AbstractInstruction *anInstruction4;
>>>>     AbstractInstruction *jumpNegativeReceiver;
>>>>     AbstractInstruction *jumpNegativeReceiver11;
>>>>     AbstractInstruction *jumpNegativeReceiver3;
>>>>     sqInt literal1;
>>>>
>>>>
>>>>         /* remove excess tag bits from the receiver oop */
>>>>
>>>>         /* and use the abstract cogit facility for case of single
>>>> tag-bit */
>>>>         /* begin genHighBitIn:ofSmallIntegerOopWithSingleTagBit: */
>>>>         if (((ceCheckLZCNT()) & (1U << 5)) != 0) {
>>>>                 /* begin
>>>> genHighBitClzIn:ofSmallIntegerOopWithSingleTagBit: */
>>>>                 genoperandoperand(ClzRR, ReceiverResultReg, TempReg);
>>>>                 if (!(setsConditionCodesFor(lastOpcode(), JumpZero))) {
>>>>                         /* begin checkQuickConstant:forInstruction: */
>>>>                         anInstruction2 = genoperandoperand(CmpCqR, 0,
>>>> TempReg);
>>>>                 }
>>>>
>>>>                 /* Note the nice bit trick below:
>>>>                    highBit_1based_of_small_int_value = (BytesPerWord *
>>>> 8) - leadingZeroCout_of_oop - 1 toAccountForTagBit.
>>>>                    This is like 2 complements (- reg - 1) on
>>>> (BytesPerWord * 8) log2 bits, or exactly a bit invert operation... */
>>>>                 jumpNegativeReceiver3 =
>>>> genConditionalBranchoperand(JumpZero, ((sqInt)0));
>>>>                 /* begin checkLiteral:forInstruction: */
>>>>                 literal1 = (BytesPerWord * 8) - 1;
>>>>                 anInstruction11 = genoperandoperand(XorCwR,
>>>> (BytesPerWord * 8) - 1, TempReg);
>>>>                 jumpNegativeReceiver = jumpNegativeReceiver3;
>>>>                 goto l10;
>>>>         }
>>>>
>>>> which concretize in:
>>>>
>>>>         case ClzRR:
>>>>                 /* begin concretizeClzRR */
>>>>                 maskReg = ((self_in_dispatchConcretize->operands))[0];
>>>>                 dest = ((self_in_dispatchConcretize->operands))[1];
>>>>                 ((self_in_dispatchConcretize->machineCode))[0] = 243;
>>>>                 ((self_in_dispatchConcretize->machineCode))[1] = 15;
>>>>                 ((self_in_dispatchConcretize->machineCode))[2] = 189;
>>>>                 ((self_in_dispatchConcretize->machineCode))[3] =
>>>> (modRMRO(self_in_dispatchConcretize, ModReg, dest, maskReg));
>>>>                 return 4;
>>>>
>>>> The order seems correct all the way down...
>>>> As a workaround, I could revert Eliot's optimization and force a
>>>>     cogit MoveR: ReceiverResultReg R: TempReg.
>>>> But I'd rather want to understand where's the problem...
>>>> Another pair of eyes may help!
>>>>
>>>>

-- 
_,,,^..^,,,_
best, Eliot
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.squeakfoundation.org/pipermail/vm-dev/attachments/20200219/1b500090/attachment.html>


More information about the Vm-dev mailing list