[Vm-dev] Debugging Spur32 genPrimitiveHighBit
Nicolas Cellier
nicolas.cellier.aka.nice at gmail.com
Tue Feb 18 22:37:06 UTC 2020
It seems that I miss-interpreted the order of mod:RM:RO:, it seems like O
means output, not operand...
Le mar. 18 févr. 2020 à 22:53, Nicolas Cellier <
nicolas.cellier.aka.nice at gmail.com> a écrit :
> Hi all,
> I confirm that generated code for CLZ (LZCNT) is incorrect on IA32 arch.
> The registers are swapped!
>
> Here is an extract:
>
> 0x5c26a9c: 83 e0 01 andl $0x1, %eax
> 0x5c26a9f: eb 11 jmp 0x5c26ab2
> 0x5c26aa1: 90 nop
> 0x5c26aa2: 90 nop
> 0x5c26aa3: 90 nop
> 0x5c26aa4: 89 d0 movl %edx, %eax
> 0x5c26aa6: 83 e0 03 andl $0x3, %eax
> 0x5c26aa9: 75 f1 jne 0x5c26a9c
> 0x5c26aab: 8b 02 movl (%edx), %eax
> 0x5c26aad: 25 ff ff 3f 00 andl $0x3fffff, %eax ; imm =
> 0x3FFFFF
> 0x5c26ab2: 39 c8 cmpl %ecx, %eax
> 0x5c26ab4: 75 e0 jne 0x5c26a96
> 0x5c26ab6: f3 0f bd d0 lzcntl %eax, %edx
> 0x5c26aba: 74 0d je 0x5c26ac9
> 0x5c26abc: 35 1f 00 00 00 xorl $0x1f, %eax
> 0x5c26ac1: 89 c2 movl %eax, %edx
> 0x5c26ac3: d1 e2 shll %edx
> 0x5c26ac5: 83 c2 01 addl $0x1, %edx
> 0x5c26ac8: c3 retl
>
> What happens is that we count the leading zeros in $eax (TempReg) and
> store the result in $edx (ReceiverResultReg) ...
>
> 0x5c26ab6: f3 0f bd d0 lzcntl %eax, %edx
>
> We want the contrary!
>
> $eax contains 1, presumably because we used it to check for SmallInteger
> tag bit:
>
> 0x5c26a9c: 83 e0 01 andl $0x1, %eax
>
> So we invariably get 31 leading zeroes in $edx (but we will later
> overwrite the contents of $edx).
>
> Then we interpret $eax as the result (thus 1 leading zero), bitInvert, and
> get 30 as the result for highBit, store that in $edx (shifted and tagged),
> and we're done... Err!
>
> Obviously the code generation is wrong!
> It did work when I first wrote it, and still work on x64 because we use
> $eax (TempReg) as both source and dest reg...
>
> Though, I do not see what we are doing wrong:
>
> concretizeClzRR
> <inline: true>
> | maskReg dest |
> maskReg := operands at: 0.
> dest := operands at: 1.
> machineCode
> at: 0 put: 16rF3;
> at: 1 put: 16r0F;
> at: 2 put: 16rBD;
> at: 3 put: (self mod: ModReg RM: dest RO: maskReg).
> ^4
>
> and we invoke it like that:
> cogit ClzR: srcReg R: destReg.
>
> ClzR: reg1 R: reg2
> "reg2 := reg1 countLeadingZeros"
> <inline: true>
> <returnTypeC: #'AbstractInstruction *'>
> ^self gen: ClzRR operand: reg1 operand: reg2
>
> So it seems to me that all is in the correct order...
> cogitIA32 likewise seems perfrectly correct:
>
> static sqInt
> genPrimitiveHighBit(void)
> {
> AbstractInstruction *anInstruction11;
> AbstractInstruction *anInstruction2;
> AbstractInstruction *anInstruction4;
> AbstractInstruction *jumpNegativeReceiver;
> AbstractInstruction *jumpNegativeReceiver11;
> AbstractInstruction *jumpNegativeReceiver3;
> sqInt literal1;
>
>
> /* remove excess tag bits from the receiver oop */
>
> /* and use the abstract cogit facility for case of single tag-bit
> */
> /* begin genHighBitIn:ofSmallIntegerOopWithSingleTagBit: */
> if (((ceCheckLZCNT()) & (1U << 5)) != 0) {
> /* begin
> genHighBitClzIn:ofSmallIntegerOopWithSingleTagBit: */
> genoperandoperand(ClzRR, ReceiverResultReg, TempReg);
> if (!(setsConditionCodesFor(lastOpcode(), JumpZero))) {
> /* begin checkQuickConstant:forInstruction: */
> anInstruction2 = genoperandoperand(CmpCqR, 0,
> TempReg);
> }
>
> /* Note the nice bit trick below:
> highBit_1based_of_small_int_value = (BytesPerWord * 8)
> - leadingZeroCout_of_oop - 1 toAccountForTagBit.
> This is like 2 complements (- reg - 1) on (BytesPerWord
> * 8) log2 bits, or exactly a bit invert operation... */
> jumpNegativeReceiver3 =
> genConditionalBranchoperand(JumpZero, ((sqInt)0));
> /* begin checkLiteral:forInstruction: */
> literal1 = (BytesPerWord * 8) - 1;
> anInstruction11 = genoperandoperand(XorCwR, (BytesPerWord
> * 8) - 1, TempReg);
> jumpNegativeReceiver = jumpNegativeReceiver3;
> goto l10;
> }
>
> which concretize in:
>
> case ClzRR:
> /* begin concretizeClzRR */
> maskReg = ((self_in_dispatchConcretize->operands))[0];
> dest = ((self_in_dispatchConcretize->operands))[1];
> ((self_in_dispatchConcretize->machineCode))[0] = 243;
> ((self_in_dispatchConcretize->machineCode))[1] = 15;
> ((self_in_dispatchConcretize->machineCode))[2] = 189;
> ((self_in_dispatchConcretize->machineCode))[3] =
> (modRMRO(self_in_dispatchConcretize, ModReg, dest, maskReg));
> return 4;
>
> The order seems correct all the way down...
> As a workaround, I could revert Eliot's optimization and force a
> cogit MoveR: ReceiverResultReg R: TempReg.
> But I'd rather want to understand where's the problem...
> Another pair of eyes may help!
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.squeakfoundation.org/pipermail/vm-dev/attachments/20200218/c489564b/attachment-0001.html>
More information about the Vm-dev
mailing list