[Vm-dev] Debugging Spur32 genPrimitiveHighBit

Nicolas Cellier nicolas.cellier.aka.nice at gmail.com
Tue Feb 18 23:51:11 UTC 2020


ModR/M is a byte which encodes instruction operands
https://software.intel.com/sites/default/files/managed/39/c5/325462-sdm-vol-1-2abcd-3abcd.pdf
mod is the mod field (2 bits) RM is the r/m field (3 bits), RO is the reg
(or opcode) field (3 bits).
For LZCNT dest is in RO, and mask (source) is in r/m.
Same for BSR.
So I had it wrong... will fix ASAP

Le mar. 18 févr. 2020 à 23:37, Nicolas Cellier <
nicolas.cellier.aka.nice at gmail.com> a écrit :

> It seems that I miss-interpreted the order of mod:RM:RO:, it seems like O
> means output, not operand...
>
> Le mar. 18 févr. 2020 à 22:53, Nicolas Cellier <
> nicolas.cellier.aka.nice at gmail.com> a écrit :
>
>> Hi all,
>> I confirm that generated code for CLZ (LZCNT) is incorrect on IA32 arch.
>> The registers are swapped!
>>
>> Here is an extract:
>>
>>     0x5c26a9c: 83 e0 01        andl   $0x1, %eax
>>     0x5c26a9f: eb 11           jmp    0x5c26ab2
>>     0x5c26aa1: 90              nop
>>     0x5c26aa2: 90              nop
>>     0x5c26aa3: 90              nop
>>     0x5c26aa4: 89 d0           movl   %edx, %eax
>>     0x5c26aa6: 83 e0 03        andl   $0x3, %eax
>>     0x5c26aa9: 75 f1           jne    0x5c26a9c
>>     0x5c26aab: 8b 02           movl   (%edx), %eax
>>     0x5c26aad: 25 ff ff 3f 00  andl   $0x3fffff, %eax           ; imm =
>> 0x3FFFFF
>>     0x5c26ab2: 39 c8           cmpl   %ecx, %eax
>>     0x5c26ab4: 75 e0           jne    0x5c26a96
>>     0x5c26ab6: f3 0f bd d0     lzcntl %eax, %edx
>>     0x5c26aba: 74 0d           je     0x5c26ac9
>>     0x5c26abc: 35 1f 00 00 00  xorl   $0x1f, %eax
>>     0x5c26ac1: 89 c2           movl   %eax, %edx
>>     0x5c26ac3: d1 e2           shll   %edx
>>     0x5c26ac5: 83 c2 01        addl   $0x1, %edx
>>     0x5c26ac8: c3              retl
>>
>> What happens is that we count the leading zeros in $eax (TempReg) and
>> store the result in $edx (ReceiverResultReg) ...
>>
>>     0x5c26ab6: f3 0f bd d0     lzcntl %eax, %edx
>>
>> We want the contrary!
>>
>> $eax contains 1, presumably because we used it to check for SmallInteger
>> tag bit:
>>
>>     0x5c26a9c: 83 e0 01        andl   $0x1, %eax
>>
>> So we invariably get 31 leading zeroes in $edx (but we will later
>> overwrite the contents of $edx).
>>
>> Then we interpret $eax as the result (thus 1 leading zero), bitInvert,
>> and get 30 as the result for highBit, store that in $edx (shifted and
>> tagged), and we're done... Err!
>>
>> Obviously the code generation is wrong!
>> It did work when I first wrote it, and still work on x64 because we use
>> $eax (TempReg) as both source and dest reg...
>>
>> Though, I do not see what we are doing wrong:
>>
>> concretizeClzRR
>> <inline: true>
>> | maskReg dest  |
>> maskReg := operands at: 0.
>> dest := operands at: 1.
>> machineCode
>> at: 0 put: 16rF3;
>> at: 1 put: 16r0F;
>> at: 2 put: 16rBD;
>> at: 3 put: (self mod: ModReg RM: dest RO: maskReg).
>> ^4
>>
>> and we invoke it like that:
>> cogit ClzR: srcReg R: destReg.
>>
>> ClzR: reg1 R: reg2
>> "reg2 := reg1 countLeadingZeros"
>> <inline: true>
>> <returnTypeC: #'AbstractInstruction *'>
>> ^self gen: ClzRR operand: reg1 operand: reg2
>>
>> So it seems to me that all is in the correct order...
>> cogitIA32 likewise seems perfrectly correct:
>>
>> static sqInt
>> genPrimitiveHighBit(void)
>> {
>>     AbstractInstruction *anInstruction11;
>>     AbstractInstruction *anInstruction2;
>>     AbstractInstruction *anInstruction4;
>>     AbstractInstruction *jumpNegativeReceiver;
>>     AbstractInstruction *jumpNegativeReceiver11;
>>     AbstractInstruction *jumpNegativeReceiver3;
>>     sqInt literal1;
>>
>>
>>         /* remove excess tag bits from the receiver oop */
>>
>>         /* and use the abstract cogit facility for case of single tag-bit
>> */
>>         /* begin genHighBitIn:ofSmallIntegerOopWithSingleTagBit: */
>>         if (((ceCheckLZCNT()) & (1U << 5)) != 0) {
>>                 /* begin
>> genHighBitClzIn:ofSmallIntegerOopWithSingleTagBit: */
>>                 genoperandoperand(ClzRR, ReceiverResultReg, TempReg);
>>                 if (!(setsConditionCodesFor(lastOpcode(), JumpZero))) {
>>                         /* begin checkQuickConstant:forInstruction: */
>>                         anInstruction2 = genoperandoperand(CmpCqR, 0,
>> TempReg);
>>                 }
>>
>>                 /* Note the nice bit trick below:
>>                    highBit_1based_of_small_int_value = (BytesPerWord * 8)
>> - leadingZeroCout_of_oop - 1 toAccountForTagBit.
>>                    This is like 2 complements (- reg - 1) on
>> (BytesPerWord * 8) log2 bits, or exactly a bit invert operation... */
>>                 jumpNegativeReceiver3 =
>> genConditionalBranchoperand(JumpZero, ((sqInt)0));
>>                 /* begin checkLiteral:forInstruction: */
>>                 literal1 = (BytesPerWord * 8) - 1;
>>                 anInstruction11 = genoperandoperand(XorCwR, (BytesPerWord
>> * 8) - 1, TempReg);
>>                 jumpNegativeReceiver = jumpNegativeReceiver3;
>>                 goto l10;
>>         }
>>
>> which concretize in:
>>
>>         case ClzRR:
>>                 /* begin concretizeClzRR */
>>                 maskReg = ((self_in_dispatchConcretize->operands))[0];
>>                 dest = ((self_in_dispatchConcretize->operands))[1];
>>                 ((self_in_dispatchConcretize->machineCode))[0] = 243;
>>                 ((self_in_dispatchConcretize->machineCode))[1] = 15;
>>                 ((self_in_dispatchConcretize->machineCode))[2] = 189;
>>                 ((self_in_dispatchConcretize->machineCode))[3] =
>> (modRMRO(self_in_dispatchConcretize, ModReg, dest, maskReg));
>>                 return 4;
>>
>> The order seems correct all the way down...
>> As a workaround, I could revert Eliot's optimization and force a
>>     cogit MoveR: ReceiverResultReg R: TempReg.
>> But I'd rather want to understand where's the problem...
>> Another pair of eyes may help!
>>
>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.squeakfoundation.org/pipermail/vm-dev/attachments/20200219/979590a6/attachment-0001.html>


More information about the Vm-dev mailing list