[Vm-dev] type inferrence for #<< is IMO all wrong...

Ben Coman btc at openinworld.com
Mon Feb 24 11:54:30 UTC 2020


On Mon, 24 Feb 2020 at 17:52, Nicolas Cellier <
nicolas.cellier.aka.nice at gmail.com> wrote:

>
> Hi Pierre,
> this complexity is imposed by the C standard.
> The standards committee decided that left-shifting a negative signed
> integer, or shifting a bit into or past the sign bit, is
> Undefined Behavior (UB).
> A correct program must not depend on undefined behavior.
> So the compiler has license to presume that we didn't engage in any
> operation resulting in UB.
> This way, it can perform aggressive optimizations.
>
> For example, if we write
>     sqInt foo (sqInt x) {
>         sqInt y = x << 15;
>         if( y < 0 ) {
>             return ~y;
>         }
>         return y;
>     }
> Then the compiler can, and sometimes will, completely eliminate the if,
> because y can only be negative if either x was positive and the shift
> overflowed into the sign bit, or x was negative, and both cases are UB.
>
> Though, at machine level, with two's complement, the left shift is
> perfectly well defined whatever the signedness.
> Yes, we might overflow the sign bit with a 0 in the case of negative x.
> But that's exactly the same condition as for positive x, where we can
> overflow the sign bit with a 1...
>
> Most of the time, what we want is a signed left shift.
> So we first have to convert to unsigned, shift, then convert back to
> a signed integer.
>
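
A minimal C sketch of the convert, shift, convert-back idiom described above, for readers following along. The typedefs stand in for the VM's sqInt and usqInt and are assumed to be 64-bit here; the helper name signedShiftLeft is hypothetical, not something taken from OSVM. Note that converting an out-of-range unsigned result back to a signed type is implementation-defined rather than undefined in C99; GCC and Clang document it as wrapping.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for the VM word types; assumed 64-bit for this sketch. */
typedef int64_t  sqInt;
typedef uint64_t usqInt;

/* Hypothetical helper: shift a signed word left without invoking UB.
 * The shift is performed on the unsigned type, where it is fully defined
 * (bits shifted out are simply discarded). The conversion back to sqInt
 * is implementation-defined for values above INT64_MAX; mainstream
 * compilers wrap, which yields the two's-complement result. */
static sqInt signedShiftLeft(sqInt x, int n)
{
    /* Shifting by a negative count or by >= the width is UB even for unsigned. */
    assert(n >= 0 && n < 64);
    return (sqInt)(((usqInt)x) << n);
}
```

With such a helper, signedShiftLeft(-1, 4) gives -16 on those compilers, whereas writing -1 << 4 directly would be UB.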

Naive question... would using __asm__ to do a signed left shift in
embedded assembler be a valid alternative, or a poor solution?

cheers -ben




> So thanks to the standard, we now have to write completely stupid and
> illegible expressions like those you can contemplate in OSVM...
> Fortunately they are auto-generated, but unfortunately, it's a nightmare
> to decipher when reviewing or debugging generated code!
> Maybe we could introduce some macro for legibility...
>
> Nowadays, no one should program in C (or C++) without basic knowledge of
> UB, because it can strike anytime, anywhere.
>
> On Mon, 24 Feb 2020 at 09:53, pierre misse <pierre_misse25 at msn.com>
> wrote:
>
>>
>>
>> Hi all,
>>
>> At first I thought this was already too complex for my neophyte
>> mind, but I'd like to learn :)
>> I'm not sure why we use (sqInt)(((usqInt) x instead of (sqInt) x.
>> Is it a way to compute an absolute value to match Squeak semantics?
>> I've also seen 3 different casts, which looks a bit weird to me ^^
>> Thanks in advance!
>>
>> Pierre
>> On 23/02/2020 20:40, Eliot Miranda wrote:
>>
>>
>>
>>
>> On Sat, Feb 22, 2020 at 1:03 AM Nicolas Cellier <
>> nicolas.cellier.aka.nice at gmail.com> wrote:
>>
>>>
>>> Hi Eliot,
>>> What we must do is correctly infer the type that we generate.
>>> Computing the type of (1 << anything) as int is correct if that is what
>>> we generate.
>>>
>>> If it indeed overflows, then we must cast to VM word sqInt at code
>>> generation and indeed infer
>>> (((sqInt) 1) << anything) as sqInt.
>>> But this invokes UB, and potentially many -O compiler problems, if
>>> the left operand is negative.
>>> So I think that we do generate this foolish (but recommended) expression:
>>> ((sqInt)(((usqInt) x) << anything))
>>> and we must infer its type as sqInt. Maybe we have another path for
>>> constants...
>>>
>>
>> It appears we do all this.  I'm reading (*my
>> own?!) generateShiftLeft:on:indent: and it casts to 64-bits if appropriate
>> and casts to unsigned and back if signed.  It is the second most complex
>> generation method, after generateInlineCppIfElse:asArgument:on:indent:.
>>
>>>
>>>
>>> On Sat, 22 Feb 2020 at 05:34, Eliot Miranda <eliot.miranda at gmail.com>
>>> wrote:
>>>
>>>>
>>>> Hi Nicolas, Hi Clément, Hi Pierre, Hi All,
>>>>
>>>>     I'm working again on ARMv8, having got confused and, with help,
>>>> clear-headed again.  So I'm getting the real system to run (it displays
>>>> the full desktop before crashing).  One issue is the IMO mis-typing of #<<.
>>>>
>>>> The expression in question is
>>>> 1 << (cogit coInterpreter highBit: cogit methodZone zoneEnd)
>>>> which is used to determine a type for mask:
>>>> mask := 1 << (cogit coInterpreter highBit: cogit methodZone zoneEnd) -
>>>> dataCacheMinLineLength.
>>>>
>>>>
>>>> In Smalltalk this evaluates to something large, and in the real VM it
>>>> should evaluate to 1 << 39.  However, because in
>>>> CCodeGenerator>>returnTypeForSend:in:ifNil: we explicitly assume C
>>>> semantics here:
>>>>
>>>> ^kernelReturnTypes
>>>> at: sel
>>>> ifAbsent:
>>>> [sel
>>>> caseOf: {
>>>> ...
>>>> "C99 Sec Bitwise shift operators ... 3 Semantics ...
>>>> The integer promotions are performed on each of the operands.
>>>>  The type of the result is that of the promoted left operand..."
>>>> [#>>] -> [sendNode receiver typeFrom: self in: aTMethod].
>>>> [#<<] -> [sendNode receiver typeFrom: self in: aTMethod].
>>>>
>>>> we compute the type of 1 << anything to be #int, and hence the type of
>>>> mask to be int, and hence mask is both truncated to 32-bits and later
>>>> extended to 64-bits by virtue of being passed as an argument to a #sqInt
>>>> parameter. So instead of generating the mask 16r7FFFFFFFC0 we generate
>>>> the mask 16rFFFFFFFFFFFFFFC0.  Clearly nonsense.
>>>>
>>>> It seems to me that we have the wrong philosophy.
>>>> In CCodeGenerator>>returnTypeForSend:in:ifNil: we should be computing types
>>>> that cause the generated C to mimic as closely as possible what happens in
>>>> Smalltalk, *not* typing according to the C99 standard, which creates
>>>> unintended incompatibilities between the simulated Slang and the generated
>>>> Slang.
>>>>
>>>> Surely what we should be doing for << is seeing if the right operand is
>>>> a constant and if so typing according to that, but in general typing << as
>>>> #sqInt or #usqInt, depending on the type of the left operand.  This is what
>>>> Smalltalk does; left-shifting a signed value preserves the sign; left
>>>> shifting a non-negative value always yields a non-negative value.  Yes,
>>>> eventually we will truncate to the word size, but the word size is sqInt,
>>>> not int, and we are familiar with the truncation issue.
>>>>
>>>> The mistyping of << as int is unexpected and extremely inconvenient.
>>>> We force the Slang programmer to type all variables receiving the result of
>>>> a << explicitly.
>>>>
>>>> Do you agree or does my way lead to chaos?
>>>>
>>>> _,,,^..^,,,_
>>>> best, Eliot
>>>>
>>>
>>
>> --
>> _,,,^..^,,,_
>> best, Eliot
>>
>>
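
To make the mask problem from Eliot's original message concrete, here is a small C sketch. It assumes the correctly computed 64-bit value ends up stored in an int-typed variable and is then passed to a 64-bit parameter, which is one reading of "truncated to 32-bits and later extended to 64-bits". The function names and the typedef are illustrative only; the shift count 39 comes from the message, and a dataCacheMinLineLength of 64 is an assumption chosen because it reproduces the two masks quoted above.

```c
#include <assert.h>
#include <stdint.h>

typedef int64_t sqInt;   /* stand-in for the VM word type; assumed 64-bit */

/* Word-sized arithmetic, matching what Smalltalk computes:
 * (1 << highBit) - lineLength. */
static sqInt maskTypedAsSqInt(int highBit, sqInt lineLength)
{
    assert(highBit >= 0 && highBit < 63);   /* keep 1 << highBit inside sqInt */
    return (sqInt)((uint64_t)1 << highBit) - lineLength;
}

/* Models the mistyped case: the value is squeezed through a 32-bit int
 * and then widened again when passed on as a sqInt. */
static sqInt maskTypedAsInt(int highBit, sqInt lineLength)
{
    /* Keep only the low 32 bits. The conversion to int32_t is
     * implementation-defined when the value is out of range; mainstream
     * compilers wrap to the two's-complement value. */
    int32_t truncated = (int32_t)(uint32_t)maskTypedAsSqInt(highBit, lineLength);
    return (sqInt)truncated;   /* widening sign-extends the top bit */
}
```

With highBit 39 and lineLength 64, maskTypedAsSqInt gives 16r7FFFFFFFC0, while maskTypedAsInt gives -64, that is 16rFFFFFFFFFFFFFFC0 when viewed as an unsigned word.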