[Vm-dev] type inference for #<< is IMO all wrong...

Nicolas Cellier nicolas.cellier.aka.nice at gmail.com
Mon Feb 24 09:52:07 UTC 2020


Hi Pierre,
this complexity is imposed by the C standard.
The standard committee decided that left shifting a negative signed integer,
or a non-negative one whose result is not representable, is
Undefined Behavior (UB).
Our program may not legitimately depend on undefined behavior.
So the compiler has license to presume that we never perform any
operation resulting in UB.
This way, it can perform aggressive optimizations.

For example, if we write
    sqInt foo (sqInt x) {
        sqInt y = x << 15;   /* UB if x < 0 or if the result overflows */
        if( y < 0 ) {
            return ~y;
        }
        return y;
    }
Then the compiler can and sometimes will completely eliminate the if,
because y can only be negative if either x was a positive integer whose bits
overflowed into the sign bit, or x was negative, and both cases are UB.

Though, at machine level, with two's complement, the left shift is perfectly
well defined whatever the signedness.
Yes, we might overflow the sign bit with a 0 in the case of negative x.
But that's exactly the same condition as for positive x, where we can overflow
the sign bit with a 1...

Most of the time, what we want is a signed left shift.
So we first have to convert to unsigned, shift, then convert back to signed
integer.
So thanks to the standard, we now have to write completely stupid and
illegible expressions like those you can contemplate in OSVM...
Fortunately they are auto-generated, but unfortunately, they are a nightmare
to decipher when reviewing or debugging generated code!
Maybe we could introduce some macro for legibility...
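
Such a macro might look like the sketch below. This is only an illustration:
the names SQ_SHIFT_LEFT, foo_defined and check_shift_left are hypothetical and
do not exist in OSVM, and the sqInt/usqInt typedefs are stand-ins that assume
a 64-bit VM word. The shift happens in the unsigned domain, where overflow
wraps modulo 2^64; the conversion back to signed is implementation-defined
rather than undefined, and yields the two's complement result on the usual
targets. The shift count must still be smaller than the word size.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for the VM word types; assumed 64-bit here. */
typedef int64_t  sqInt;
typedef uint64_t usqInt;

/* Hypothetical helper: convert to unsigned, shift, convert back to signed. */
#define SQ_SHIFT_LEFT(x, n) ((sqInt)(((usqInt)(x)) << (n)))

/* The foo example rewritten so that the shift no longer invokes UB;
   the compiler may not drop the test on y any more. */
sqInt foo_defined(sqInt x) {
    sqInt y = SQ_SHIFT_LEFT(x, 15);
    if (y < 0) {
        return ~y;
    }
    return y;
}

/* Small self-check, assuming a two's complement target. */
void check_shift_left(void) {
    assert(SQ_SHIFT_LEFT(1, 39) == 549755813888LL);  /* 2^39 */
    assert(SQ_SHIFT_LEFT(-3, 4) == -48);             /* sign preserved */
    assert(foo_defined(-1) == 32767);                /* ~(-32768) */
}
```

The generated code would then read SQ_SHIFT_LEFT(x, anything) instead of the
nested casts, with the same meaning.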

Nowadays, no one should program in C (and C++) without basic knowledge of
UB, because it can strike anytime, anywhere.
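
For reference, the mask truncation that Eliot reports in the quoted message
below can be reproduced without any UB. This is a sketch of the arithmetic
only, not the code the generator actually emits: the function names are
hypothetical, the typedefs assume a 64-bit VM word, and the value 64 for
dataCacheMinLineLength is an assumption inferred from the two masks he quotes.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for the VM word types; assumed 64-bit here. */
typedef int64_t  sqInt;
typedef uint64_t usqInt;

/* The mask as Smalltalk computes it: (1 << highBit) - lineLength,
   evaluated in a full VM word. Requires highBit < 64. */
sqInt intended_mask(int highBit, sqInt lineLength) {
    return (sqInt)(((usqInt)1 << highBit) - (usqInt)lineLength);
}

/* The same value squeezed through a 32-bit int, then passed on as sqInt.
   The narrowing conversion is implementation-defined, not UB; on the usual
   two's complement targets it keeps the low 32 bits. */
sqInt mistyped_mask(int highBit, sqInt lineLength) {
    int32_t narrow = (int32_t)(uint32_t)intended_mask(highBit, lineLength);
    return (sqInt)narrow;   /* sign extension back to 64 bits */
}

/* Self-check against the two masks quoted in the thread. */
void check_masks(void) {
    assert(intended_mask(39, 64) == 0x7FFFFFFFC0LL);
    assert((usqInt)mistyped_mask(39, 64) == 0xFFFFFFFFFFFFFFC0ULL);
}
```

The low 32 bits of 16r7FFFFFFFC0 are 16rFFFFFFC0, which is -64 as a 32-bit
int, and sign-extending -64 gives the nonsensical 16rFFFFFFFFFFFFFFC0.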

On Mon, 24 Feb 2020 at 09:53, pierre misse <pierre_misse25 at msn.com>
wrote:

>
>
> Hi all,
>
> I was thinking at first that this was already too complex for my neophyte
> mind, but I'd like to learn :)
> I'm not sure why we use (sqInt)(((usqInt) x instead of (sqInt) x.
> Is it a way to do an absolute function to match Squeak semantics?
> I've also seen 3 different casts, which look a bit weird to me ^^
> Thanks in advance !
>
> Pierre
> On 23/02/2020 20:40, Eliot Miranda wrote:
>
>
>
>
> On Sat, Feb 22, 2020 at 1:03 AM Nicolas Cellier <
> nicolas.cellier.aka.nice at gmail.com> wrote:
>
>>
>> Hi Eliot,
>> What we must do is correctly infer the type that we generate.
>> Computing the type of (1 << anything) as int is correct if that is what
>> we generate.
>>
>> If it indeed overflows, then we must cast to VM word sqInt at code
>> generation and indeed infer
>> (((sqInt) 1) << anything) as sqInt.
>> But this is invoking UB, and potentially many -O compiler problems, if the
>> left operand is negative.
>> So I think that we do generate this foolish (but recommended) expression:
>> ((sqInt)(((usqInt) x) << anything))
>> and we must infer its type as sqInt. Maybe we have another path for
>> constants...
>>
>
> It appears we do all this.  I'm reading (my
> own?!) generateShiftLeft:on:indent: and it casts to 64 bits if appropriate
> and casts to unsigned and back if signed.  It is the second most complex
> generation method, after generateInlineCppIfElse:asArgument:on:indent:.
>
>>
>>
>> On Sat, 22 Feb 2020 at 05:34, Eliot Miranda <eliot.miranda at gmail.com>
>> wrote:
>>
>>>
>>> Hi Nicolas, Hi Clément, Hi Pierre, Hi All,
>>>
>>>     I'm working again on ARMv8, having got confused and, with help, clear
>>> headed again.  So I'm getting the real system to run (it displays the full
>>> desktop before crashing).  One issue is the IMO mis-typing of #<<.
>>>
>>> The expression in question is
>>> 1 << (cogit coInterpreter highBit: cogit methodZone zoneEnd)
>>> which is used to determine a type for mask:
>>> mask := 1 << (cogit coInterpreter highBit: cogit methodZone zoneEnd) -
>>> dataCacheMinLineLength.
>>>
>>>
>>> In Smalltalk this evaluates to something large, and in the real VM it
>>> should evaluate to 1 << 39.  However, because in
>>> CCodeGenerator>>returnTypeForSend:in:ifNil: we explicitly assume C
>>> semantics here:
>>>
>>> ^kernelReturnTypes
>>>     at: sel
>>>     ifAbsent:
>>>         [sel
>>>             caseOf: {
>>>             ...
>>>             "C99 Sec Bitwise shift operators ... 3 Semantics ...
>>>              The integer promotions are performed on each of the operands.
>>>              The type of the result is that of the promoted left operand..."
>>>             [#>>] -> [sendNode receiver typeFrom: self in: aTMethod].
>>>             [#<<] -> [sendNode receiver typeFrom: self in: aTMethod].
>>>
>>> we compute the type of 1 << anything to be #int, and hence the type of
>>> mask to be int, and hence mask is both truncated to 32-bits and later
>>> extended to 64-bits by virtue of being passed as an argument to a #sqInt
>>> parameter. So instead of generating the mask 16r7FFFFFFFC0 we generate
>>> the mask 16rFFFFFFFFFFFFFFC0.  Clearly nonsense.
>>>
>>> It seems to me that we have the wrong philosophy.
>>> In CCodeGenerator>>returnTypeForSend:in:ifNil: we should be computing types
>>> that cause the generated C to mimic as closely as possible what happens in
>>> Smalltalk, *not* typing according to the C99 standard, which creates
>>> unintended incompatibilities between the simulated Slang and the generated
>>> Slang.
>>>
>>> Surely what we should be doing for << is seeing if the right operand is
>>> a constant and if so typing according to that, but in general typing << as
>>> #sqInt or #usqInt, depending on the type of the left operand.  This is what
>>> Smalltalk does; left-shifting a signed value preserves the sign; left
>>> shifting a non-negative value always yields a non-negative value.  Yes,
>>> eventually we will truncate to the word size, but the word size is sqInt,
>>> not int, and we are familiar with the truncation issue.
>>>
>>> The mistyping of << as int is unexpected and extremely inconvenient.  We
>>> force the Slang programmer to type all variables receiving the result of a
>>> << explicitly.
>>>
>>> Do you agree or does my way lead to chaos?
>>>
>>> _,,,^..^,,,_
>>> best, Eliot
>>>
>>
>
> --
> _,,,^..^,,,_
> best, Eliot
>
>