[Vm-dev] type inference for #<< is IMO all wrong...

gettimothy gettimothy at zoho.com
Mon Feb 24 21:06:01 UTC 2020


"Microshaft"



Slow clap.



Sorry for the noise, but there is an important lesson in this thread. For the logically, technically and honestly minded, running into "Microshafts" without being aware of them is a huge time sink.



Hours and hours of "what am I missing?" are avoidable once you are aware of the concept of the "Microshaft".



Thank you from a newb.



cheers.


---- On Mon, 24 Feb 2020 09:39:14 -0500 Eliot Miranda <eliot.miranda at gmail.com> wrote ----


Hi Ben,



On Feb 24, 2020, at 3:54 AM, Ben Coman <mailto:btc at openinworld.com> wrote:






On Mon, 24 Feb 2020 at 17:52, Nicolas Cellier <mailto:nicolas.cellier.aka.nice at gmail.com> wrote:

 Hi Pierre,

this complexity is incurred by C standard.

The standards committee decided that left shifting a negative signed integer, or a signed integer whose shifted result does not fit in the type, is Undefined Behavior (UB).

Our program shall legitimately not depend on undefined behavior.

So the compiler has a license to presume that we didn't engage any operation resulting in UB.

This way, it can perform aggressive optimizations.



For example, if we write

    sqInt foo (sqInt x) {
        sqInt y = x << 15;
        if( y < 0 ) {
            return ~y;
        }
        return y;
    }

Then the compiler can, and sometimes will, completely eliminate the if, because y can only be negative if either x was positive and a bit overflowed into the sign bit, or x was negative, and both cases are UB.



Though, at machine level, with two's complement, the left shift is perfectly well defined regardless of signedness.

Yes, we might overflow the sign bit with a 0 in case of negative y.

But that's exactly the same condition as for positive y, we can overflow the sign bit with a 1...


Most of the time, what we want is signed left shift.

So we have first to convert to unsigned, shift, then convert back to signed integer.
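A minimal sketch of that cast, shift, cast-back idiom, and of the foo example rewritten with it. It assumes 64-bit stand-ins (vm_int, vm_uint) for the VM's sqInt/usqInt, and the function names are hypothetical. Converting an out-of-range unsigned value back to the signed type is implementation-defined rather than undefined, and in practice wraps on two's-complement compilers.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the VM's sqInt/usqInt, assumed 64 bits wide. */
typedef int64_t  vm_int;
typedef uint64_t vm_uint;

/* Left shift without UB: shift in the unsigned domain, then convert back.
   The shift count must still be smaller than the width of the type. */
static vm_int shift_left_signed(vm_int x, unsigned s)
{
    return (vm_int)((vm_uint)x << s);
}

/* The foo example with a defined shift, so the compiler has no licence
   to assume the sign test is dead code. */
static vm_int foo_defined(vm_int x)
{
    vm_int y = shift_left_signed(x, 15);
    if (y < 0) {
        return ~y;
    }
    return y;
}
```

With this version a negative x, or a positive x whose bits reach the sign bit, takes the y < 0 branch instead of invoking UB.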





Naive question... would be using __asm__ to do a signed left shift with embedded assembler be a valid alternative or a poor solution?







It would be a poor solution.  We want to use C as a portable assembler; a low-level completely portable language.  We have to drop down to the non-portable level below C for very few things currently:

- to access the stack and frame pointers
       - for switching to the C stack in runtime calls from JITed machine code
       - for assert checking
       - for correcting bugs in alloca (used in FFI callout marshalling)
- to read the machine registers in error reporting; see e.g. ucontext.h and code in platforms/Unix/vm/sqUnixMain.c
- to access memory barrier instructions for multiprocessor coordination in the threaded FFI



Shifts are just ordinary arithmetic operations, and the route by which C99 made something as innocuous as a signed left shift undefined behaviour has, IMO, to do with the fuck up that happened in the C world when moving to 64-bits.  Instead of deciding that char=8 bits, short=16 bits, int=32 bits, long=64 bits *and implicitly typed results are longs*, somehow we ended up with two models: LLP64, with int=long=32 bits and long long=__int64=64 bits, which is used by Microshaft Windows (to lock people in? to allow compiling 32-bit code more or less unchanged in 64-bits??), and LP64, with int=32 bits, long=long long=64 bits, and the default type in 64-bits remaining as int (!!), and incompatible printf specifiers for printing 64-bit values between Windows & everyone else.  So C went from being a reasonably simple, helpful language that was easy to generate code for, to a minefield of portability pitfalls that we’ve been dealing with for several years.



Nicolas has done most of the work here, for example, providing macros to hide incompatible printf specifiers, and types for accessing pointer-width integers. I grew up with the transition from K&R to ANSI C, and used to pull very low-level tricks before C was well optimized, for example to write JIT BitBLT in C, and I have no sympathy and not a little irritation for the mess C has got itself into. There is a pretension that C is a great language for writing applications in that falls flat when one uses a truly high level language.  Somewhere along the line C got confused and decided to stop being a language great for writing operating system kernels, simple efficient utilities, virtual machines and device drivers in, and tried to be something one could write efficient code in with lots of int variables, compiling the same code on 32-bits and 64-bits. The right way to do this is to require people to use the long type, enforce LP64 (64-bit long), and have the implicit type as long, and then none of this crap would have occurred.
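As a rough illustration of the kind of macro that hides the printf specifier mismatch, here is a sketch using the standard C99 <inttypes.h> macros. These are stand-ins only: the VM's own macros and typedefs are not shown in this thread, and the function names below are hypothetical.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Format a 64-bit value the same way on 64-bit Windows and 64-bit Unix.
   PRId64 expands to whichever length modifier the platform needs. */
static int format_int64(char *buf, size_t size, int64_t value)
{
    return snprintf(buf, size, "%" PRId64, value);
}

/* Format a pointer-width integer; uintptr_t is as wide as a pointer
   regardless of how wide long happens to be. */
static int format_pointer_bits(char *buf, size_t size, const void *p)
{
    return snprintf(buf, size, "0x%" PRIxPTR, (uintptr_t)p);
}
```

The caller never writes %ld or %lld directly, so the same source compiles cleanly whichever width long has.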



But C is a Balkans, used as a political football in the struggle between Microshaft and other vendors, used as a benchmark vehicle.  And so not everyone followed DEC’s lead when they provided an LP64 compiler for the 64-bit DEC Alpha AXP in ‘94, and so now we have to write (sqInt)((usqInt)x << s).



Here’s my choice for the most egregiously stupid decision made in the move to 64-bits:

If signed left shift is undefined behaviour then OK, let’s just cast to unsigned and write this:



     (signed)((unsigned)x << s).



And then we can write a macro to hide it:



#define leftshift(v,s) ((signed)((unsigned)(v)<< (s)))



But unsigned is *NOT* a type modifier, but instead short hand for unsigned int, so in 64-bits it always truncates to 32-bits.
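A small sketch of the truncation, assuming int is 32 bits wide (true of the mainstream 64-bit platforms discussed here). leftshift_narrow is the macro from the text under a different name; leftshift_wide is a hypothetical variant that names the 64-bit types explicitly.

```c
#include <stdint.h>

/* The macro from the text: `unsigned` means `unsigned int`, so the operand
   is narrowed to int width before the shift happens. */
#define leftshift_narrow(v, s) ((signed)((unsigned)(v) << (s)))

/* Hypothetical width-preserving variant using explicit 64-bit types. */
#define leftshift_wide(v, s) ((int64_t)((uint64_t)(v) << (s)))

/* Bit 32 of the operand is lost by the narrow macro... */
static int64_t narrow_result(void)
{
    return leftshift_narrow(INT64_C(0x100000001), 4);
}

/* ...and kept by the wide one. */
static int64_t wide_result(void)
{
    return leftshift_wide(INT64_C(0x100000001), 4);
}
```

With a 32-bit int, narrow_result() yields 0x10 while wide_result() yields 0x1000000010.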



What was a beautifully simple language, enormously useful for low-level systems builders, has become a sorry mess, full of gotchas, that is no longer great for anything (it’s still the only low-level portable game in town, but it’s no longer great).



Hmph.


cheers -ben





 



So thanks to the standard, we now have to write completely stupid and illegible expressions like you can contemplate in OSVM...

Fortunately they are auto-generated, but unfortunately, it's a nightmare to decipher when reviewing or debugging generated code!

Maybe we could introduce some macro for legibility...



Nowadays, no one should program in C (and C++) without basic knowledge of UB, because it can strike anytime, anywhere.



On Mon, 24 Feb 2020 at 09:53, pierre misse <mailto:pierre_misse25 at msn.com> wrote:

  Hi all, 
 
 I was thinking at first that this was already too complex for my neophyte mind, but I'd like to learn :)
 I'm not sure why we use (sqInt)(((usqInt) x instead of (sqInt) x.
 Is it a way to do an absolute function to match Squeak semantics?
 I've also seen 3 different casts, which looks a bit weird to me ^^
 Thanks in advance !
 
 Pierre

On 23/02/2020 20:40, Eliot Miranda wrote:

 





On Sat, Feb 22, 2020 at 1:03 AM Nicolas Cellier <mailto:nicolas.cellier.aka.nice at gmail.com> wrote:

  Hi Eliot, What we must do is correctly infer the type that we generate.


Computing the type of (1 << anything) as int is correct if that is what we generate.



If it indeed overflows, then we must cast to VM word sqInt at code generation and indeed infer

(((sqInt) 1) << anything) as sqInt.

But this is invoking UB and potentially many -O compiler problems if left operand is negative.

So I think that we do generate this foolish (but recommended) expression:

((sqInt)(((usqInt) x) << anything))

and we must infer its type as sqInt. Maybe we have another path for constants...





It appears we do all this.  I'm reading (*my own?!) generateShiftLeft:on:indent: and it casts to 64-bits if appropriate and casts to unsigned and back if signed.  It is the second most complex generation method, after generateInlineCppIfElse:asArgument:on:indent:.





On Sat, 22 Feb 2020 at 05:34, Eliot Miranda <mailto:eliot.miranda at gmail.com> wrote:

  Hi Nicolas, Hi Clément, Hi Pierre, Hi All, 

    I'm working again on ARMv8 having got confused and with help clear headed again.  So I'm getting the real system to run (it displays the full desktop before crashing).  One issue is the IMO mis-typing of #<<.



The expression in question is

1 << (cogit coInterpreter highBit: cogit methodZone zoneEnd)

which is used to determine a type for mask:

mask := 1 << (cogit coInterpreter highBit: cogit methodZone zoneEnd) - dataCacheMinLineLength.






In Smalltalk this evaluates to something large, and in the real VM it should evaluate to 1 << 39.  However, because in CCodeGenerator>>returnTypeForSend:in:ifNil: we explicitly assume C semantics here:



    ^kernelReturnTypes
        at: sel
        ifAbsent:
            [sel
                caseOf: {
                    ...
                    "C99 Sec Bitwise shift operators ... 3 Semantics ...
                     The integer promotions are performed on each of the operands.
                     The type of the result is that of the promoted left operand..."
                    [#>>] -> [sendNode receiver typeFrom: self in: aTMethod].
                    [#<<] -> [sendNode receiver typeFrom: self in: aTMethod].



we compute the type of 1 << anything to be #int, and hence the type of mask to be int, and hence mask is both truncated to 32-bits and later extended to 64-bits by virtue of being passed as an argument to a #sqInt parameter. So instead of generating the mask 16r7FFFFFFFC0 we generate the mask 16rFFFFFFFFFFFFFFC0.  Clearly nonsense.
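The two masks can be reproduced with a short sketch. It assumes a high bit of 39 (as stated above) and a dataCacheMinLineLength of 64, a value inferred from the quoted masks, not taken from the VM source. The narrowing is modelled with explicit casts and does not execute the undefined int shift itself; all names are hypothetical.

```c
#include <stdint.h>

/* Assumed values: the high bit of zoneEnd is 39 and dataCacheMinLineLength
   is 64; the latter is inferred from the masks quoted in the text. */
enum { HIGH_BIT = 39, MIN_LINE_LENGTH = 64 };

/* Intended mask, computed at full word width. */
static int64_t mask_as_word(void)
{
    return (int64_t)((UINT64_C(1) << HIGH_BIT) - MIN_LINE_LENGTH);
}

/* Model of the mistyped result: keep only the low 32 bits as an int,
   then let that int sign-extend when it is passed as a 64-bit value. */
static int64_t mask_as_int_then_widened(void)
{
    int32_t narrowed = (int32_t)(uint32_t)(uint64_t)mask_as_word();
    return (int64_t)narrowed;
}
```

On a two's-complement compiler, mask_as_word() gives 0x7FFFFFFFC0, while mask_as_int_then_widened() gives -64, whose bit pattern is 0xFFFFFFFFFFFFFFC0.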



It seems to me that we have the wrong philosophy.  In CCodeGenerator>>returnTypeForSend:in:ifNil: we should be computing types that cause the generated C to mimic as closely as possible what happens in Smalltalk, *not* typing according to the C99 standard, which creates unintended incompatibilities between the simulated Slang and the generated Slang.



Surely what we should be doing for << is seeing if the right operand is a constant and if so typing according to that, but in general typing << as #sqInt or #usqInt, depending on the type of the left operand.  This is what Smalltalk does; left-shifting a signed value preserves the sign; left shifting a non-negative value always yields a non-negative value.  Yes, eventually we will truncate to the word size, but the word size is sqInt, not int, and we are familiar with the truncation issue.



The mistyping of << as int is unexpected and extremely inconvenient.  We force the Slang programmer to type all variables receiving the result of a << explicitly.



Do you agree or does my way lead to chaos?



_,,,^..^,,,_

best, Eliot
























