[Vm-dev] Unsigned/signed type declarations for #long32At: (was: Re: vm problem on cog an stack v3)

David T. Lewis lewis at mail.msen.com
Thu Mar 31 00:41:26 UTC 2016


Changing the subject line to reflect current discussion topic.

On Wed, Mar 30, 2016 at 10:46:21PM +0200, Nicolas Cellier wrote:
>  
> 2016-03-30 13:46 GMT+02:00 Ben Coman <btc at openinworld.com>:
>
> > On Wed, Mar 30, 2016 at 2:09 PM, Eliot Miranda <eliot.miranda at gmail.com> wrote:
> >
> > > Hi Nicolas,
> > >
> > > On Tue, Mar 29, 2016 at 2:36 PM, Nicolas Cellier <nicolas.cellier.aka.nice at gmail.com> wrote:
> > >
> > >> 2016-03-27 15:39 GMT+02:00 Nicolas Cellier <nicolas.cellier.aka.nice at gmail.com>:
> > >>>
> > >>> To be more specific,
> > >>>
> > >>> The first thing to do would be to confirm that the KernelNumber tests fail with a squeak.stack.v3 VM compiled with the head revision of the COG VM branch, whatever the OS.
> > >>>
> > >>> Then knowing from which SVN version of the COG VM branch the KernelNumber tests start failing would be nice.
> > >>>
> > >>> The bisect job is to:
> > >>> - iterate on version number (whatever strategy, bsearch or something)
> > >>> - checkout VM sources
> > >>> - compile the build.favouriteOS/squeak.stack.v3
> > >>> - run a v3 image with the generated VM and launch KernelNumber tests
> > >>>
> > >>> Really a job for a jenkins bot, travis bot or whatever...
> > >>> The next good thing would be to give a little love to build.squeak.org or any other similar solution.
> > >>> I only see red disks on this site...
> > >>>
> > >>
> > >> Follow-up: I bisected manually:
> > >> 3648 (VMMaker.oscog-eem.nice.1729) OK
> > >> 3649 (VMMaker.oscog-eem.1740) Not OK
> > >>
> > >> So something went wrong for V3 between these two versions.
> > >> At the same time, it works for Spur.
> > >>
> > >> Spur objects are 8-byte aligned, v3 objects are 4-byte aligned...
> > >> So fetching 64 bits from V3 MUST be decomposed into two 32-bit fetches to avoid a segfault related to a misaligned fetch.
> > >>
> > >> OK, let's look how it is performed:
> > >>
> > >> fetchLong64: longIndex ofObject: oop
> > >>     <returnTypeC: #sqLong>
> > >>     ^self cppIf: BytesPerWord = 8
> > >>         ifTrue: [self long64At: oop + self baseHeaderSize + (longIndex << 3)]
> > >>         ifFalse:
> > >>             [self cppIf: VMBIGENDIAN
> > >>                 ifTrue: [((self long32At: oop + self baseHeaderSize + (longIndex << 3)) asUnsignedLongLong << 32)
> > >>                     + (self long32At: oop + self baseHeaderSize + (longIndex << 3 + 4))]
> > >>                 ifFalse: [(self long32At: oop + self baseHeaderSize + (longIndex << 3))
> > >>                     + ((self long32At: oop + self baseHeaderSize + (longIndex << 3 + 4)) asUnsignedLongLong << 32)]]
> > >>
> > >> AH AH! The operation is:
> > >>     low + (((unsigned) high) << 32)
> > >>
> > >> With low declared as SIGNED (long32At is signed GRRR).
> > >> What if the sign bit of low is 1? Then the compiler will perform:
> > >>
> > >> (unsigned long) low + ...
> > >>
> > >> AND THIS WILL DO A SIGN EXTENSION...
> > >>
> > >> It's not that my code was wrong...
> > >> It's just that it uncovered a bug in fetchLong64:ofObject:
> > >>
> > >> That cost me many hours, but it reinforces my conviction about signedness...
> > >> I would much much much prefer to call unsignedLong32At: because it's what I mean 9 times out of 10: get the magnitude...
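
To make the hazard concrete, here is a minimal standalone C sketch (my own
example, not VMMaker code) of the little-endian leg above: with the low half
held in a signed 32-bit type, the usual arithmetic conversions sign-extend it
to 64 bits and clobber the high word, while an unsigned fetch (or cast) yields
the intended magnitude.

    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        int32_t  low  = (int32_t)0x80000000u;  /* sign bit set */
        uint64_t high = 1;

        /* low is sign-extended to 0xFFFFFFFF80000000 before the add,
           corrupting the high 32 bits of the result. */
        uint64_t wrong = (high << 32) + low;

        /* Casting to uint32_t zero-extends: this is the intended value. */
        uint64_t right = (high << 32) + (uint32_t)low;

        printf("wrong: %016llx\n", (unsigned long long)wrong);
        printf("right: %016llx\n", (unsigned long long)right);
        return 0;
    }
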
> > >
> > >
> > > Let's add them.  It would be great to have them all consistent too.
> > > But at least let's add unsignedLong32At:[put:].  Wait.  In the
> > > simulator longAt:[put:] et al are unsigned.  So we have three choices:
> > >
> > > a) live with it
> > > b) make longAt:[put:] long32At:[put:] et al unsigned in C, and add signedLongAt:[put:] et al.
> > > c) make the simulator's longAt:[put:] long32At:[put:] signed and then add unsignedLongAt:[put:] et al
> > >
> > > Any others?
> > >
> > > I think a) is unwise.  The simulator and C should agree.  Nicolas, I
> > > can join your experience in finding that debugging these issues is
> > > extremely time consuming.
> > >
> > > My preference is for b); some uses where the type should be signed
> > > will be located by C compiler warnings; we hope that VM tests will
> > > catch the others.
> >
> > Wouldn't (b) risk occasional cognitive dysfunction switching between
> > the semantics of a C long being signed and a Slang long being
> > unsigned?  (c) seems better long term, but switching the signedness
> > of an existing simulation method might cause short term instability
> > and need to be tackled in one sitting?  So a naive question... Why
> > leave one implicit?  Consider d) introducing both signedLongAt:[put:]
> > and unsignedLongAt:[put:] and gradual migration.
> >
> > cheers -ben
> >
> >
> My first thought was the same as Eliot's: use b) because
> - the simulator is already b)
> - a wild guess is that there are more unsigned than signed usages
> 
> But I like Ben's arguments better.
> Non-uniform naming has great costs:
> - a slow learning ramp for noobs
> - seeds for future bugs due to misunderstanding.
> 
> For example we already have:
> * long implicitly meaning 32 bits, as in signedIntToLong and signedIntFromLong,
> * long unconditionally meaning 64 bits, as in sqLong,
> * long meaning either 32 or 64 bits depending on architecture in generated C (as in StackInterpreter>>putLong:toFile:),
> * long meaning either 32 or 64 bits depending on the context
>   (#longAt: is always 32 bits when sent to a ByteArray, Bitmap or CAccessor, but might be 64 bits when sent to the simulator, argh...)
> 
> I remember I had similar problems with "word" the first time I looked at VMMaker.
> WordArray always holds 32-bit words, but wordSize is 4 or 8 depending on the target architecture.
> Even now, I'm not sure which word we are talking about...

Me too. This is endlessly confusing no matter how long I look at it.
We need to clarify the terminology and the type declarations to distinguish
between references to slots in the object memory, object pointers that
coincidentally happen to fit into slots in the object memory, and references
to machine data types (C short/int/long/long long).
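
Concretely, the ambiguity could be reduced by naming the width and the
signedness explicitly. A hypothetical sketch (these names and signatures are
my own, not existing VMMaker declarations) built on the C99 fixed-width types:

    #include <stdint.h>

    /* Hypothetical accessor pair: both the width (32) and the signedness
       are stated in the name, so "long" never has to mean three different
       things at once. */
    static uint32_t unsignedLong32At(char *base, uintptr_t byteOffset)
    {
        return *(uint32_t *)(base + byteOffset);
    }

    static int32_t signedLong32At(char *base, uintptr_t byteOffset)
    {
        return *(int32_t *)(base + byteOffset);
    }
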

> 
> Also b) goes with a blitz strategy:
> - do a massive semantic change long32 -> unsigned
> - review the send sites and change a few variants to signed
> - repair broken C code once the image starts crashing and tests start failing
> - repair the simulator, which might show different symptoms (see implementors of #asUnsignedLong)
> Either it works soon, or never...
> 
> I've noted only 47 senders of long32At:, plus the 33 senders of fetchLong32:ofObject:.
> And I cannot easily analyze the possible side effects through Slang type inference...
> 
> As my recent LargeInteger division bug showed, abuse of unsigned leads to expensive bug tracking too.
> That somewhat restrains my enthusiasm for solution b) - chat échaudé craint l'eau froide (a scalded cat fears cold water) ;)
> 
> Last, I did not apply the b) policy in my VM branch for 32-bit large integers.
> Instead I introduced the unsigned variants gradually... (coward!)
> 
> Before we attempt anything, I'd like to recall the short-term goals:
> - avoid uncontrolled sign extension when promoting to a larger int in C
> - avoid undefined behavior, most of which is related to signed arithmetic
> - make the simulator better match C (or vice versa, as the case may be)
> 
> The last goal is far away; there are subtle "differences"...
> For example, in the simulator the difference of two unsigned values might be < 0.
> In C it cannot be, unless we force it (cast, type punning, whatever...).
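
A two-line C illustration of that difference (my own example; note that
converting an out-of-range unsigned value to int is implementation-defined in
C, though it wraps as expected on the platforms we target):

    #include <stdio.h>

    int main(void)
    {
        unsigned int a = 1, b = 2;
        /* In C the difference wraps modulo 2^32 and is never < 0... */
        printf("a - b        = %u\n", a - b);        /* 4294967295 */
        /* ...unless we force the signed interpretation with a cast. */
        printf("(int)(a - b) = %d\n", (int)(a - b)); /* -1 */
        return 0;
    }
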
> 
> Last point: gradual introduction and non-regression testing presuppose a stable VM.
> I still experience random crashes that I'd like to track down first.
> 

I am still undecided between b) and c). Ben's approach makes the most
sense to me on the face of it, but I just did a quick test of the b)
approach and the resulting VM works fine, with BitBltTest still green.
I did this by modifying the MemoryAccess methods so that #long32At:
generates this (which will be inlined in the actual VM code):

   /*	#define long32At intAt */
   
   static unsigned int long32At(sqInt ptr) {
   	return (((unsigned int *) ((sqMemoryBase) + ptr)))[0];
   }


I am probably oversimplifying, and different compilers may treat these
things differently, but I see two conclusions that may be drawn from this:

1) Declaring #long32At: as either signed or unsigned is not really
important. What is important is the type declaration of the thing that
the result gets stored into. We are assuming two's complement
representations, so a value returned as "signed" but stored into a context
that treats it as unsigned will behave as an unsigned value, and vice versa.
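
For instance (a standalone sketch, not VM code), reading the same 32 bits
through signed and unsigned declarations yields the same bit pattern under
two's complement; only the interpretation at the use site differs:

    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        uint32_t raw = 0xFFFFFFFFu;          /* the bits the accessor answers */
        int32_t  asSigned   = (int32_t)raw;  /* read as -1 */
        uint32_t asUnsigned = raw;           /* read as 4294967295 */

        printf("signed: %d  unsigned: %u\n", asSigned, asUnsigned);
        printf("same bits: %s\n",
            (uint32_t)asSigned == asUnsigned ? "yes" : "no");
        return 0;
    }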

2) We currently treat the result of #byteAt:, #shortAt:, #intAt:, and
#longAt: as signed values. There is no reason to change that convention.
But #long32At: (or #long128At: or whatever) could easily be changed at
this point to explicitly answer unsigned values. It does not look to
me like this would break anything, and it makes sense intuitively because
saying "long32At" implies that I want to retrieve a binary bit field
of size 32, with no presumption that it should be interpreted either as
two's complement signed or as unsigned.

Dave


