<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">2016-03-31 2:41 GMT+02:00 David T. Lewis <span dir="ltr"><<a href="mailto:lewis@mail.msen.com" target="_blank">lewis@mail.msen.com</a>></span>:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><br>
Changing the subject line to reflect current discussion topic.<br>
<br>
On Wed, Mar 30, 2016 at 10:46:21PM +0200, Nicolas Cellier wrote:<br>
><br>
> 2016-03-30 13:46 GMT+02:00 Ben Coman <<a href="mailto:btc@openinworld.com">btc@openinworld.com</a>>:<br>
><br>
> > On Wed, Mar 30, 2016 at 2:09 PM, Eliot Miranda <<a href="mailto:eliot.miranda@gmail.com">eliot.miranda@gmail.com</a>> wrote:<br>
> ><br>
> > > Hi Nicolas,<br>
> > ><br>
> > > On Tue, Mar 29, 2016 at 2:36 PM, Nicolas Cellier <<a href="mailto:nicolas.cellier.aka.nice@gmail.com">nicolas.cellier.aka.nice@gmail.com</a>> wrote:<br>
> > ><br>
> > >> 2016-03-27 15:39 GMT+02:00 Nicolas Cellier <<br>
> > <a href="mailto:nicolas.cellier.aka.nice@gmail.com">nicolas.cellier.aka.nice@gmail.com</a>>:<br>
> > >>><br>
> > >>> To be more specific,<br>
> > >>><br>
> > >>> The first thing to do would be to confirm that the KernelNumber tests<br>
> > fail with a squeak.stack.v3 VM compiled with head revision of COG VM<br>
> > branch, whatever OS.<br>
> > >>><br>
> > >>> Then knowing from which SVN version of the COG VM branch the KernelNumber<br>
> > tests start failing would be nice.<br>
> > >>><br>
> > >>> The bisect job is to:<br>
> > >>> - iterate on version number (whatever strategy, bsearch or something)<br>
> > >>> - checkout VM sources<br>
> > >>> - compile the build.favouriteOS/squeak.stack.v3<br>
> > >>> - run a v3 image with the generated VM and launch KernelNumber tests<br>
> > >>><br>
> > >>> Really a job for a jenkins bot, travis bot or whatever...<br>
> > >>> The next good thing would be to give a little love to <a href="http://build.squeak.org" rel="noreferrer" target="_blank">build.squeak.org</a><br>
> > or any other similar solution.<br>
> > >>> I only see red disks on this site...<br>
> > >>><br>
> > >><br>
> > >> Follow up: I bisected manually:<br>
> > >> 3648 (VMMaker.oscog-eem.nice.1729) OK<br>
> > >> 3649 (VMMaker.oscog-eem.1740) Not OK<br>
> > >><br>
> > >> So something went wrong for V3 between these two versions.<br>
> > >> At the same time, it works for spur.<br>
> > >><br>
> > >> Spur objects are 8-byte aligned, v3 objects are 4-byte aligned...<br>
> > >> So fetching 64 bits from V3 MUST be decomposed into two 32-bit fetches to<br>
> > avoid a segfault from a misaligned fetch.<br>
> > >><br>
> > >> OK, let's look at how it is performed:<br>
> > >><br>
> > >> fetchLong64: longIndex ofObject: oop<br>
> > >> <returnTypeC: #sqLong><br>
> > >> ^self cppIf: BytesPerWord = 8<br>
> > >> ifTrue: [self long64At: oop + self baseHeaderSize + (longIndex<br>
> > << 3)]<br>
> > >> ifFalse:<br>
> > >> [self cppIf: VMBIGENDIAN<br>
> > >> ifTrue: [((self long32At: oop + self baseHeaderSize +<br>
> > (longIndex << 3)) asUnsignedLongLong << 32)<br>
> > >> + (self long32At: oop + self baseHeaderSize +<br>
> > (longIndex << 3 + 4))]<br>
> > >> ifFalse: [(self long32At: oop + self baseHeaderSize +<br>
> > (longIndex << 3))<br>
> > >> + ((self long32At: oop + self baseHeaderSize +<br>
> > (longIndex << 3 + 4)) asUnsignedLongLong << 32)]]<br>
> > >><br>
> > >> AH AH! Operation is:<br>
> > >> low + (((unsigned) high) << 32)<br>
> > >><br>
> > >> With low declared as SIGNED (long32At is signed GRRR).<br>
> > >> What if the sign bit (bit 31) of low is 1? Then the compiler will perform:<br>
> > >><br>
> > >> (unsigned long) low + ...<br>
> > >><br>
> > >> AND THIS WILL DO A SIGN EXTENSION...<br>
> > >><br>
> > >> It's not that my code was wrong...<br>
> > >> It's just that it uncovered a bug in fetchLong64:ofObject:<br>
> > >><br>
> > >> That cost me many hours, but it reinforced my conviction about<br>
> > signedness...<br>
> > >> I would much much much prefer to call unsignedLong32At because it's<br>
> > what I mean 9 times out of 10: get the magnitude...<br>
> > ><br>
> > ><br>
> > > Let's add them. It would be great to have them all consistent too. But<br>
> > at least let's add unsignedLong32At:[put:]. Wait. In the simulator<br>
> > longAt:[put:] et al are unsigned. So we have three choices:<br>
> > ><br>
> > > a) live with it<br>
> > > b) make longAt:[put:] long32At:put: et al unsigned in C, and add<br>
> > signedLongAt:[put:] et al.<br>
> > > c) make the simulator's longAt:[put:] long32At:put: signed and then add<br>
> > unsignedLongAt:[put:] et al<br>
> > ><br>
> > > Any others?<br>
> > ><br>
> > > I think a) is unwise. The simulator and C should agree. Nicolas, I can<br>
> > second your experience in finding that debugging these issues is extremely<br>
> > time consuming.<br>
> > ><br>
> > > My preference is for b); some uses where the type should be signed will<br>
> > be located by C compiler warnings; we hope that VM tests will catch the<br>
> > others.<br>
> ><br>
> > Wouldn't (b) risk occasional cognitive dysfunction switching between<br>
> > the semantics of a C long being signed and a Slang long being<br>
> > unsigned? (c) seems better long term, but switching the signedness of<br>
> > an existing simulation method might cause short-term instability<br>
> > and need to be tackled in one sitting? So a naive question... Why<br>
> > leave one implicit? Consider d) introducing both signedLongAt:[put:]<br>
> > and unsignedLongAt:[put:] and gradual migration.<br>
> ><br>
> > cheers -ben<br>
> ><br>
> ><br>
> My first thought was, like Eliot, to use b) because<br>
> - the simulator is already b)<br>
> - my wild guess is that there are more unsigned than signed usages<br>
><br>
> But I like Ben's arguments better.<br>
> Non-uniform naming has great costs:<br>
> - a slow learning ramp for noobs<br>
> - seeds for future bugs due to misunderstanding.<br>
><br>
> For example we already have<br>
> * long implicitly meaning 32 bits, as in signedIntToLong and<br>
> signedIntFromLong,<br>
> * long unconditionally meaning 64 bits, as in sqLong,<br>
> * long meaning either 32 or 64 bits depending on architecture in generated<br>
> C (as in StackInterpreter>>putLong:toFile:),<br>
> * long meaning either 32 or 64 bits depending on the context<br>
> (#longAt: is always 32 bits when sent to ByteArray, Bitmap, or CAccessor,<br>
> but might be 64 bits when sent to the simulator, argh...)<br>
><br>
> I remember I had similar problems with word the first time I looked at VMMaker.<br>
> WordArray always holding 32-bit words, but wordSize being 4 or 8 depending<br>
> on the target architecture.<br>
> Even now, I'm not sure which word we are talking about...<br>
<br>
Me too. This is endlessly confusing no matter how long I look at it.<br>
We need to clarify the terminology and the type declarations to distinguish<br>
between references to slots in the object memory, object pointers that<br>
coincidentally happen to fit into slots in the object memory, and references<br>
to machine data types (C short/int/long/long long).<br>
<br>
><br>
> Also b) goes with a blitz strategy:<br>
> - do a massive semantic change long32 -> unsigned<br>
> - review the send sites and change a few variant to signed<br>
> - repair broken C code once the image starts crashing and tests start failing<br>
> - repair the simulator which might show different symptoms (see<br>
> implementors of #asUnsignedLong)<br>
> either it works soon, or never...<br>
><br>
> I've noted only 47 senders of long32At: .<br>
> Plus the 33 senders of fetchLong32:ofObject:<br>
> And I cannot easily analyze the possible side effects through Slang type<br>
> inference...<br>
><br>
> As my recent LargeInteger division bug showed: abuse of unsigned leads to<br>
> expensive bug tracking too.<br>
> That somewhat restrains my enthusiasm for solution b) - chat échaudé<br>
> craint l'eau froide (once bitten, twice shy) ;)<br>
><br>
> Last, I did not apply the b) policy in my VM branch for 32-bit large ints.<br>
> Instead I introduced the unsigned variants gradually... (coward!)<br>
><br>
> Before we attempt anything, I'd like to recall the short-term goals:<br>
> - avoid uncontrolled sign extension when promoting to larger int in C<br>
> - avoid undefined behavior, most of which being related to signed arithmetic<br>
> - make the simulator better match C (or vice versa, as the case may be)<br>
><br>
> The last goal is far away; there are subtle "differences"...<br>
> Like in the simulator, the difference of two unsigned might be < 0.<br>
> In C it cannot be, unless we force it (cast, type punning, whatever...).<br>
><br>
> Last point, gradual introduction and non-regression testing presuppose a<br>
> stable VM.<br>
> I still experience random crashes that I'd like to track down first.<br>
><br>
<br>
I am still undecided between b) and c). Ben's approach makes the most<br>
sense to me on the face of it, but I just did a quick test of the b)<br>
approach and the resulting VM works fine, with BitBltTest still green.<br>
I did this by modifying the MemoryAccess methods so that #long32At:<br>
generates this (which will be inlined in the actual VM code):<br>
<br>
/* #define long32At intAt */<br>
<br>
static unsigned int long32At(sqInt ptr) {<br>
return (((unsigned int *) ((sqMemoryBase) + ptr)))[0];<br>
}<br>
<br>
<br>
I am probably oversimplifying, and different compilers may treat these<br>
things differently, but I see two conclusions that may be drawn from this:<br>
<br>
1) Declaring #long32At: as either signed or unsigned is not really<br>
important. What is important is the type declaration of the thing that<br>
the result gets stored into. We are assuming two's complement representations,<br>
and a value fetched as "signed" but stored into a context that treats<br>
it as unsigned becomes an unsigned value, and vice versa.<br>
<br>
2) We currently treat the result of #byteAt:, #shortAt:, #intAt:, and<br>
#longAt: as signed values. There is no reason to change that convention.<br>
But #long32At: (or #long128At: or whatever) could easily be changed at<br>
this point to explicitly answer unsigned values. It does not look to<br>
me like this would break anything, and it makes sense intuitively because<br>
saying "long32At" implies that I want to retrieve a binary bit field<br>
of size 32, with no presumption that it should be interpreted either as<br>
two's complement signed or as unsigned.<br>
<br>
Dave<br>
<br></blockquote><div><br></div><div>It's just bits as long as you don't use it in an expression.<br></div><div>We have to decide about sign as soon as we write:<br></div><div> low | ((unsigned long long) high << 32)<br></div><div>Note that I used | instead of + to better show my intention to operate on bits...<br></div><div>But when the compiler has to extend the 32-bit low to 64 bits, it has to know how to handle the sign bit.<br></div><div>In this example we need to explicitly declare things unsigned...<br><br></div><div>But there is even more trickiness in C:<br>declaring all variables as unsigned still does not protect us from sign extension.<br></div><div>Here is a small snippet to illustrate it:<br><br>$ cat >test.c <<END<br>#include <stdio.h><br>int main() {<br> unsigned short x=0xFF00;<br> unsigned long long y = x << 16;<br> printf("y=%llx\n",y);<br> return 0;<br>}<br>END<br><br></div><div>$ gcc test.c<br></div><div>$ ./a.exe<br>y=ffffffffff000000<br><br></div><div>Surprising?<br></div><div>No, the unsigned short gets promoted to int (signed int) in the expression x << 16.<br></div><div>So we overflow the sign bit (UB) and then promote the 32-bit signed value to 64 bits...<br></div><div> <br></div></div><br></div></div>