<div dir="ltr">Hi Ryan<div><br></div><div>   replying to the list because this may be of more general interest)<br><div class="gmail_extra"><br><div class="gmail_quote">On Thu, Nov 5, 2015 at 6:01 PM, Ryan Macnak <span dir="ltr">&lt;<a href="mailto:rmacnak@gmail.com" target="_blank">rmacnak@gmail.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><p dir="ltr">Hi Eliot,</p>

<p dir="ltr">I&#39;ve been looking into a MIPSEL port of Cog. MIPS is about as simple an architecture as one could hope for, and supporting it would bring Cog up to par with the architectures supported by Android and Chrome OS (modulo the 64-bit support in flight). I&#39;ve managed to produce a function stack VM with only a few extra ifdefs and a stub Alien implementation.</p>

<p dir="ltr">A Cog port looks fairly straightforward, except for the RTL&#39;s assumption of condition flags. MIPS doesn&#39;t have condition flags, which means to efficiently translate tests and branches we need to consider the test and branch together, instead of independently as Cog is currently setup to do. E.g.,</p>

<p dir="ltr">CmpR: a R: b<br>

JumpZero: target<br>

=&gt;<br>

BEQ a, b, target</p>

<p dir="ltr">AddR: a R: b<br>

JumpOverflow: target<br>

=&gt;<br>

MOV t1, b<br>

ADDU b, a, b<br>

XOR t2, b, a<br>

XOR t1, b, t1<br>

AND t1, t1, t2<br>

BLTZ t1, target</p>

<p dir="ltr">One way to handle this is to remove compare, test and conditional branches from the RTL and add combined instructions like CmpR:R:JumpEqual: and AddR:R:JumpOverflow:. Another is to allow concretization to look at the preceding or following instruction. What do you think?</p></blockquote><div><br></div><div>I think the former is too much work.  I had assumed that the JIT takes advantage of the separation of comparison/arithmetic from conditional branch, but when I looked the other day I couldn&#39;t find an example.  So in principle it&#39;s a valid approach.</div><div><br></div><div>However, I expect that the latter is extremely simple.  The pattern I would investigate is to do the work in computeMaximumSize, adding support for accessing the previous instruction.  So e.g. rewrite</div><div><br></div><div><div>computeMaximumSizes</div><div><span style="white-space:pre-wrap">        </span>&quot;This pass assigns maximum sizes to all abstract instructions and eliminates jump fixups.</div><div><span style="white-space:pre-wrap">        </span> It hence assigns the maximum address an instruction will occur at which allows the next</div><div><span style="white-space:pre-wrap">        </span> pass to conservatively size jumps.&quot;</div><div><span style="white-space:pre-wrap">        </span>&lt;inline: false&gt;</div><div><span style="white-space:pre-wrap">        </span>&lt;var: #abstractInstruction type: #&#39;AbstractInstruction *&#39;&gt;</div><div><span style="white-space:pre-wrap">        </span>| relativeAddress |</div><div><span style="white-space:pre-wrap">        </span>literalsManager dumpLiterals: false.</div><div><span style="white-space:pre-wrap">        </span>relativeAddress := 0.</div><div><span style="white-space:pre-wrap">        </span>0 to: opcodeIndex - 1 do:</div><div><span style="white-space:pre-wrap">                </span>[:i| | abstractInstruction |</div><div><span style="white-space:pre-wrap">                </span>abstractInstruction := self abstractInstructionAt: i.</div><div><span style="white-space:pre-wrap">                </span>abstractInstruction</div><div><span style="white-space:pre-wrap">                        </span>address: relativeAddress;</div><div><span style="white-space:pre-wrap">                        </span>maxSize: abstractInstruction computeMaximumSize.</div><div><span style="white-space:pre-wrap">                </span>relativeAddress := relativeAddress + abstractInstruction maxSize]</div></div><div><br></div><div>to</div><div><br></div><div><div>computeMaximumSizes</div><div><span style="white-space:pre-wrap">        </span>&quot;This pass assigns maximum sizes to all abstract instructions and eliminates jump fixups.</div><div><span style="white-space:pre-wrap">        </span> It hence assigns the maximum address an instruction will occur at which allows the next</div><div><span style="white-space:pre-wrap">        </span> pass to conservatively size jumps.&quot;</div><div><span style="white-space:pre-wrap">        </span>&lt;inline: false&gt;</div><div><span style="white-space:pre-wrap">        </span>&lt;var: #abstractInstruction type: #&#39;AbstractInstruction *&#39;&gt;</div><div><span style="white-space:pre-wrap">        </span>| relativeAddress |</div><div><span style="white-space:pre-wrap">        </span>literalsManager dumpLiterals: false.</div><div><span style="white-space:pre-wrap">        </span>relativeAddress := 0.</div><div><span style="white-space:pre-wrap">        </span>0 to: opcodeIndex - 1 do:</div><div><span style="white-space:pre-wrap">                </span>[:i| | abstractInstruction |</div><div><span style="white-space:pre-wrap">                </span>abstractInstruction := self abstractInstructionAt: i.</div><div><span style="white-space:pre-wrap">                </span>abstractInstruction</div><div><span style="white-space:pre-wrap">                        </span>address: relativeAddress;</div><div><span style="white-space:pre-wrap">&gt;&gt;                        </span>maxSize: (abstractInstruction computeMaximumSizeAt: i).</div><div><span style="white-space:pre-wrap">                </span>relativeAddress := relativeAddress + abstractInstruction maxSize]</div></div><div><br></div><div>and have the CogMIPSELCompiler&#39;s computeMaximumSIzeAt: search back for the associated compare instruction, absorb it into the current (and possibly following branch), and collapse the comparison to a label.  There are sequences like this in 32-bit Spur&#39;s genInnerPrimitiveAtPut: that require absorbing the comparison into more than one conditional:</div><div><br></div><div><div><span style="white-space:pre-wrap">        </span>cogit CmpCq: objectMemory arrayFormat R: formatReg.</div><div><span style="white-space:pre-wrap">        </span>jumpNotIndexablePointers := cogit JumpBelow: 0.</div><div><span style="white-space:pre-wrap">        </span>jumpHasFixedFields := cogit JumpNonZero: 0.</div></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">

<p dir="ltr">Also, how do branch offsets get fixed if the actual size of an RTL instruction is smaller than what is answered by computeMaximumSize?</p></blockquote><div><br></div><div>The idea is that branch distances are computed when the maximum sizes are known, and that if a branch is long when jumping across the maximum distance it must remain so when later generated, even if intervening instructions are small enough to render the distance small.  So...</div><div><br></div><div><div style="color:rgb(0,0,0);font-size:14px"><span class="">Branch</span> offsets are fixed by the following algorithm in Cogit&gt;&gt;computeMaximumSizes (1.), Cogit&gt;&gt;generateInstructionsAt: (2. 3.) &amp; CogAbstractInstruction&gt;&gt;#sizePCDependentInstructionAt: (2.).</div><div style="color:rgb(0,0,0);font-size:14px"><br></div><div style="color:rgb(0,0,0);font-size:14px">1. Make a pass over the sequence of instructions assigning their maximum size.  This will be the same as the actual size for all non-pc-dependent instructions. Assign addresses to instructions based on this maximum size</div><div style="color:rgb(0,0,0);font-size:14px">2. Make <span style="background-color:rgba(255,255,255,0)">a pass over the sequence of instructions computing actual sizes of pc-dependent instructions (jumps, pc-relative data references) based on these max sizes.  If a <span class="">branch</span> distance is small  jumping across maximum sizes it will remain small when jumping across actual sizes.</span></div><div style="color:rgb(0,0,0);font-size:14px">3. Make a final pass that generates machine code (to the memory inside each instruction) that computes actual size.  For the <span style="background-color:rgba(255,255,255,0)">pc-dependent instructions, if they needed large offsets in pass 2 they use large offsets even if only a small offset is needed jumping over actual sizes.  This keeps the offset computation stable and hence these three passes are sufficient.</span></div></div></div><br><div><div dir="ltr"><div><span style="font-size:small;border-collapse:separate"><div>_,,,^..^,,,_<br></div><div>best, Eliot</div></span></div></div></div>

</div></div></div>