<br><br><div class="gmail_quote">On Sun, Feb 22, 2009 at 10:37 AM,  <span dir="ltr">&lt;<a href="mailto:bryce@kampjes.demon.co.uk">bryce@kampjes.demon.co.uk</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">

<div class="Ih2E3d"><br>

Eliot Miranda writes:<br>

&nbsp;&gt;<br>

&nbsp;&gt; But what I really think is that this is too low a level to worry about.<br>

&nbsp;&gt; &nbsp;Much more important to focus on<br>

&nbsp;&gt; - context to stack mapping<br>

&nbsp;&gt; - in-line cacheing via a JIT<br>

&nbsp;&gt; - exploiting multicore via Hydra<br>

&nbsp;&gt; and beyond (e.g. speculative inlining)<br>

&nbsp;&gt; than worrying about tiny micro-optimizations like this :)<br>

<br>

</div>If you&#39;re planning on adding speculative, I assume Self style dynamic,<br>

inlining won&#39;t that reduce the value of context to stack mapping?</blockquote><div><br></div><div>Not at all; in fact quite the reverse. &nbsp;Context-to-stack mapping allows one to retain contexts while having the VM execute efficient, stack-based code (i.e. using hardware call instructions). &nbsp;This in turn enables the entire adaptive optimizer, including the stack analyser and the bytecode-to-bytecode compiler/method inliner, to be written in Smalltalk. &nbsp;The image-level code can examine the run-time stack using contexts as its interface, without having to understand native stack formats or different ISAs. &nbsp;The optimizer is therefore completely portable, with all machine specificities confined to the underlying VM, which is much simpler by virtue of not containing a sophisticated optimizer (which one would have to squeeze through Slang etc).</div>

<div><br></div><div>So for me, context-to-stack mapping is fundamental to implementing speculative inlining in Smalltalk.</div><div><br></div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">

My view with Exupery is context caches should be left until after<br>

dynamic inlining as their value will depend on how well dynamic<br>

inlining reduces the number of sends.<br></blockquote><div><br></div><div>I know and I disagree. &nbsp;Dynamic inlining depends on collecting good type information, something that inline caches do well. &nbsp;Inline caches are efficiently implemented with native call instructions, either to method entry-points or to PIC jump tables. &nbsp;Native call instructions mesh well with stacks. &nbsp;So context-to-stack mapping, for me, is a sensible enabling optimization for speculative inlining because it meshes well with inline caches.</div>
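<div><br></div><div>To make the type-information point concrete, here is a minimal sketch in Python rather than VM code. &nbsp;The names SendSite, send and observed_classes are hypothetical, invented for this illustration; a real inline cache lives in the generated machine code at the call site (or in a PIC jump table), not in a dictionary. &nbsp;The sketch only shows how a per-send-site cache ends up recording the receiver classes actually seen there, which is the information an inliner wants.</div>
<pre>
```python
class SendSite:
    """Toy model of one send site with a small polymorphic inline cache.

    Hypothetical illustration only: real inline caches are patched into
    native code, not stored in a Python dict.
    """

    def __init__(self, selector, max_entries=4):
        self.selector = selector
        self.max_entries = max_entries
        self.cache = {}   # maps receiver class to the method found by lookup
        self.hits = 0
        self.misses = 0

    def send(self, receiver, *args):
        cls = type(receiver)
        method = self.cache.get(cls)
        if method is None:
            # Slow path: full method lookup, then remember the result
            # if the cache still has room.
            self.misses += 1
            method = getattr(cls, self.selector)
            if self.max_entries > len(self.cache):
                self.cache[cls] = method
        else:
            # Fast path: the class was seen here before.
            self.hits += 1
        return method(receiver, *args)

    def observed_classes(self):
        """Receiver classes recorded at this site (the type feedback)."""
        return set(self.cache)


class Leaf:
    def size(self):
        return 1


class Node:
    def __init__(self, left, right):
        self.left = left
        self.right = right

    def size(self):
        # Both child sends go through the same shared send site.
        return 1 + SIZE_SITE.send(self.left) + SIZE_SITE.send(self.right)


SIZE_SITE = SendSite("size")
```
</pre>
<div>Sending size to a small tree such as Node(Node(Leaf(), Leaf()), Leaf()) through SIZE_SITE leaves the cache holding just Node and Leaf, so an optimizer reading observed_classes() would know which two implementations are worth considering at that site.</div>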

<div><br></div><div>Further, context-to-stack mapping is such a huge win that it&#39;ll be of benefit even if the VM is spending 90% of its time in inlined call-less code. &nbsp;We see a speedup of very nearly 2x (48% sticks in my head) for one non-micro tree-walking benchmark from the computer language shootout. &nbsp;And this is in a very slow VM. &nbsp;In a faster VM, context-to-stack mapping would be even more valuable, because it would save an even greater percentage of overall execution time.</div>
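<div><br></div><div>The arithmetic behind that last claim can be sketched as follows. &nbsp;This is a back-of-the-envelope model with made-up time units, not measured data, and it rests on two assumptions: the 48% figure is read as the fraction of run time removed, and the absolute send overhead removed stays the same while the rest of the VM gets faster. &nbsp;The function names are hypothetical.</div>
<pre>
```python
def fraction_saved(other_work, send_overhead, overhead_removed):
    """Fraction of total run time removed when part of the send overhead goes away."""
    total = other_work + send_overhead
    return overhead_removed / total


def speedup(fraction):
    """Speedup factor that corresponds to removing `fraction` of the run time."""
    return 1.0 / (1.0 - fraction)


# Hypothetical time units, chosen only to show the shape of the argument.
# Slow VM: the non-send work dominates the run time.
slow_vm = fraction_saved(other_work=60, send_overhead=40, overhead_removed=30)

# Faster VM: the non-send work has shrunk, the send overhead has not.
fast_vm = fraction_saved(other_work=20, send_overhead=40, overhead_removed=30)
```
</pre>
<div>With these numbers the same saving is 30% of run time in the slow VM and 50% in the faster one. &nbsp;Under the same reading, removing 48% of the run time corresponds to speedup(0.48), a little over 1.9x, which fits &quot;very nearly 2x&quot;.</div>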

<div><br></div><div>Further still, using call &amp; return instructions as conventionally as possible meshes extremely well with current processor implementations, which, because conventional stack-oriented language implementations are used on them so extensively, have done a great job of optimizing call/return.</div>

<div><br></div><div>Further still, the current performance of call/return on contemporary processors, specifically prefetch across call &amp; return (prefetch across return is only possible if one sticks to the processor&#39;s expected stack organization of return addresses), renders call/return performance the same as that of jumps. &nbsp;So the benefits of inlining are no longer in eliminating call/return, but rather in eliminating dispatch, argument copying, etc. &nbsp;So inlining per se isn&#39;t of benefit, and it can actually worsen instruction cache density; analysis and elimination of dispatch is what pays off. &nbsp;So again context-to-stack mapping makes sense, because it means the speculative inliner/adaptive optimizer doesn&#39;t have to focus on creating humongous methods or inlining accessors etc, and can focus on higher-level optimizations like block removal (lambda lifting?), common subexpression elimination, and so on.</div>

<div><br></div><div>best</div><div>Eliot<br>&nbsp;</div></div><br>