<br><br><div class="gmail_quote">On Tue, Jun 19, 2012 at 12:19 AM, Colin Putney <span dir="ltr">&lt;<a href="mailto:colin@wiresong.com" target="_blank">colin@wiresong.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

So I was poking around in Compiler today, and noticed that it&#39;s a<br>

bit... messy. I had a few ideas for improvement, but before I go<br>

monkeying with such an important part of the system, I thought I&#39;d<br>

bring it up here. Do we have a long- or medium-term plan for how the<br>

compiler should evolve?<br>

<br>

As I see it there are a few paths available to us:<br>

<br>

1. Incremental improvement. The compiler we have now is tried and<br>

true. Now that we have proper block closures and a high performance<br>

VM, there&#39;s no real need to improve bytecode generation, so we<br>

shouldn&#39;t put much effort into this part of the system. We&#39;ll just<br>

make small improvements and minor refactorings as needed.<br></blockquote><div><br></div><div>I wouldn&#39;t say that the bytecode set is satisfactory.  Some things that are wrong are</div><div>- limited to 256 literals</div>

<div>- limited to 32 temporaries, 16 arguments (IIRC)</div><div>- limited room for expansion (and in Newspeak there is no room for expansion)</div><div><br></div><div>I&#39;d like to see a set with</div><div>- a primitive bytecode to move the primitive number field out of the header and make more room for num args, num literals (64k would be a fine limit) etc.</div>

<div>- a design based around prefixes for extended indices a la VW.  This lifts the limits on addressability, arity etc. while keeping the bytecode compact</div><div>- some mild experiments such as a nop which can be used e.g. to express metadata that guides the decompiler (&quot;this is a to:do:&quot;, etc.).</div>
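<div><br></div><div>A minimal sketch of the prefix idea, in Python for brevity. The opcode values and the names EXT, PUSH_LITERAL, encode_push_literal and decode are invented for illustration; this is not the VW encoding nor any existing Squeak bytecode set. Small indices cost two bytes, and each extra prefix widens the operand by 8 bits.</div>

```python
# Hypothetical opcodes, invented for this sketch only.
EXT = 0xE0            # prefix: supplies 8 more high-order bits for the next operand
PUSH_LITERAL = 0x20   # followed by one operand byte (low 8 bits of the index)


def encode_push_literal(index):
    """Emit zero or more EXT prefixes, then PUSH_LITERAL with the low byte."""
    if 0 > index:
        raise ValueError("literal index must be non-negative")
    high, low = divmod(index, 256)
    prefixes = []
    while high > 0:
        high, byte = divmod(high, 256)
        prefixes.append(byte)
    out = bytearray()
    for byte in reversed(prefixes):       # most significant prefix first
        out += bytes([EXT, byte])
    out += bytes([PUSH_LITERAL, low])
    return bytes(out)


def decode(code):
    """Return the literal indices pushed by a sequence of two-byte instructions."""
    indices, ext, pc = [], 0, 0
    while pc != len(code):
        opcode, operand = code[pc], code[pc + 1]
        pc += 2
        if opcode == EXT:
            ext = ext * 256 + operand     # accumulate high-order bits
        elif opcode == PUSH_LITERAL:
            indices.append(ext * 256 + operand)
            ext = 0                       # a prefix applies to one instruction only
        else:
            raise ValueError("unknown opcode")
    return indices
```

<div>In this sketch a literal index below 256 needs no prefix, and an index up to 64k needs one.</div>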

<div><br></div><div>Even more interesting would be metadata that allowed the discovery of inlined blocks, so that e.g. mustBeBoolean is instead handled by dynamically creating closures and sending the relevant ifTrue:ifFalse: message.  These constructs could then be inlined for true/false but reimplemented for other objects.</div>

<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">2. Adopt an existing project. There have been a few &quot;new compiler&quot;<br>

projects over the years, and one or another of them might present an<br>

opportunity for significant improvement over the status quo. I&#39;m<br>

thinking of ByteSurgeon, Opal, AOStA etc. It&#39;s not something we&#39;ll<br>

rush into, but eventually, when the code is mature, we&#39;ll want to<br>

replace the current compiler.<br>

<br>

3. Something completely new. Now that we have closures and a fast VM,<br>

existing projects aren&#39;t relevant anymore, but we have new<br>

opportunities for improvement. VM-level changes, such as a new object<br>

format or new bytecodes could drive this option, if they&#39;re big enough<br>

that significant work on the compiler is required anyway. Maybe we can<br>

only see the broad outlines of what the project might look like at the<br>

moment, but we can see it on the horizon.<br></blockquote><div><br></div><div>Well, my refactoring of the compiler to move instruction encoding out of ParseNode general instances and into BytecodeEncoder takes the pressure off as far as changing the bytecode set.  There&#39;s still a need for refactoring in InstructionStream and CompiledMethod to handle bytecode set change.  It is really a BytecodeEncoder or InstructionStream that understands how a bytecode set works, and not CompiledMethod (in e.g. readsField etc).</div>
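<div><br></div><div>As a rough illustration of that separation (a Python sketch with hypothetical class names and opcode values, not the actual ParseNode or BytecodeEncoder code): the node knows what it means, and only the encoder knows how a given bytecode set lays out bytes.</div>

```python
class CompactEncoder:
    """Hypothetical bytecode set A: the literal index is packed into the opcode."""

    def push_literal(self, index):
        if index > 31:
            raise ValueError("index out of range for this bytecode set")
        return bytes([0x20 + index])


class OperandEncoder:
    """Hypothetical bytecode set B: an opcode byte followed by an operand byte."""

    def push_literal(self, index):
        if index > 255:
            raise ValueError("index out of range for this bytecode set")
        return bytes([0x80, index])


class LiteralNode:
    """Parse node: knows which literal it refers to, not how it is encoded."""

    def __init__(self, index):
        self.index = index

    def emit(self, encoder):
        # All knowledge of the byte layout lives in the encoder.
        return encoder.push_literal(self.index)
```

<div>Swapping the encoder changes the emitted bytes without touching the node class, which is the property that makes a bytecode set change less painful.</div>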

<div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">So, are there any pain points right now that we should think about<br>

addressing? Is anybody planning or considering working on something<br>

compiler-related?<br></blockquote><div><br></div><div>For me at least it has to take second place to the new object representation because there&#39;s much more benefit to be derived from the object representation.</div><div>

<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Eliot, is there anything in the new object format that will have an<br>

impact on image-side compilation?</blockquote><div><br></div><div>I don&#39;t think so.  It should be entirely orthogonal.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

 I seem to remember you mentioning<br>

something about efficiently supporting alternate bytecode sets. Is<br>

that meant for Newspeak, or do you have something in mind for<br>

Smalltalk?<br></blockquote><div><br></div><div>It is a convenient way of migrating the bytecode set.  Better than my EncoderForLongFormV3 approach.</div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

I don&#39;t think we have to come up with a definitive plan just now, I<br>

just want to get a sense of what people are thinking.<br>

<span class="HOEnZb"><font color="#888888"><br>

Colin<br></font></span></blockquote></div><div><br></div>-- <br>cheers,<div>Eliot</div><br>