<br><br><div class="gmail_quote">On Tue, Jun 19, 2012 at 12:19 AM, Colin Putney <span dir="ltr"><<a href="mailto:colin@wiresong.com" target="_blank">colin@wiresong.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
So I was poking around in Compiler today, and noticed that it's a<br>
bit... messy. I had a few ideas for improvement, but before I go<br>
monkeying with such an important part of the system, I thought I'd<br>
bring it up here. Do we have a long- or medium-term plan for how the<br>
compiler should evolve?<br>
<br>
As I see it there are a few paths available to us:<br>
<br>
1. Incremental improvement. The compiler we have now is tried and<br>
true. Now that we have proper block closures and a high performance<br>
VM, there's no real need to improve bytecode generation, so we<br>
shouldn't put much effort into this part of the system. We'll just<br>
make small improvements and minor refactorings as needed.<br></blockquote><div><br></div><div>I wouldn't say that the bytecode set is satisfactory. Some things that are wrong are</div><div>- limited to 256 literals</div>
<div>- limited to 32 temporaries, 16 arguments (IIRC)</div><div>- limited room for expansion (and in Newspeak there is no room for expansion)</div><div><br></div><div>I'd like to see a set with</div><div>- a primitive bytecode to move the primitive number field out of the header and make more room for num args, num literals (64k would be a fine limit) etc.</div>
<div>- a design based around prefixes for extended indices a la VW, which lifts the limits on addressability, arity etc. while keeping the bytecode compact</div><div>- some mild experiments such as a nop which can be used e.g. to express metadata that guides the decompiler ("this is a to:do:", etc.).</div>
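<div><br></div><div>To make the prefix idea concrete, here is a minimal sketch in Python. It shows only the general technique of an extension prefix that supplies extra high-order bits for the index of the following bytecode. The opcode values EXTEND and PUSH_LITERAL and both function names are made up for illustration; this is not the VW encoding, nor a proposal for the actual Squeak byte values.</div>
<pre>
```python
# Hypothetical opcode values, chosen only for this sketch.
EXTEND = 0xE0        # prefix: one operand byte of extra high-order index bits
PUSH_LITERAL = 0x20  # followed by one byte holding the low 8 bits of the index


def encode_push_literal(index):
    """Return the bytes that push literal `index`, adding prefixes as needed."""
    if not index >= 0:
        raise ValueError("literal index must be non-negative")
    high, low = divmod(index, 256)
    prefixes = []
    while high > 0:
        high, byte = divmod(high, 256)
        prefixes.append(byte)          # collected least significant first
    out = bytearray()
    for byte in reversed(prefixes):    # emitted most significant first
        out += bytes([EXTEND, byte])
    out += bytes([PUSH_LITERAL, low])
    return bytes(out)


def decode_push_literal(code, pc=0):
    """Return (index, next_pc) for the push-literal instruction at `pc`."""
    extension = 0
    while code[pc] == EXTEND:
        extension = extension * 256 + code[pc + 1]
        pc += 2
    if code[pc] != PUSH_LITERAL:
        raise ValueError("expected a push-literal bytecode")
    return extension * 256 + code[pc + 1], pc + 2
```
</pre>
<div>In this sketch a small index still costs two bytes, and each extra 8 bits of index costs one more two-byte prefix, so the common case stays compact while the addressable range is no longer fixed by the instruction format.</div>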
<div><br></div><div>Even more interesting would be metadata that allowed the discovery of inlined blocks. Then mustBeBoolean could instead be handled by dynamically creating the closures and sending the relevant ifTrue:ifFalse: message, so that these constructs stay inlined for true/false but can be reimplemented for other objects.</div>
<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">2. Adopt an existing project. There have been a few "new compiler"<br>
projects over the years, and one or another of them might present an<br>
opportunity for significant improvement over the status quo. I'm<br>
thinking of ByteSurgeon, Opal, AOStA etc. It's not something we'll<br>
rush into, but eventually, when the code is mature, we'll want to<br>
replace the current compiler.<br>
<br>
3. Something completely new. Now that we have closures and a fast VM,<br>
existing projects aren't relevant anymore, but we have new<br>
opportunities for improvement. VM-level changes, such as a new object<br>
format or new bytecodes could drive this option, if they're big enough<br>
that significant work on the compiler is required anyway. Maybe we can<br>
only see the broad outlines of what the project might look like at the<br>
moment, but we can see it on the horizon.<br></blockquote><div><br></div><div>Well, my refactoring of the compiler to move instruction encoding out of ParseNode general instances and into BytecodeEncoder takes the pressure off as far as changing the bytecode set. There's still a need for refactoring in InstructionStream and CompiledMethod to handle bytecode set change. It is really a BytecodeEncoder or InstructionStream that understands how a bytecode set works, and not CompiledMethod (in e.g. readsField etc).</div>
<div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">So, are there any pain points right now that we should think about<br>
addressing? Is anybody planning or considering working on something<br>
compiler-related?<br></blockquote><div><br></div><div>For me at least, it has to take second place to the new object representation, because there's much more benefit to be derived from the object representation.</div><div>
<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Eliot, is there anything in the new object format that will have an<br>
impact on image-side compilation?</blockquote><div><br></div><div>I don't think so. It should be entirely orthogonal.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
I seem to remember you mentioning<br>
something about efficiently supporting alternate bytecode sets. Is<br>
that meant for Newspeak, or do you have something in mind for<br>
Smalltalk?<br></blockquote><div><br></div><div>It is a convenient way of migrating the bytecode set. Better than my EncoderForLongFormV3 approach.</div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
I don't think we have to come up with a definitive plan just now, I<br>
just want to get a sense of what people are thinking.<br>
<span class="HOEnZb"><font color="#888888"><br>
Colin<br></font></span></blockquote></div><div><br></div>-- <br>cheers,<div>Eliot</div><br>