On Tue, Jun 19, 2012 at 12:19 AM, Colin Putney <colin@wiresong.com> wrote:
So I was poking around in Compiler today, and noticed that it's a
bit... messy. I had a few ideas for improvement, but before I go
monkeying with such an important part of the system, I thought I'd
bring it up here. Do we have a long- or medium-term plan for how the
compiler should evolve?

As I see it there are a few paths available to us:

1. Incremental improvement. The compiler we have now is tried and
true. Now that we have proper block closures and a high performance
VM, there's no real need to improve bytecode generation, so we
shouldn't put much effort into this part of the system. We'll just
make small improvements and minor refactorings as needed.

I wouldn't say that the bytecode set is satisfactory.  Some things that are wrong with it:
- limited to 256 literals
- limited to 32 temporaries, 16 arguments (IIRC)
- limited room for expansion (and in Newspeak there is no room for expansion)

I'd like to see a set with
- a primitive bytecode to move the primitive number field out of the header and make more room for num args, num literals (64k would be a fine limit) etc.
- a design based around prefixes for extended indices a la VW.  This lifts the limits on addressability, arity, etc. while keeping the bytecode compact
- some mild experiments such as a nop which can be used e.g. to express metadata that guides the decompiler (this is a to:do:, etc).

Even more interesting would be metadata that allowed the discovery of inlined blocks so that e.g. mustBeBoolean is instead handled by dynamically creating closures and the relevant ifTrue:ifFalse: message so that these can be inlined for true/false but reimplemented for other objects.
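
To make the prefix idea in the list above concrete, here is a minimal sketch in Python. The opcode values and function names are made up for illustration; they are not the VW or Squeak encodings. The point is only that small indices stay two bytes long, while a prefix supplies the high bits when an index needs more than one byte.

```python
# Hypothetical opcode values, chosen only for this sketch.
EXTEND = 0xE0        # prefix: its operand supplies the high 8 bits of the next index
PUSH_LITERAL = 0x20  # takes a one-byte index operand


def encode_push_literal(index):
    """Encode a push of literal `index` (0..65535) as a list of byte values."""
    if not 0 <= index <= 0xFFFF:
        raise ValueError("literal index out of range")
    code = []
    if index > 0xFF:
        # Only large indices pay for the prefix; small ones stay compact.
        code += [EXTEND, index >> 8]
    code += [PUSH_LITERAL, index & 0xFF]
    return code


def decode_literal_pushes(code):
    """Return the literal indices pushed by a sequence of encoded bytes."""
    indices = []
    high = 0
    pc = 0
    while pc < len(code):
        opcode, operand = code[pc], code[pc + 1]
        pc += 2
        if opcode == EXTEND:
            high = operand
        elif opcode == PUSH_LITERAL:
            indices.append((high << 8) | operand)
            high = 0  # a prefix applies to the next instruction only
        else:
            raise ValueError("unknown opcode in this sketch")
    return indices
```

With this scheme a push of literal 5 takes two bytes and a push of literal 300 takes four, so the 64k limit costs nothing in the common case.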
 
2. Adopt an existing project. There have been a few "new compiler"
projects over the years, and one or another of them might present an
opportunity for significant improvement over the status quo. I'm
thinking of ByteSurgeon, Opal, AOStA etc. It's not something we'll
rush into, but eventually, when the code is mature, we'll want to
replace the current compiler.

3. Something completely new. Now that we have closures and a fast VM,
existing projects aren't relevant anymore, but we have new
opportunities for improvement. VM-level changes, such as a new object
format or new bytecodes could drive this option, if they're big enough
that significant work on the compiler is required anyway. Maybe we can
only see the broad outlines of what the project might look like at the
moment, but we can see it on the horizon.

Well, my refactoring of the compiler to move instruction encoding out of ParseNode general instances and into BytecodeEncoder takes the pressure off as far as changing the bytecode set.  There's still a need for refactoring in InstructionStream and CompiledMethod to handle bytecode set change.  It is really a BytecodeEncoder or InstructionStream that understands how a bytecode set works, and not CompiledMethod (in e.g. readsField etc).
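
As a rough illustration of why that separation helps, here is a small Python sketch. The class and method names are hypothetical and are not the actual Squeak classes or selectors. The node knows only which literal it pushes; each encoder knows how its own bytecode set spells that, so swapping the bytecode set means swapping the encoder.

```python
class CompactEncoder:
    """Hypothetical bytecode set: one opcode byte plus a one-byte index."""
    PUSH_LITERAL = 0x20

    def gen_push_literal(self, index):
        if not 0 <= index <= 0xFF:
            raise ValueError("index does not fit this bytecode set")
        return bytes([self.PUSH_LITERAL, index])


class PrefixEncoder:
    """Hypothetical bytecode set: a prefix extends the index to 16 bits."""
    EXTEND = 0xE0
    PUSH_LITERAL = 0x40

    def gen_push_literal(self, index):
        if not 0 <= index <= 0xFFFF:
            raise ValueError("index does not fit this bytecode set")
        prefix = bytes([self.EXTEND, index >> 8]) if index > 0xFF else b""
        return prefix + bytes([self.PUSH_LITERAL, index & 0xFF])


class LiteralNode:
    """Parse node that knows what it pushes, not how that is encoded."""

    def __init__(self, index):
        self.index = index

    def emit(self, encoder):
        # All knowledge of opcodes and operand layout lives in the encoder.
        return encoder.gen_push_literal(self.index)
```

The same LiteralNode compiles under either encoder without change, which is the property that makes a bytecode set migration tractable.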

So, are there any pain points right now that we should think about
addressing? Is anybody planning or considering working on something
compiler-related?

For me at least, this has to take second place to the new object representation, because there's much more benefit to be derived from the object representation.

Eliot, is there anything in the new object format that will have an
impact on image-side compilation?

I don't think so.  It should be entirely orthogonal.
 
I seem to remember you mentioning
something about efficiently supporting alternate bytecode sets. Is
that meant for Newspeak, or do you have something in mind for
Smalltalk?

It is a convenient way of migrating the bytecode set.  Better than my EncoderForLongFormV3 approach.

I don't think we have to come up with a definitive plan just now, I
just want to get a sense of what people are thinking.

Colin

--
cheers,
Eliot