[ANN] Chuck type inferencer

Tue Jun 1 14:07:08 UTC 2004

On May 28, 2004, at 4:46 PM, Lex Spoon wrote:

> Tim Rowledge <tim at sumeru.stanford.edu> wrote:
>> "Lex Spoon" <lex at cc.gatech.edu> wrote:
>>
>>> The major memory requirement, it turns out, is that AutoChuck is 
>>> keeping
>>> around parse trees for every method.  So I'm trying to eithre get 
>>> rid of
>>> that requirement or make the parse trees take less memory.
>> It might be useful to recall one of Eliot's battle cries -
>> "CompiledMethods are flattened parsetrees"

That's an interesting position given his work on the AOStA bytecode 
optimizer. My guess would be that the more you optimize the bytecodes, 
the less they look like the parse tree that created them.

> It is very tempting to try and use the compiled methods themselves as
> the compressed parse trees.  That would mean there is zero overhead 
> from
> "keeping around parse trees".  However, I don't know if I will 
> implement
> this idea myself.   There is one very annoying issue: the decompiler
> does not always produce *exactly* the parse tree that was compiled to
> begin with.

This is something I've been working on addressing for the next version 
of OmniBrowser. Essentially, the next version of OB will operate on an 
AST that begins with Package at the top and extends down through Class 
and Method to sub-method parse nodes like MessageSend and Literal.

One of the goals with this is to get consistency between different 
representations of the code, and so the AST is designed to be as 
lossless as possible when converting back and forth between 
representations. The SourcePrinter for example, is essentially a null 
pretty-printer; it reconstructs the source code almost exactly the way 
the programmer entered it. (And this, incidentally, is the reason I 
didn't just reuse the RB parse nodes. They don't preserve enough 
formatting information.)

Some things I plan to do with this:

- Hook it up to the closure compiler and generate bytecodes from the AST

- Adapt the closure decompiler to generate an AST from bytecodes 
(exactly)

- Create a binary serialization format. There is some interesting 
literature on compressing ASTs. Essentially the abstract grammar is 
used as a statistical model for an arithmetic encoder. This gives 
better compression than, for example, gzipped bytecodes. This could be 
a very nice way to distribute code - more compact and faster to load 
than changesets.

- Enhance Monticello. This representation of code is much richer than 
Monticello's current list of definitions, and let us implement features 
that are currently difficult or impossible. Tree-diffing will be an 
important part of this, and might allow us to implement RB-like 
features in Monticello. (For example, you could compare two versions of 
a package and get a diff expressed in terms of refactorings: "Method A 
extracted from B, Method X pushed up to class Y.")

- Allow structured annotations and additions to the AST. This could be 
used for all sorts of things. Stéphane's idea of attaching type 
information, for example, might be interesting in relation to Chuck.

So this could be useful for Chuck in terms of either decompilation from 
byte codes or AST compression.

Going in the other direction, I'm interested in exploring ways that 
tools like OmniBrowser and Monticello can take advantage of the type 
information provided by Chuck. The thing that leaps immediately to mind 
is inter-package dependencies. I bet it would be feasible to deduce the 
minimal set of packages required to run, say, the unit tests for a 
package. (And the minimal code from the "as yet unpackaged" base image 
too!)

I'm also imagining a panel in OB for static type analysis. It could be 
a little like the test runner - buttons for analysing the code in 
various scopes - method, class, package etc and a list for displaying 
errors. Click on an error and the brower jumps to the relevant method 
and highlights the dangerous send. Or how about enhancing SLint with 
type analysis?

All in all, I'm pretty excited by the possibilities that Chuck opens 
up. Thanks Lex!

Colin