About the new compiler, Part II

Sat Jan 19 14:51:23 UTC 2008

Part II: Examples
-----------------------

I was made aware that the first part put far too much emphasis on
compilation performance. I agree that it is fairly unimportant. I did
that because in an earlier thread we got a comment suggesting that the
compiler has to be bad just because of that. This is of course not the
case: compilation performance is not that important, especially
considering today's machines. And neither SmaCC nor the rest of the
compiler was ever optimized for speed of compilation.

What counts is flexibility and good design, so that the code is
maintainable, reusable, and enables experiments. So I hope this part
emphasizes the right points better.

Having a framework with the right abstractions simplifies  
everything... "Modeling is cheating".

So the question was whether the modular design with all these visitors,
the use of SmaCC instead of a hand-written parser, and the IR at the end
are really that interesting to have. I think they are, but the only way
to prove it is to explain a bit what we have used the framework for in
the past. For all of these, the architecture proved to be quite useful.

All the things mentioned in the following are *not* part of the
NewCompiler. They have been built using it, and while building them, we
fixed bugs and generalized the framework a little.

So what did we do with the NewCompiler Framework?

1) Language experiments. Some time ago, I did a small experiment for
    Impara with a different syntax for Squeak. The stated goal was to see
    how little is needed to have a Python- or JS-like syntax in Squeak.
    (Of course, the result was that just having the syntax is not enough:
    it's completely unclear where the similarity to the other language
    breaks down, and thus it's unusable. People want to pick up a book
    and just type in the code without even understanding it completely.
    The semantics are where it starts to get interesting and *a lot* of
    work.)

    But it's a cool demo, and easy to do: a grammar of JS in SmaCC, AST
    nodes for all constructs, then a visitor that calls the IRBuilder to
    generate code. No dealing with bytecode, but nevertheless the
    complete freedom of code generation at the bytecode level of
    abstraction.

    Slides (Squeak image with all code): http://www.iam.unibe.ch/~denker/talks/BabelTalk.zip
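
To make the "AST nodes, then a visitor that calls the builder" pipeline
concrete, here is a minimal sketch. It is a toy model written in Python,
not the NewCompiler API: the node classes, the builder methods, and the
instruction names are all hypothetical stand-ins chosen for illustration.

```python
from dataclasses import dataclass


# Hypothetical AST nodes for a tiny expression language.
@dataclass
class Num:
    value: int


@dataclass
class Var:
    name: str


@dataclass
class Send:
    receiver: object
    selector: str
    args: list


class IRBuilder:
    """Toy stand-in for an IR builder: collects abstract instructions."""

    def __init__(self):
        self.instructions = []

    def push_literal(self, value):
        self.instructions.append(("pushLiteral", value))

    def push_temp(self, name):
        self.instructions.append(("pushTemp", name))

    def send(self, selector, num_args):
        self.instructions.append(("send", selector, num_args))

    def return_top(self):
        self.instructions.append(("returnTop",))


class CodeGenVisitor:
    """Walks the AST and drives the builder; it never sees raw bytecode."""

    def __init__(self, builder):
        self.builder = builder

    def visit(self, node):
        # Dispatch on the node's class name, e.g. visit_Send for Send.
        return getattr(self, "visit_" + type(node).__name__)(node)

    def visit_Num(self, node):
        self.builder.push_literal(node.value)

    def visit_Var(self, node):
        self.builder.push_temp(node.name)

    def visit_Send(self, node):
        self.visit(node.receiver)
        for arg in node.args:
            self.visit(arg)
        self.builder.send(node.selector, len(node.args))


def compile_expression(ast):
    builder = IRBuilder()
    CodeGenVisitor(builder).visit(ast)
    builder.return_top()
    return builder.instructions


# (x + 3) * 4, written directly as an AST (a parser would produce this).
ir = compile_expression(Send(Send(Var("x"), "+", [Num(3)]), "*", [Num(4)]))
```

Supporting another surface syntax in such a setup only means producing
the same kind of AST from a different grammar; the visitor and the
builder stay untouched.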

2) Bytecode Transformation.

    I got interested in Behavioral Reflection some time ago, and wanted to
    implement the Reflex model of Partial Behavioral Reflection [1] with a
    student (David Roethlisberger). For that, we needed bytecode
    transformation (at least at that time we thought so...).

    So we looked at Javassist [2] and, inspired by that, built
    ByteSurgeon. The idea here is that we want to insert (or replace)
    code at any bytecode instruction. Of course, we don't want to write
    the to-be-inlined code as bytecode itself, and we do not want to deal
    with the very low-level view of bytecode where there are many
    different send bytecodes, for example.

	When you now look at the NewCompiler framework, two things are
	immediately visible:
	1) The IR is exactly on the right level of abstraction.
	2) Implementing a small compiler that generates the to-be-inlined
	   code as IR is trivial with the modular design.

	So this is what we did: we added transformation (adding/deleting
	nodes) to the IR and wrote a compiler as a simple subclass of the
	standard SmaCC-based compiler that generates IR (extended with
	special syntax to be able to access e.g. the receiver and arguments
	of a send). With that, the bytecode inlining framework is a simple
	thing.

	As an example, here is the code that would annotate the class Example
	to log the receiver objects of all message sends:

		Example instrumentSend: [ :sendInstr |
		 	sendInstr insertBefore: 'Logger logSendTo: <meta: #receiver> '
		].

	More information:
	Slides: http://www.iam.unibe.ch/~denker/talks/ByteSurgeon-slides.pdf
	Paper: http://www.iam.unibe.ch/~scg/cgi-bin/scgbib.cgi/abstract=yes?Denk06a
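
The idea of inserting code before every send at the IR level can be
sketched as follows. This is a toy model in Python, not ByteSurgeon
itself: the tuple-based IR, the `logReceiver` pseudo-instruction (which
stands in for compiled logging code), and the small stack interpreter
are assumptions made up for this illustration.

```python
import operator

# Hypothetical IR for "(3 + 4) * 2": one generic send instruction,
# instead of the many specialized send bytecodes of the real VM.
METHOD_IR = [
    ("pushLiteral", 3),
    ("pushLiteral", 4),
    ("send", "+", 1),
    ("pushLiteral", 2),
    ("send", "*", 1),
    ("returnTop",),
]


def instrument_sends(ir):
    """Return a copy of ir with a logging instruction before every send."""
    result = []
    for instr in ir:
        if instr[0] == "send":
            _, selector, num_args = instr
            result.append(("logReceiver", selector, num_args))
        result.append(instr)
    return result


PRIMITIVES = {"+": operator.add, "*": operator.mul}


def run(ir, log):
    """Tiny stack interpreter for the toy IR."""
    stack = []
    for instr in ir:
        op = instr[0]
        if op == "pushLiteral":
            stack.append(instr[1])
        elif op == "logReceiver":
            _, selector, num_args = instr
            # The receiver sits just below the arguments on the stack.
            log.append((selector, stack[-(num_args + 1)]))
        elif op == "send":
            _, selector, num_args = instr
            args = [stack.pop() for _ in range(num_args)][::-1]
            receiver = stack.pop()
            stack.append(PRIMITIVES[selector](receiver, *args))
        elif op == "returnTop":
            return stack.pop()
    raise ValueError("method fell off the end without returning")


log = []
result = run(instrument_sends(METHOD_IR), log)
```

The instrumented method computes the same result as the original one;
the only difference is that `log` now records the receiver of each send.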

    ByteSurgeon was then used as the basis for several projects:
        -> First, Geppetto (Unanticipated Partial Behavioral Reflection)
             Slides: http://www.iam.unibe.ch/~denker/misc/GeppettoESUG2006.pdf
             Paper: http://www.iam.unibe.ch/~scg/cgi-bin/scgbib.cgi/abstract=yes?Roet08a
        -> A proof-of-concept implementation of an Omniscient Debugger,
           similar to Bill Lewis' work for Java
             Slides: http://www.iam.unibe.ch/~denker/talks/06NODE/UnstuckNode06.pdf
             Paper: http://www.iam.unibe.ch/~scg/cgi-bin/scgbib.cgi/abstract=yes?Hofe06a
        -> One of the backends of Christo, the test coverage tool by
           Stefan Reichart
             http://smallwiki.unibe.ch/stefanreichhart/codecoverage/

3) Compiler hack: Global variables as message sends.
    For ChangeBoxes, Pascal Zumkehr needed globals not to be hard-coded,
    but to be accessed via message sends. For this, he changed the
    NewCompiler. It's easy to do, and he did it after a short
    introduction to the NewCompiler. The old compiler is quite arcane for
    all these things. (Of course, once you get used to its patterns it's
    not impossible... but I think it's an odd way of coding.)

    Paper: http://www.iam.unibe.ch/~scg/cgi-bin/scgbib.cgi/abstract=yes?Denk07c
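
Why such a change is small in a modular compiler can be sketched like
this. The code is a toy model in Python and does not show how the
ChangeBoxes variant was actually implemented: the class names, the
`globalAt:` selector, and the instruction names are assumptions for
illustration only.

```python
class CodeGenerator:
    """Toy code generator with one hook per kind of variable access."""

    def __init__(self, temps=()):
        self.temps = set(temps)
        self.ir = []

    def emit_variable(self, name):
        if name in self.temps:
            self.ir.append(("pushTemp", name))
        else:
            self.emit_global(name)

    def emit_global(self, name):
        # Default behavior: the global binding is fixed at compile time.
        self.ir.append(("pushLiteralVariable", name))


class GlobalsAsSendsGenerator(CodeGenerator):
    """Variant that turns every global read into a message send."""

    def emit_global(self, name):
        # Assumed protocol: ask the receiver for the global at run time.
        self.ir.append(("pushReceiver",))
        self.ir.append(("pushLiteral", name))
        self.ir.append(("send", "globalAt:", 1))


def compile_reads(generator_class, names, temps=()):
    generator = generator_class(temps)
    for name in names:
        generator.emit_variable(name)
    return generator.ir


standard = compile_reads(CodeGenerator, ["x", "Transcript"], temps=["x"])
rewritten = compile_reads(GlobalsAsSendsGenerator, ["x", "Transcript"], temps=["x"])
```

Only the one hook that handles globals is overridden; temporaries and
everything else are compiled exactly as before.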

5) Sub-Method Reflection
	Joint work with Phillippe Marschall. This uses the SmaCC/RB-AST part
	of the NewCompiler to generate "Reflective" methods that use an
	extended AST instead of bytecodes, and it provides a small in-image
	"JIT" that generates bytecode on demand, based on the standard
	NewCompiler backend.

     Slides: http://www.iam.unibe.ch/~denker/talks/07TOOLS/07PersephoneTOOLS.pdf
     Paper: http://www.iam.unibe.ch/~scg/cgi-bin/scgbib.cgi/abstract=yes?Denk07b

	This was used e.g. for Adrian Lienhard's work on first-class aliases
	and Object-Flow Analysis:
	 http://www.iam.unibe.ch/~scg/cgi-bin/scgbib.cgi/abstract=yes?Lien07a
      http://www.iam.unibe.ch/~scg/cgi-bin/scgbib.cgi/abstract=yes?Lien07c
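
The notion of a method that keeps its AST and gets executable code only
on demand can be sketched as follows. This is a toy model in Python, not
the Persephone implementation: `ReflectiveMethod`, `replace_ast`, and
the tuple-based AST and backend are hypothetical names and structures.

```python
class ReflectiveMethod:
    """Toy method object that keeps its AST and compiles lazily."""

    def __init__(self, ast, compile_fn):
        self._ast = ast
        self._compile = compile_fn
        self._code = None
        self.compile_count = 0

    @property
    def ast(self):
        return self._ast

    def replace_ast(self, new_ast):
        # Changing the AST invalidates previously generated code.
        self._ast = new_ast
        self._code = None

    def code(self):
        # Generate code on first use, then serve it from the cache.
        if self._code is None:
            self._code = self._compile(self._ast)
            self.compile_count += 1
        return self._code


def toy_compile(ast):
    """Hypothetical backend: flattens a nested tuple AST into postfix code."""
    if isinstance(ast, tuple):
        selector, left, right = ast
        return toy_compile(left) + toy_compile(right) + [("send", selector, 1)]
    return [("pushLiteral", ast)]


method = ReflectiveMethod(("+", 1, 2), toy_compile)
first = method.code()
second = method.code()            # served from the cache
count_before_change = method.compile_count
method.replace_ast(("*", 3, 4))   # e.g. a tool rewrote or annotated the AST
third = method.code()             # regenerated on demand
```

Tools then work on the AST as the primary representation, and the
generated code is just a cache that can be thrown away at any time.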

6) Textual annotations on every language construct.
     Phillippe provided textual annotations for all language constructs
     in the Persephone system. This was realized as its own extended
     Smalltalk compiler (based on the SmaCC grammar).

     Nik Haldiman used this to build a pluggable type system for Squeak.
     	Slides: http://www.iam.unibe.ch/~denker/talks/07ESUG/07TypePlugESUG.pdf
		Paper: http://www.iam.unibe.ch/~scg/cgi-bin/scgbib.cgi/abstract=yes?Hald07b

7) Reflectivity. This merges sub-method reflection with partial
behavioral reflection.

	Homepage: http://www.iam.unibe.ch/~scg/Research/Reflectivity/index.html
	Slides: http://www.iam.unibe.ch/~denker/talks/07DYLA/07ReflectivityDylan.pdf

	This was used e.g.
		-> for Dynamic Analysis http://www.iam.unibe.ch/~scg/cgi-bin/scgbib.cgi?query=Denk07d&abstract=yes
		-> for Transactional Memory (Lukas Renggli): http://www.iam.unibe.ch/~scg/cgi-bin/scgbib.cgi/abstract=yes?Reng07b
		-> HistOOry, by Frederic Pluquet: http://decomp.ulb.ac.be/frdricpluquet/researchactivities/histoory/

So, all in all I am quite convinced that an open, reusable compiler  
infrastructure provides *huge* benefits for building
experiments and tools and thus exploring the future.

Next part:
	-> Closures and Performance of Closure code. (this may take some  
days... busy)

In addition, I will try to answer the questions that came up and give  
a status report soon.

	Marcus

(I am not subscribed to Squeak-dev anymore, so please CC: me)

References
==========

[1] Reflex: http://www.iam.unibe.ch/~scg/cgi-bin/scgbib.cgi/abstract=yes?Tant03a
[2] Javassist: http://www.csg.is.titech.ac.jp/paper/chiba-gpce03.pdf

--
Marcus Denker  --  denker at iam.unibe.ch
http://www.iam.unibe.ch/~denker