About the new compiler, Part I

John Brant brant at refactoryworkers.com
Sat Jan 19 17:11:56 UTC 2008

Marcus Denker wrote:

> 4) The annotated AST now can be used to generate Code. The NewCompiler 
> uses a very nice small "Squeak Assembler" to emit
>    the code. The class for Code Generation is ASTTranslator. Again a 
> visitor, walks over the tree and calls IRBuilder to
>    emit code.
>    Here in ASTTranslator the magic happens for the inlining of 
> if/and/while and so on:
>     acceptMessageNode: aMessageNode
>         aMessageNode isInlineIf ifTrue: [^ self emitIfNode: aMessageNode].
>         aMessageNode isInlineIfNil ifTrue: [^ self emitIfNilNode: 
> aMessageNode].
>         aMessageNode isInlineAndOr ifTrue: [^ self emitAndOrNode: 
> aMessageNode].
>         aMessageNode isInlineWhile ifTrue: [^ self emitWhileNode: 
> aMessageNode].
>         aMessageNode isInlineToDo ifTrue: [^ self emitToDoNode: 
> aMessageNode].
>         aMessageNode isInlineCase ifTrue: [^ self emitCaseNode: 
> aMessageNode].
>         ^ self emitMessageNode: aMessageNode
>     Uncomment the line with "isInlineIfNil" --> no compiler generates 
> normal message send for ifNil:. Easy.

Since you have the transformations available, you could use them as 
macros to eliminate some of the cases here. For example, #ifNil: can be 
transformed into "isNil ifTrue:".

When I was working on #Smalltalk for .NET, I used the transformations 
for all of your special cases. After the code was parsed, it would 
perform the transformations on the parse tree to get a new tree that was 
then compiled. Each transformation was named and could be added or 
removed on a class by class basis. For the #ifNil: message, I would add 
a global rule:
	Name: #ifNil:
	Search: ``@a ifNil: ``@b
	Replace: ``@a isNil ifTrue: ``@b
If someone wanted to remove the #ifNil: transformation for a particular 
class, they could add a class rule with the same name and without any 
search/replace expressions.

In addition to the transformations, I also created a special message 
that was evaluated in the context of the current compiler. In 
#Smalltalk, if you sent a #primitive:* message to Compiler global, it 
would evaluate the primitive block in the compiler's context. For 
example, you could have code like:
         | a |
         a := self printString.
             primitive: [:compiler :block |
                 1 to: block statements size by: 2
                     do: [:i |
                             emitStatement: (block statements at: i)]]
             argument: [a := a , '1'. a := a , '2'. a := a , '3'].

When this is compiled, it would get to the Compiler #primitive:argument: 
message, and then the compiler would evaluate the #primitive: block 
argument passing the compiler and the "[a := a , '1'. a := a , '2'. a := 
a , '3']" parse tree as the arguments. In this example, the primitive 
block tells the compiler to emit only the odd statements in the block. 
Effectively, this would generate code that would look like:
         | a |
         a := self printString.
         a := a , '1'.
         a := a , '3'.

In addition to the argument: keyword, I also have an evaluate: keyword 
for the #primitive:* message send. While the argument: keyword causes 
the corresponding parse tree to be passed to the primitive: block, the 
evaluate: keyword causes the compiler to compile the code that pushes 
its corresponding argument on the stack. For example, in #Smalltalk, I 
handle a == ifTrue: pattern using:
	Name: ==ifTrue:
	Search: ``@a == ``@b ifTrue: ``@trueBlock `{:node | node isBlock and: 
[node arguments isEmpty]}'
	Replace: Compiler
			[:methodCompiler :block |
			methodCompiler generateIdentityIfTrue: block ifFalse: nil]
		evaluate: ``@a
		evaluate: ``@b
		argument: ``@trueBlock

When the primitive block is evaluated, it knows that the code to push 
the ``@a and ``@b objects has been generated, so it just needs to 
generate a jump if equal and the block.

By combining the transformations with the Compiler #primitive:* 
messages, we can eliminate almost all of your special cases in your 
method above. In #Smalltalk, I only have two special cases. One for the 
Compiler #primitive:* messages and another one that handles sending 
messages to non #Smalltalk objects. For Squeak, you wouldn't need to 
worry about sending messages to non Squeak objects, so you could 
simplify the method above to only one special case.

Once I had implemented this for #Smalltalk, it was interesting 
performing tests comparing the optimized code to the unoptimized code. 
The unoptimized code was much bigger since every ifTrue:, whileTrue:, 
etc. block was a real block and needed all of the code for a real block. 
Also, it was interesting finding a real benchmark that wouldn't crash 
with a stack overflow. For example, if you have no optimizations, then 
something as simple as "100000000 timesRepeat: []" will blow the stack.

John Brant

More information about the Squeak-dev mailing list