[Newcompiler] Optimizing compiled methods for performance [was: A day optimizing]

Klaus D. Witzel klaus.witzel at cobss.com
Wed May 30 08:39:04 UTC 2007


On Wed, 30 May 2007 10:02:31 +0200, Marcus Denker wrote:
> On 29.05.2007, at 23:38, Klaus D. Witzel wrote:
...
>> When discussing which construct saved what bytecode, Stef asked
>> whether, with a smarter compiler, we could avoid writing shit code
>> like
>>
>> 	position := position + (size := 1)
>>
>> and instead write it in a readable way and let the compiler do its job.
>>
>> Roel stepped in and said that's about the simplest thing any
>> optimizing compiler can do. He mentioned building a data-flow
>> graph [ed.: MIT's using that term] on which to do some form of
>> pattern matching. This is a huge, active area of research, but Roel
>> hasn't seen anything in the context of block closures, for example.
>>
>
> Yes, there is lots of research on compiler optimizations. My brain
> has lots of knowledge buried somewhere from the compiler construction
> lectures... e.g. on SSA form and how to optimize using it.
> (We do have an SSA framework for Squeak, called "AOStA").

Yes, Bryce mentioned AOStA in private email (I asked him to post his  
thoughts here).
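
To make Stef's example concrete: a minimal sketch of the rewrite such a  
data-flow pass could perform (variable names taken from the expression  
above, declarations omitted as in the original):

	"Readable source, as one would like to write it:"
	size := 1.
	position := position + size.

	"Equivalent hand-tuned form that saves a bytecode today; a
	data-flow pass that sees size is assigned and then immediately
	read just once could derive it automatically:"
	position := position + (size := 1)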

>> Then I threw in concerns about acceptability: the debugger must know
>> about optimizations (for example when a temp var has been optimized
>> away, as we did for the Nile streams). Debugger and compiler must
>> be kept in sync when doing clever optimizations.
>>
>> Roel sketched three things to be associated with an optimized method:
>> 1- Optimized bytecodes that are interpreted,
>> 2- A non-optimized format that is used when debugging (this could
>> be (unoptimized) bytecodes but will most likely be the
>> meta-information for the optimized interpreted form),
>> 3- A string with the source code.
>>
>> Other items mentioned by Roel, like the use of ASTs and Rewritable
>> Reference Attributed Grammars (see the ECOOP '04 paper), a grammar
>> formalism for specifying rewrites of ASTs, can be found in the
>> history section further down.
>
> I will check that.
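
Going back to Roel's three-part sketch: for concreteness, the
association could be packaged like this (a sketch only; all names are  
hypothetical):

	Object subclass: #OptimizedMethod
		instanceVariableNames: 'optimizedBytecodes debugMetaInfo sourceString'
		classVariableNames: ''
		poolDictionaries: ''
		category: 'NewCompiler-Sketches'

	"optimizedBytecodes: the form the VM interprets (1).
	debugMetaInfo: unoptimized bytecodes, or the mapping
	the debugger needs (2). sourceString: the source code (3)."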
>>
>> Markus suggested keeping ASTs (instead of a non-optimized bytecode
>> version) as meta data for the debugger's job and pointed us to
>>
>> - http://www.iam.unibe.ch/~scg/Research/Reflectivity/
>>
>> and mentioned that it gets extremely tricky when merging methods,
>> e.g. inlining.
>>
>
> So my overall feeling is that as soon as we plan to give up a simple
> one-to-one mapping between bytecode and text, we should not use
> the bytecode in the debugger directly anymore. Right now, the
> debugger interprets bytecode, but this is not needed. It could interpret
> an AST instead.
>
> Then we do not need to map the optimized code to the old one in the
> same way: there is only one problem to solve: entering the debugger.
> When e.g. an exception occurs, the debugger needs to debug a
> method, so it needs to re-create a stack that looks like a non-optimized
> stack. This is tricky for some cases,
>
> -> e.g. when methods get inlined, there is only one stack frame in
> the VM for many conceptual ones.
>      ("Dynamic Deoptimization", see Self)

I think that addressing the consequences of method inlining can wait until  
this form of optimization (and the corresponding deoptimization) can be  
handled with, say, at most 10 methods each :-D
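
Still, the shape of the reconstruction itself is easy to sketch. Assuming  
a hypothetical per-method table of inlining records, entering the debugger  
could expand one physical frame into its conceptual frames roughly so:

	deoptimize: aContext
		"Answer the conceptual (source-level) frames that the single
		physical frame aContext stands for. Both inlineRecordsFor:
		and reconstructFrameFrom: are hypothetical."
		^ (self inlineRecordsFor: aContext method)
			collect: [:record | record reconstructFrameFrom: aContext]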

> -> In a method, the compiler could have optimized away variables. How
> can we support debugging those methods? Can we reconstruct
>      the state of those variables?

Once the problem of exploding line information has been solved, this (for  
me) is the easiest part. The debugger asks for line information #at: an  
instruction pointer, which is typically mapped through an interval. The  
interval can additionally say "from here to there, insert temp var #name at  
position i into the stack view, together with an adaptor that provides  
accessors for stack location j". And if the stack value was already  
consumed ... at least there is #restart and #step :)
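
A minimal sketch of such a line-information map, assuming a hypothetical  
TempVarAdaptor class and instruction-pointer intervals as keys:

	| lineInfo ip hint |
	lineInfo := Dictionary new.
	"From ip 17 to 42, temp #name was optimized away but can be
	shown at position 1 of the stack view, reading stack location 2:"
	lineInfo
		at: (17 to: 42)
		put: (TempVarAdaptor name: #name viewPosition: 1 stackLocation: 2).
	"Debugger side: find the hint covering the current ip."
	ip := 20.
	hint := lineInfo associations
		detect: [:assoc | assoc key includes: ip]
		ifNone: [nil]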

> The points in the program flow where we can enter the debugger, and
> where state reconstruction is needed, are not many. Squeak checks for
> interrupts only on message sends and backward jumps.

Yes, these are the central points. But exceptions can easily be created  
elsewhere; for example, I insert frames between thisContext and its sender  
whose instruction pointer already points past the first (push) bytecode.  
There may be others, hidden in the wild.

> So, to sum up: to support real optimization, we need a system that
> has the un-optimized code available. This un-optimized version
> should be as high-level as possible, and it should support meta-data
> *per node*, not only per method.

I second that.
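
For illustration, per-node meta-data can start out as simple as an  
identity map from parse tree nodes to annotations (sketched against the  
RB parse tree; the annotation keys are made up):

	| tree meta node |
	tree := RBParser parseMethod: 'bump
		position := position + (size := 1)'.
	meta := IdentityDictionary new.
	node := tree body statements first.
	meta at: node put: (Dictionary new
		at: #optimizedAway put: false;
		yourself)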

Cheers
Klaus

> Bytecode in such a system is just like binary code in a JIT compiler:
> hidden from the programmer to some extent, and the debugger does
> not care. If the VM has a JIT, such a system would not need to bother
> with bytecode at all.
>
> 	Marcus
>

