[Newcompiler] Optimizing compiled methods for performance [was: A day optimizing]

Klaus D. Witzel klaus.witzel at cobss.com
Tue May 29 21:38:04 UTC 2007


[Stef, here's my attempt to sum up the discussion; history is at the  
bottom]

It all began when Damien asked for faster code for his Nile package (so
if necessary, do blame him :) Nile makes heavy use of Traits and
features a crystal-clear component design. We exchanged a couple of
messages, and each [message, pun intended] made the code run faster :)

The Nile streams are now faster than the ST-80 streams (plus the Squeak
hacks which accumulated over time), and here are the results Damien
posted on Monday:

#next        is  4% faster
#nextPutAll: is  9% faster
#next:       is 35% faster
#nextPut:    is 50% faster

Impressive. Awesome. It moves the same amount of data in less time
*and* has better features.

When we discussed which construct saved which bytecode, Stef asked:
would it be possible, with a smarter compiler, to avoid writing shit
code like

	position := position + (size := 1)

and instead write it in a readable way, letting the compiler do its
job?
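
For illustration (sticking to the names of the example above; this is
not the actual Nile code), here is the readable form next to the fused
form such a compiler would have to produce from it:

	"readable form: an honest temp store followed by a reload"
	size := 1.
	position := position + size.

	"fused form: the value of (size := 1) stays on the stack and is
	consumed directly as the argument of #+, saving the reload"
	position := position + (size := 1)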

Roel stepped in and said that this is about the simplest thing any
optimizing compiler can do. He mentioned building a data-flow graph
[ed.: MIT uses that term] on which to do some form of pattern matching.
This is a huge, active area of research, but Roel hasn't seen anything
in the context of block closures, for example.
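
The simplest data-flow fact such a pattern matcher needs here is a use
count, e.g. to make sure the temp is not read anywhere else before its
assignment may be folded into the single use. A minimal sketch over the
RB AST (which, if I'm not mistaken, the new compiler builds on),
assuming only its #children, #isVariable and #name protocol; the
selector #usesOf:in: and its host class are made up for illustration:

	usesOf: varName in: aNode
		"count occurrences of varName below aNode; a real analysis
		would also distinguish reads from writes (assignment targets
		are variable nodes, too) and respect control flow"
		| count |
		count := (aNode isVariable and: [aNode name = varName])
			ifTrue: [1]
			ifFalse: [0].
		aNode children
			do: [:each | count := count + (self usesOf: varName in: each)].
		^count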

Then I threw in concerns about acceptability: the debugger must know
about the optimizations (for example, when a temp var has been
optimized away, as we did for the Nile streams). Debugger and compiler
must be kept in sync when doing clever optimizations.

Roel sketched three things to be associated with an optimized method:
1- Optimized bytecodes that are interpreted,
2- A non-optimized format that is used when debugging (this could be
(unoptimized) bytecodes but will most likely be the meta-information
for the optimized interpreted form; a sketch of how it could travel
with the method follows this list),
3- A string with the source code.
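
In today's images, the natural slot for item 2 is the method's
properties. A minimal sketch, assuming the #propertyValueAt:put: /
#propertyValueAt: protocol of MethodProperties (selector names may
differ in your image; the #unoptimizedAST key is made up, and RBParser
comes with the RB/Newcompiler AST package):

	| method |
	method := NSStringWriter compiledMethodAt: #nextPutAll:.
	"the compiler would stash its debug meta-information here"
	method propertyValueAt: #unoptimizedAST
		put: (RBParser parseMethod: method getSource asString).
	"later, the debugger retrieves it to map back to the source"
	method propertyValueAt: #unoptimizedAST

Whether one stores a full AST or just a compact table of back-links is
a space/time trade-off; the properties object merely provides the slot.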

Other items mentioned by Roel, like the use of ASTs and Rewriteable
Reference Attribute Grammars (see the ECOOP'04 paper), a grammar format
for specifying rewrites of ASTs, can be found in the history section
further down.

Markus suggested keeping ASTs (instead of a non-optimized bytecode
version) as meta-data for the debugger's job and pointed us to

- http://www.iam.unibe.ch/~scg/Research/Reflectivity/

and mentioned that it gets extremely tricky when merging methods, e.g.
when inlining.

--------------------------------------------
End of summary. Please post what I failed to notice.
--------------------------------------------

My impression of Rewriteable Reference Attribute Grammars is that they
are primarily used for injecting expansions of domain-specific "macros"
(like SQL strings) into existing languages, and also for pre-processing
nodes (again, mainly expansions) to simplify the code generator's job.
And the paper by Ekman and Hedin (ReRAGs, JastAdd II), for example, is
"centered" around Edsger W. Dijkstra's algorithm for self-stabilization
of their rewrite rules (without mentioning Dijkstra).

But for our problem at hand, rewrite rules that cross statement nodes
are needed, and I did not find a single example where a ReRAG rule
crosses a statement boundary.
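
For comparison, here is what such a cross-statement rewrite would look
like when hand-rolled over the RB AST, i.e. without ReRAGs. The
selector #fuseIn: and its host class are made up, and the liveness
check (is tmp read again later?) is omitted; see the use-count sketch
further up:

	fuseIn: aSequenceNode
		"fuse  tmp := expr. x := x + tmp  into  x := x + (tmp := expr);
		parent links are left untouched in this sketch"
		| stmts stmt1 stmt2 args |
		stmts := aSequenceNode statements.
		1 to: stmts size - 1 do: [:i |
			stmt1 := stmts at: i.
			stmt2 := stmts at: i + 1.
			(stmt1 isAssignment
			 and: [stmt2 isAssignment
			 and: [stmt2 value isMessage
			 and: [(args := stmt2 value arguments) size = 1
			 and: [args first isVariable
			 and: [args first name = stmt1 variable name]]]]])
				ifTrue: [
					"inline the whole first assignment as the argument
					and drop the now-redundant first statement"
					stmt2 value arguments: (Array with: stmt1).
					aSequenceNode statements:
						(stmts copyFrom: 1 to: i - 1),
						(stmts copyFrom: i + 1 to: stmts size).
					^aSequenceNode]].
		^aSequenceNode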

Cheers
Klaus

On Tue, 29 May 2007 15:03:15 +0200, Wuyts Roel wrote:

> The line information would have to go through the meta-information.
>
> Compilation takes as input the source code and produces an AST (or in  
> Denker's things, the AST is already there).
>
> From the AST two things can be done (either through a visitor directly  
> on the AST or from the outside; it does not matter for the discussion):
> 1- produce an optimized AST (a graph, actually) with edges that make
> it possible to trace back, and
> 2- produce bytecode and meta-information: bytecode from the AST itself
> and meta-information from the back-links.
>
> If it can do this, then we can, given source code:
> - produce an unoptimized version (should be rarely needed).
> - produce an optimized AST, and from that produce bytecodes.
>
> Note that making optimization an AST transformation would make it easy  
> to experiment with various optimization strategies, as long as they add  
> the necessary meta-information. As a format for the AST we could have a  
> look at the Rewriteable Reference Attribute Grammars (see the ECOOP'04
> paper), a grammar format that makes it quite easy to specify rewrites of  
> ASTs that can contain links to other nodes.
>
>> -----Original Message-----
>> From: Klaus D. Witzel [mailto:klaus.witzel at cobss.ch]
>> Sent: Tuesday 29 May 2007 14:35
>> To: Wuyts Roel; stephane ducasse
>> Cc: Damien Cassou; Mathieu Suen
>> Subject: Re: A day optimizing
>>
>> I agree with what you wrote, and it is not too naïve.
>>
>> I think that one of the biggest challenges is the line
>> information (the map of bytecode positions to source code
>> positions). After that is solved, the temps which have been
>> optimized away will be almost easy to cope with :)
>>
>> Imagine a debugger session stepping through an optimized
>> method in which [partial] expressions have been moved closer
>> to the instruction in which they are needed (possibly
>> duplicated in #ifTrue:ifFalse: branches and other blocks).
>> How would highlighting the active
>> expression/send/assign/return in the debugger have to work?
>>
>> Cheers
>> Klaus
>>
>> On Tue, 29 May 2007 13:48:25 +0200, Wuyts Roel wrote:
>>
>> > Right, I forgot about the debugger.
>> >
>> > Does this mean that there are three things associated with a method:
>> > 1- Optimized bytecodes that are interpreted,
>> > 2- A non-optimized format that is used when debugging (this could be
>> > (unoptimized) bytecodes but will most likely be the meta-information
>> > for the optimized interpreted form),
>> > 3- A string with the source code.
>> >
>> > Compiling source code results in optimized bytecode as well as
>> > meta-information (or a debuggable version).
>> >
>> > Decompiling to source code can be done in two ways: producing the
>> > optimized version or producing the non-optimized version (one is
>> > good for seeing and checking what gets produced, while the other one
>> > is more easily understandable).
>> >
>> > Or is this too naïve, and am I forgetting other things? I think
>> > that, depending on the supported transformations, keeping the
>> > meta-information can be really, really tricky.
>> >
>> >> -----Original Message-----
>> >> From: Klaus D. Witzel [mailto:klaus.witzel at cobss.ch]
>> >> Sent: Tuesday 29 May 2007 11:27
>> >> To: stephane ducasse
>> >> Cc: Damien Cassou; Mathieu Suen; Wuyts Roel
>> >> Subject: Re: A day optimizing
>> >>
>> >> On Tue, 29 May 2007 10:48:58 +0200, Stef wrote:
>> >>
>> >> > would it be possible, with a smarter compiler, to avoid writing
>> >> > shit code like
>> >> >
>> >> > 	position := position + (size := 1)
>> >> >
>> >> > and instead write it in a readable way, letting the compiler do
>> >> > its job?
>> >>
>> >> Sure, but acceptance may become the biggest problem: if the
>> >> compiler optimizes temp vars away then they won't be seen in the
>> >> (classic) debugger, because all unassigned values on the stack are
>> >> invisible.
>> >>
>> >> I think the debugger must be refurbished at the same time as the
>> >> compiler does more clever optimizations. This also means the
>> >> debugger must be fed meta-information (for example, from
>> >> MethodProperty) which is definitely not contained in the bytecodes
>> >> of a compiled method.
>> >>
>> >> Cheers
>> >> Klaus
>> >>
>> >> > Stef
>> >> >
>> >> > On 28 mai 07, at 22:01, Klaus D. Witzel wrote:
>> >> >
>> >> >> On Mon, 28 May 2007 21:07:41 +0200, Damien Cassou wrote:
>> >> >>
>> >> >>> 2007/5/28, Klaus D. Witzel <klaus.witzel at cobss.ch>:
>> >> >>>> Aha, two new members of the terminal pop bytecode eliminators
>> >> >>>> guild at work! In particular the assignment to the collection
>> >> >>>> iVar in #nextPutAll: of NSStringWriter, while using the same
>> >> >>>> iVar as argument in a cascade :) BTW
>> >> >>>
>> >> >>>
>> >> >>> Can you give more details about that please?
>> >> >>
>> >> >> The statement goes, in abstract form,
>> >> >>
>> >> >>  iVar := (iVar species new: amount) cascade1: iVar; cascade2:
>> >> >>  whatsoever.
>> >> >>
>> >> >> The first appearance of iVar is subject to terminal pop (and is
>> >> >> never ever used again by #nextPutAll:), and so cannot be
>> >> >> amortized.
>> >> >>
>> >> >> The second appearance of iVar is, from a performance point of
>> >> >> view, inevitable because a receiver is derived from it.
>> >> >>
>> >> >> The third appearance of iVar is the second part of the trick: at
>> >> >> this point in time iVar still holds the old value. So a temp var
>> >> >> for the new instance is optimized away (saves at least two
>> >> >> bytecodes).
>> >> >>
>> >> >> In summary, the above saves: a) assigning to a temp, b) loading
>> >> >> the temp as the receiver of the cascade, which in unoptimized
>> >> >> form would be necessary because iVar is assigned a new instance
>> >> >> *and* used as receiver [derivative] in the cascade.
>> >> >>
>> >> >> Instead, the result of (iVar species new: amount) is left
>> >> >> unassigned on the stack (this is top speed and the first part of
>> >> >> the trick), duplicated once for the cascade (again: top speed),
>> >> >> *not* popped after the cascade (this implicitly contributes to
>> >> >> saving the bytecodes), then used in the assignment (inevitable)
>> >> >> and thereafter popped into limbo (it cannot be amortized because
>> >> >> it is never ever used again).
>> >> >>
>> >> >> ---------------
>> >> >>
>> >> >> Did I write about the details you wanted?
>> >> >>
>> >> >> Cheers
>> >> >> Klaus
>> >> >



