[Newcompiler] Optimizing compiled methods for performance [was: A day optimizing]

Wed May 30 07:26:28 UTC 2007

excellent summary

Stef
On 29 mai 07, at 23:38, Klaus D. Witzel wrote:

> [Stef, here's my attempt to sum up the discussion; history is at  
> the bottom]
>
> It all began when Damien asked for faster code for his Nile package  
> (so if necessary do blame him :) Nile makes heavy use of Traits and  
> features a crystal clear component plan. We exchanged a couple of  
> messages and each [message, pun intended] made the code run faster :)
>
> The Nile streams are now faster than the ST-80 (+ the Squeak hacks  
> which accumulated over time) streams and here are the results  
> Damien posted on Monday:
>
> #next is	 4% faster
> #nextPutAll: is	 9% faster
> #next: is	35% faster
> #nextPut: is	50% faster
>
> impressive. awesome. moves the same amount of data in less time  
> *and* has better features.
>
> When discussing which construct saved what bytecode, Stef asked,  
> would it be possible that having a smarter compiler we avoid to  
> write shit code like
>
> 	position := position + (size := 1)
>
> but write it in a readable way and the compiler does its job?
>
> Roel stepped in and said that's about the simplest thing any  
> optimizing compilers can do. He mentioned building a data-glow  
> graph [ed.: MIT's using that term] on which to do some form of  
> pattern matching. This is a huge active area of research but Roel  
> hasn't seen something in the context of blockclosures, for example.
>
> Then I threw in concerns about acceptabilty: the debugger must know  
> about optimizations (for example when a temp var was optimized  
> away, as we did for the Nile streams). Debugger and compiler must  
> be kept in sync when doing clever optimizations.
>
> Roel sketched three things to be associated with an optimized method:
> 1- Optimized bytecode that are interpreted,
> 2- A non-optimized format that is used when debugging (this could  
> be (unoptimized) bytecodes but will mostly likely be the meta- 
> information for the optimized interpreted form),
> 3- A string with the source code.
>
> Other items mentioned by Roel like the use of AST and Rewriteable  
> Reference Attribute Grammars (see Ecoop'04 paper), a grammar format  
> for specifying rewrites of ASTs, can be found in the history  
> section further down.
>
> Markus suggested to keep ASTs (instead of non-optimized version as  
> bytecode) as meta data for the debugger's job and pointed us to
>
> - http://www.iam.unibe.ch/~scg/Research/Reflectivity/
>
> and mentioned that it gets extremely tricky when merging methods,  
> e.g. inlining.
>
> --------------------------------------------
> End sum. Please post what I failed to notice.
> --------------------------------------------
>
> My impression of Rewriteable Reference Attribute Grammars is that  
> they are primarily used for injecting expansions of domain specific  
> "macros" (like sql strings) into existing languages and also for  
> pre-processing nodes (again, mainly expansions) for simplifying the  
> code generator's part. And the paper of Ekman and Hedin (ReRAGs,  
> JastAdd II) for example is "centered" around Edsger W. Dijkstra's  
> algorithm for self-stabilization of their rewrite rules (without  
> mentioning Dijkstra).
>
> But for our problem at hand rewrite rules for cross statement nodes  
> are needed. And I did not find a single example where a ReRAG rule  
> crossed the statement boundary?
>
> Cheers
> Klaus
>
> On Tue, 29 May 2007 15:03:15 +0200, Wuyts Roel wrote:
>
>> The line information would have to go through the meta-information.
>>
>> Compilation takes as input the source code and produces an AST (or  
>> in Denker's things, the AST is already there).
>>
>> From the AST two things can be done (either through a visitor  
>> directly on the AST or from the outside; it does not matter for  
>> the discussion):
>> 1- produce an optimized AST (a graph actually) with edges that  
>> enable to trace back, and
>> 2- produce bytecode and meta information: bytecode from the AST's  
>> itself and meta-information from the back-links.
>>
>> If it can do this, then we can, given source code:
>> - produce an unoptimized version (should be rarely needed).
>> - produce an optimized AST, and from that produce bytecodes.
>>
>> Note that making optimization an AST transformation would make it  
>> easy to experiment with various optimization strategies, as long  
>> as they add the necessary meta-information. As a format for the  
>> AST we could have a look at the Rewriteable Reference Attribute  
>> Grammars (see Ecoop'04 paper), a grammar format that makes it  
>> quite easy to specify rewrites of ASTs that can contain links to  
>> other nodes.
>>
>>> -----Original Message-----
>>> From: Klaus D. Witzel [mailto:klaus.witzel at cobss.ch]
>>> Sent: Tuesday 29 May 2007 14:35
>>> To: Wuyts Roel; stephane ducasse
>>> Cc: Damien Cassou; Mathieu Suen
>>> Subject: Re: A day optimizing
>>>
>>> I agree with what you wrote, and it is not too naïve.
>>>
>>> I think that one of the biggest challenge is the line
>>> information (map of bytecode positions to source code
>>> positions). After that is solved, the temps which have been
>>> optimized away will be almost easy to cope with :)
>>>
>>> Imagine a debugger session stepping through an optimized
>>> method in which [partial] expressions have been moved closer
>>> to the instruction in which they are needed (possibly
>>> duplicated in #ifTrue:ifFalse: branches and other blocks).
>>> How would highlighting the active
>>> expression/send/assign/return in the debugger have to work?
>>>
>>> Cheers
>>> Klaus
>>>
>>> On Tue, 29 May 2007 13:48:25 +0200, Wuyts Roel wrote:
>>>
>>> > Right, I forgot about the debugger.
>>> >
>>> > Does this mean that there are three things associated with a  
>>> method:
>>> > 1- Optimized bytecode that are interpreted,
>>> > 2- A non-optimized format that is used when debugging (this  
>>> could be
>>> > (unoptimized) bytecodes but will mostly likely be the
>>> meta-information
>>> > for the optimized interpreted form),
>>> > 3- A string with the source code.
>>> >
>>> > Compiling source code results in optimized byte code as well as
>>> > meta-information (or a debuggable version).
>>> >
>>> > Decompiling source code can be done in two ways: producing the
>>> > optimized version or producing the non-optimized version
>>> (one is good
>>> > for seeing and checking what gets produced, while the other one is
>>> > more easily understandable).
>>> >
>>> > Or is this too naïve and do I forget other things ? I think that,
>>> > depending on the supported transformations, keeping the
>>> > meta-information can be really, really tricky.
>>> >
>>> >> -----Original Message-----
>>> >> From: Klaus D. Witzel [mailto:klaus.witzel at cobss.ch]
>>> >> Sent: Tuesday 29 May 2007 11:27
>>> >> To: stephane ducasse
>>> >> Cc: Damien Cassou; Mathieu Suen; Wuyts Roel
>>> >> Subject: Re: A day optimizing
>>> >>
>>> >> On Tue, 29 May 2007 10:48:58 +0200, Stef wrote:
>>> >>
>>> >> > would it be possible that having a smarter compiler we
>>> >> avoid to write
>>> >> > shit code like
>>> >> >
>>> >> > 	position := position + (size := 1)
>>> >> >
>>> >> > but write it in a readable way and the compiler does its job?
>>> >>
>>> >> Sure but acceptance may become the biggest problem: if the
>>> compiler
>>> >> optimizes temp vars away then they won't be seen in the (classic)
>>> >> debugger because all unassigned values on the stack are  
>>> invisible.
>>> >>
>>> >> I think the debugger must be refurbished at the same time the
>>> >> compiler does more clever optimizations. This also means
>>> the debugger
>>> >> must be fed meta information (for example, from
>>> >> MethodProperty) which is definitely not contained in the
>>> bytecodes of
>>> >> a compiled method.
>>> >>
>>> >> Cheers
>>> >> Klaus
>>> >>
>>> >> > Stef
>>> >> >
>>> >> > On 28 mai 07, at 22:01, Klaus D. Witzel wrote:
>>> >> >
>>> >> >> On Mon, 28 May 2007 21:07:41 +0200, Damien Cassou wrote:
>>> >> >>
>>> >> >>> 2007/5/28, Klaus D. Witzel <klaus.witzel at cobss.ch>:
>>> >> >>>> Aha, two new members of the terminal pop bytecode
>>> >> eliminators guild
>>> >> >>>> at work! In particular the assignment to the
>>> collection iVar in
>>> >> >>>> #nextPutAll:
>>> >> >>>> of NSStringWriter, while using same iVar as argument in
>>> >> a cascade
>>> >> >>>> :) BTW
>>> >> >>>
>>> >> >>>
>>> >> >>> Can you give more details about that please?
>>> >> >>
>>> >> >> The statement goes, in abstract form,
>>> >> >>
>>> >> >>  iVar := (iVar species new: amount) cascade1: iVar; cascade2:
>>> >> >> whatsoever.
>>> >> >>
>>> >> >> The first appearance of iVar is subject to terminal pop
>>> (and never
>>> >> >> ever used again by #nextPutAll:) and so cannot amortized.
>>> >> >>
>>> >> >> The second appearance of iVar is, from a performance point
>>> >> of view,
>>> >> >> inevitable because a receiver is derived from it.
>>> >> >>
>>> >> >> The third appearance of iVar is the second part of the
>>> >> trick: at this
>>> >> >> point in time iVar still holds the old value. So a temp
>>> >> var for the
>>> >> >> new instance is optimized away (saves at least two bytecodes).
>>> >> >>
>>> >> >> In summary the above saves: a) assigning to a temp, b)
>>> loading the
>>> >> >> temp for the receiver of the cascade, which in unoptimized
>>> >> form would
>>> >> >> be necessary because iVar is assigned a new instance
>>> *and* used as
>>> >> >> receiver [derivative] in the cascade.
>>> >> >>
>>> >> >> Instead, the result of (iVar species new: amount) is left
>>> >> unassigned
>>> >> >> on the stack (this is top speed and the first part of
>>> the trick),
>>> >> >> duplicated once for the cascade (again: top speed),
>>> *not* popped
>>> >> >> after the cascade (this implicitly contributes to saving the
>>> >> >> bytecodes), then used in the assignment (inevitable) and
>>> >> thereafter
>>> >> >> popped into limbo (cannot be amortized because never ever
>>> >> used again).
>>> >> >>
>>> >> >> ---------------
>>> >> >>
>>> >> >> Did I write about the details you wanted?
>>> >> >>
>>> >> >> Cheers
>>> >> >> Klaus
>>> >> >
>>> >>
>>> >>
>>> >>
>>>
>>>
>>>
>
>