[Newcompiler] Optimizing compiled methods for performance [was: A
day optimizing]
Stéphane Ducasse
stephane.ducasse at univ-savoie.fr
Wed May 30 07:26:28 UTC 2007
excellent summary
Stef
On 29 mai 07, at 23:38, Klaus D. Witzel wrote:
> [Stef, here's my attempt to sum up the discussion; history is at
> the bottom]
>
> It all began when Damien asked for faster code for his Nile package
> (so if necessary do blame him :) Nile makes heavy use of Traits and
> features a crystal clear component plan. We exchanged a couple of
> messages and each [message, pun intended] made the code run faster :)
>
> The Nile streams are now faster than the ST-80 (+ the Squeak hacks
> which accumulated over time) streams and here are the results
> Damien posted on Monday:
>
> #next is 4% faster
> #nextPutAll: is 9% faster
> #next: is 35% faster
> #nextPut: is 50% faster
>
> impressive. awesome. moves the same amount of data in less time
> *and* has better features.
>
> When discussing which construct saved what bytecode, Stef asked,
> would it be possible that having a smarter compiler we avoid to
> write shit code like
>
> position := position + (size := 1)
>
> but write it in a readable way and the compiler does its job?
>
> Roel stepped in and said that's about the simplest thing any
> optimizing compilers can do. He mentioned building a data-glow
> graph [ed.: MIT's using that term] on which to do some form of
> pattern matching. This is a huge active area of research but Roel
> hasn't seen something in the context of blockclosures, for example.
>
> Then I threw in concerns about acceptabilty: the debugger must know
> about optimizations (for example when a temp var was optimized
> away, as we did for the Nile streams). Debugger and compiler must
> be kept in sync when doing clever optimizations.
>
> Roel sketched three things to be associated with an optimized method:
> 1- Optimized bytecode that are interpreted,
> 2- A non-optimized format that is used when debugging (this could
> be (unoptimized) bytecodes but will mostly likely be the meta-
> information for the optimized interpreted form),
> 3- A string with the source code.
>
> Other items mentioned by Roel like the use of AST and Rewriteable
> Reference Attribute Grammars (see Ecoop'04 paper), a grammar format
> for specifying rewrites of ASTs, can be found in the history
> section further down.
>
> Markus suggested to keep ASTs (instead of non-optimized version as
> bytecode) as meta data for the debugger's job and pointed us to
>
> - http://www.iam.unibe.ch/~scg/Research/Reflectivity/
>
> and mentioned that it gets extremely tricky when merging methods,
> e.g. inlining.
>
> --------------------------------------------
> End sum. Please post what I failed to notice.
> --------------------------------------------
>
> My impression of Rewriteable Reference Attribute Grammars is that
> they are primarily used for injecting expansions of domain specific
> "macros" (like sql strings) into existing languages and also for
> pre-processing nodes (again, mainly expansions) for simplifying the
> code generator's part. And the paper of Ekman and Hedin (ReRAGs,
> JastAdd II) for example is "centered" around Edsger W. Dijkstra's
> algorithm for self-stabilization of their rewrite rules (without
> mentioning Dijkstra).
>
> But for our problem at hand rewrite rules for cross statement nodes
> are needed. And I did not find a single example where a ReRAG rule
> crossed the statement boundary?
>
> Cheers
> Klaus
>
> On Tue, 29 May 2007 15:03:15 +0200, Wuyts Roel wrote:
>
>> The line information would have to go through the meta-information.
>>
>> Compilation takes as input the source code and produces an AST (or
>> in Denker's things, the AST is already there).
>>
>> From the AST two things can be done (either through a visitor
>> directly on the AST or from the outside; it does not matter for
>> the discussion):
>> 1- produce an optimized AST (a graph actually) with edges that
>> enable to trace back, and
>> 2- produce bytecode and meta information: bytecode from the AST's
>> itself and meta-information from the back-links.
>>
>> If it can do this, then we can, given source code:
>> - produce an unoptimized version (should be rarely needed).
>> - produce an optimized AST, and from that produce bytecodes.
>>
>> Note that making optimization an AST transformation would make it
>> easy to experiment with various optimization strategies, as long
>> as they add the necessary meta-information. As a format for the
>> AST we could have a look at the Rewriteable Reference Attribute
>> Grammars (see Ecoop'04 paper), a grammar format that makes it
>> quite easy to specify rewrites of ASTs that can contain links to
>> other nodes.
>>
>>> -----Original Message-----
>>> From: Klaus D. Witzel [mailto:klaus.witzel at cobss.ch]
>>> Sent: Tuesday 29 May 2007 14:35
>>> To: Wuyts Roel; stephane ducasse
>>> Cc: Damien Cassou; Mathieu Suen
>>> Subject: Re: A day optimizing
>>>
>>> I agree with what you wrote, and it is not too naïve.
>>>
>>> I think that one of the biggest challenge is the line
>>> information (map of bytecode positions to source code
>>> positions). After that is solved, the temps which have been
>>> optimized away will be almost easy to cope with :)
>>>
>>> Imagine a debugger session stepping through an optimized
>>> method in which [partial] expressions have been moved closer
>>> to the instruction in which they are needed (possibly
>>> duplicated in #ifTrue:ifFalse: branches and other blocks).
>>> How would highlighting the active
>>> expression/send/assign/return in the debugger have to work?
>>>
>>> Cheers
>>> Klaus
>>>
>>> On Tue, 29 May 2007 13:48:25 +0200, Wuyts Roel wrote:
>>>
>>> > Right, I forgot about the debugger.
>>> >
>>> > Does this mean that there are three things associated with a
>>> method:
>>> > 1- Optimized bytecode that are interpreted,
>>> > 2- A non-optimized format that is used when debugging (this
>>> could be
>>> > (unoptimized) bytecodes but will mostly likely be the
>>> meta-information
>>> > for the optimized interpreted form),
>>> > 3- A string with the source code.
>>> >
>>> > Compiling source code results in optimized byte code as well as
>>> > meta-information (or a debuggable version).
>>> >
>>> > Decompiling source code can be done in two ways: producing the
>>> > optimized version or producing the non-optimized version
>>> (one is good
>>> > for seeing and checking what gets produced, while the other one is
>>> > more easily understandable).
>>> >
>>> > Or is this too naïve and do I forget other things ? I think that,
>>> > depending on the supported transformations, keeping the
>>> > meta-information can be really, really tricky.
>>> >
>>> >> -----Original Message-----
>>> >> From: Klaus D. Witzel [mailto:klaus.witzel at cobss.ch]
>>> >> Sent: Tuesday 29 May 2007 11:27
>>> >> To: stephane ducasse
>>> >> Cc: Damien Cassou; Mathieu Suen; Wuyts Roel
>>> >> Subject: Re: A day optimizing
>>> >>
>>> >> On Tue, 29 May 2007 10:48:58 +0200, Stef wrote:
>>> >>
>>> >> > would it be possible that having a smarter compiler we
>>> >> avoid to write
>>> >> > shit code like
>>> >> >
>>> >> > position := position + (size := 1)
>>> >> >
>>> >> > but write it in a readable way and the compiler does its job?
>>> >>
>>> >> Sure but acceptance may become the biggest problem: if the
>>> compiler
>>> >> optimizes temp vars away then they won't be seen in the (classic)
>>> >> debugger because all unassigned values on the stack are
>>> invisible.
>>> >>
>>> >> I think the debugger must be refurbished at the same time the
>>> >> compiler does more clever optimizations. This also means
>>> the debugger
>>> >> must be fed meta information (for example, from
>>> >> MethodProperty) which is definitely not contained in the
>>> bytecodes of
>>> >> a compiled method.
>>> >>
>>> >> Cheers
>>> >> Klaus
>>> >>
>>> >> > Stef
>>> >> >
>>> >> > On 28 mai 07, at 22:01, Klaus D. Witzel wrote:
>>> >> >
>>> >> >> On Mon, 28 May 2007 21:07:41 +0200, Damien Cassou wrote:
>>> >> >>
>>> >> >>> 2007/5/28, Klaus D. Witzel <klaus.witzel at cobss.ch>:
>>> >> >>>> Aha, two new members of the terminal pop bytecode
>>> >> eliminators guild
>>> >> >>>> at work! In particular the assignment to the
>>> collection iVar in
>>> >> >>>> #nextPutAll:
>>> >> >>>> of NSStringWriter, while using same iVar as argument in
>>> >> a cascade
>>> >> >>>> :) BTW
>>> >> >>>
>>> >> >>>
>>> >> >>> Can you give more details about that please?
>>> >> >>
>>> >> >> The statement goes, in abstract form,
>>> >> >>
>>> >> >> iVar := (iVar species new: amount) cascade1: iVar; cascade2:
>>> >> >> whatsoever.
>>> >> >>
>>> >> >> The first appearance of iVar is subject to terminal pop
>>> (and never
>>> >> >> ever used again by #nextPutAll:) and so cannot amortized.
>>> >> >>
>>> >> >> The second appearance of iVar is, from a performance point
>>> >> of view,
>>> >> >> inevitable because a receiver is derived from it.
>>> >> >>
>>> >> >> The third appearance of iVar is the second part of the
>>> >> trick: at this
>>> >> >> point in time iVar still holds the old value. So a temp
>>> >> var for the
>>> >> >> new instance is optimized away (saves at least two bytecodes).
>>> >> >>
>>> >> >> In summary the above saves: a) assigning to a temp, b)
>>> loading the
>>> >> >> temp for the receiver of the cascade, which in unoptimized
>>> >> form would
>>> >> >> be necessary because iVar is assigned a new instance
>>> *and* used as
>>> >> >> receiver [derivative] in the cascade.
>>> >> >>
>>> >> >> Instead, the result of (iVar species new: amount) is left
>>> >> unassigned
>>> >> >> on the stack (this is top speed and the first part of
>>> the trick),
>>> >> >> duplicated once for the cascade (again: top speed),
>>> *not* popped
>>> >> >> after the cascade (this implicitly contributes to saving the
>>> >> >> bytecodes), then used in the assignment (inevitable) and
>>> >> thereafter
>>> >> >> popped into limbo (cannot be amortized because never ever
>>> >> used again).
>>> >> >>
>>> >> >> ---------------
>>> >> >>
>>> >> >> Did I write about the details you wanted?
>>> >> >>
>>> >> >> Cheers
>>> >> >> Klaus
>>> >> >
>>> >>
>>> >>
>>> >>
>>>
>>>
>>>
>
>
More information about the Newcompiler
mailing list