[Vm-dev] Powerful JIT optimization

Florin Mateoc florin.mateoc at gmail.com
Wed Nov 6 15:56:16 UTC 2013


On 11/6/2013 4:25 AM, Frank Shearar wrote:
>
> On 06 Nov 2013, at 4:03, Florin Mateoc <florin.mateoc at gmail.com> wrote:
>
>> On 11/5/2013 6:30 PM, Eliot Miranda wrote:
>>>
>>> On Mon, Nov 4, 2013 at 8:42 PM, Florin Mateoc <florin.mateoc at gmail.com> wrote:
>>>
>>>      
>>>     On 11/4/2013 9:05 PM, Eliot Miranda wrote:
>>>>
>>>>     Hi Florin,
>>>>
>>>>     On Mon, Nov 4, 2013 at 12:30 PM, Florin Mateoc <florin.mateoc at gmail.com> wrote:
>>>>
>>>>          
>>>>         On 11/4/2013 3:07 PM, Eliot Miranda wrote:
>>>>>         Hi Florin,
>>>>>
>>>>>         On Mon, Nov 4, 2013 at 7:09 AM, Florin Mateoc <florin.mateoc at gmail.com> wrote:
>>>>>
>>>>>
>>>>>             Hi Eliot,
>>>>>
>>>>>             I am not sure if this is the right moment to bring this up, when you are so busy with the new garbage
>>>>>             collector, but you were also talking about powerful new optimizations and this seems a very good one.
>>>>>             I was toying with the idea before, but I did not have the right formulation for it - I was thinking of
>>>>>             doing it on the image side, at the AST level, and then communicating somehow with the VM (this aspect
>>>>>             becomes moot if the JIT code is generated from Smalltalk) - but now I stumbled upon it on the web, and
>>>>>             I think it would be better done inside the JIT. In Rémi Forax's formulation:
>>>>>
>>>>>             "On thing that trace based JIT has shown is that a loop or function are valid optimization entry
>>>>>             points. So like you can
>>>>>             have an inlining cache for function at callsite, you should have a kind of inlining cache at the start
>>>>>             of a loop."
>>>>>
>>>>>             This was in the context of a blog entry by Cliff Click:
>>>>>             http://www.azulsystems.com/blog/cliff/2011-04-04-fixing-the-inlining-problem
>>>>>             The comments also contain other useful suggestions.
>>>>>
>>>>>             And the loop inlining cache could also specialize not just on the receiver block, but also on the
>>>>>             types of the arguments (this is true for methods as well, but, in the absence of profiling
>>>>>             information, loops are more likely to be "hot", and we can easily detect nested loops, which
>>>>>             reinforce the "hotness").
>>>>>
>>>>>
>>>>>         AFAICT this is subsumed under adaptive optimization/speculative inlining, i.e. this is one of the
>>>>>         potential optimizations in an adaptive optimizing VM.  Further, I also believe that by far the best
>>>>>         place to do this kind of thing is indeed in the image, and to do it at the bytecode-to-bytecode level.
>>>>>         But I've said this many times before and don't want to waste cycles waffling again.
>>>>>
>>>>>         thanks.
>>>>>         e.
>>>>>
>>>>>             Regards,
>>>>>             Florin
>>>>>
>>>>>
>>>>>         -- 
>>>>>         best,
>>>>>         Eliot
>>>>         This is a bit like saying that we don't need garbage collection because we can do liveness/escape
>>>>         analysis in the image. I think there is a place for both sides.
>>>>
>>>>
>>>>     No it's not.  If you read my design sketch on bytecode-to-bytecode adaptive optimisation you'll understand that
>>>>     it's not.  It's simply that one can do bytecode-to-bytecode adaptive optimisation in the image, and that that's
>>>>     a better place to do adaptive optimisation than in the VM.  But again I've gone into this many times before on
>>>>     the mailing list and I don't want to get into it again.
>>>>
>>>
>>>     Can't compiler technology (coupled with type inference) also be applied, in the image, to stack
>>>     allocation/pretenuring/automatic pool allocation... to simplify the garbage collector and potentially obviate
>>>     the need for a performant one in the VM?
>>>
>>>
>>> I doubt it.  It is already used, e.g., to create clean blocks.  This certainly isn't enough of a win to be able to
>>> live with a poor GC.
>>>  
>>>
>>>     If it can, why doesn't the same argument apply?
>>>
>>>
>>> It can't, so the argument doesn't apply.
>>>  
>>>
>>>     And why did you implement inline caches in the VM if they were better done in the image?
>>>
>>>
>>> Because they're not better done in the image.  In fact, adaptive optimization is heavily dependent on inline caches.
>>>  That's where it gets its type information from.
>>>
>>> This doesn't feel like a productive conversation.
>>>  
>>
>> Indeed.
>> The garbage collector example was an afterthought and even a bit facetious, sorry about that. But the initial point
>> still stands. It is the same kind of optimization as inline caches, not a different kind of adaptive optimization
>> that is facilitated by them. If it is worth doing inline caches for method calls, it is worth doing them for block
>> evaluations in loops as well.
>
> In other words, you see block invocations as (possibly only nearly) identical to message sends?
>
> frank
>

Well, yes. I think there are two aspects to consider. One is just the message send #value and its cousins. Here classes
as types (for the receiver) are useless; it is more appropriate to treat the block itself as its own type. In many cases
only a limited number of blocks propagate to a specific call site. This does not work for loops, though: inside the
implementation of #do:, the #value: invocation would be megamorphic. But fortunately for us, we know that #do: and its
iterator cousins are usually called with a literal block, so the code that needs to be executed (within the block) does
not need to be looked up. At those sites, instead of caching the #do: implementation, which in itself is uninteresting,
we can cache a cloned #do: implementation with a bound #value: call. This avoids both the megamorphic call and the
inlining of the block body (which is, I think, what Eliot was objecting to as not the right kind of optimization at the
VM level).
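
To make this concrete, here is a rough source-level sketch, in Smalltalk, of what such a cloned #do: could look like.
This is only an illustration of the idea: a real clone would be produced and cached by the JIT as machine code, the
selector #do:boundTo:startpc: is invented for the example, and the guard assumes BlockClosure's #method and #startpc
accessors:

    SequenceableCollection >> do: aBlock boundTo: cachedMethod startpc: cachedStartpc
        "Hypothetical clone of #do:, cached at one call site whose literal
         block compiles to cachedMethod/cachedStartpc. The guard tests the
         block's compiled code rather than the closure itself, because a
         fresh closure is created on each activation of the enclosing method."
        (aBlock method == cachedMethod and: [aBlock startpc = cachedStartpc])
            ifFalse: [^ self do: aBlock].    "guard failed: fall back to the generic loop"
        1 to: self size do: [:i |
            aBlock value: (self at: i)]    "after the guard this send has exactly one
                                            possible target, so the JIT can bind it directly"

Once the guard passes, the #value: send is effectively monomorphic and can jump straight to the block's code - no
megamorphic lookup, and no inlining of the block body either.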

Florin

