[Vm-dev] Powerful JIT optimization

Eliot Miranda eliot.miranda at gmail.com
Wed Nov 6 23:27:29 UTC 2013


On Wed, Nov 6, 2013 at 7:56 AM, Florin Mateoc <florin.mateoc at gmail.com> wrote:

>
>  On 11/6/2013 4:25 AM, Frank Shearar wrote:
>
>
>
> On 06 Nov 2013, at 4:03, Florin Mateoc <florin.mateoc at gmail.com> wrote:
>
>    On 11/5/2013 6:30 PM, Eliot Miranda wrote:
>
>
>
>
>
>
> On Mon, Nov 4, 2013 at 8:42 PM, Florin Mateoc <florin.mateoc at gmail.com> wrote:
>
>>
>>  On 11/4/2013 9:05 PM, Eliot Miranda wrote:
>>
>>
>>
>> Hi Florin,
>>
>> On Mon, Nov 4, 2013 at 12:30 PM, Florin Mateoc <florin.mateoc at gmail.com> wrote:
>>
>>>
>>>  On 11/4/2013 3:07 PM, Eliot Miranda wrote:
>>>
>>> Hi Florin,
>>>
>>> On Mon, Nov 4, 2013 at 7:09 AM, Florin Mateoc <florin.mateoc at gmail.com> wrote:
>>>
>>>>
>>>> Hi Eliot,
>>>>
>>>> I am not sure if this is the right moment to bring this up, when you
>>>> are so busy with the new garbage collector, but since you were also
>>>> talking about powerful new optimizations, this seems a very good one.
>>>> I had been toying with the idea before, but I did not have the right
>>>> formulation for it - I was thinking of doing it on the image side, at
>>>> the AST level, and then communicating somehow with the VM (an aspect
>>>> that becomes moot if the JIT code is generated from Smalltalk) - but
>>>> then I stumbled upon it on the web, and I now think it would be
>>>> better done inside the JIT. In Rémi Forax's formulation:
>>>>
>>>> "On thing that trace based JIT has shown is that a loop or function are
>>>> valid optimization entry points. So like you can
>>>> have an inlining cache for function at callsite, you should have a kind
>>>> of inlining cache at the start of a loop."
>>>>
>>>> This was in the context of a blog entry by Cliff Click:
>>>>
>>>> http://www.azulsystems.com/blog/cliff/2011-04-04-fixing-the-inlining-problem
>>>> The comments also contain other useful suggestions.
>>>>
>>>> And the loop inline cache could also specialize not just on the
>>>> receiver block, but also on the types of the arguments (this is true
>>>> for methods as well, but in the absence of profiling information
>>>> loops are more likely to be "hot", and we can easily detect nested
>>>> loops, which reinforce the "hotness").
>>>>
>>>
>>> AFAICT this is subsumed under adaptive optimization/speculative
>>> inlining, i.e. it is one of the potential optimizations in an adaptive
>>> optimizing VM. Further, I also believe that by far the best place to
>>> do this kind of thing is indeed in the image, and to do it at the
>>> bytecode-to-bytecode level. But I've said this many times before and
>>> don't want to waste cycles waffling again.
>>>
>>>  thanks.
>>> e.
>>>
>>>  Regards,
>>>> Florin
>>>>
>>>
>>> --
>>> best,
>>> Eliot
>>>
>>> This is a bit like saying that we don't need garbage collection
>>> because we can do liveness/escape analysis in the image. I think
>>> there is a place for both sides.
>>>
>>
>> No, it's not. If you read my design sketch on bytecode-to-bytecode
>> adaptive optimisation you'll understand that it's not. It's simply
>> that one can do bytecode-to-bytecode adaptive optimisation in the
>> image, and that that's a better place to do adaptive optimisation than
>> in the VM. But again, I've gone into this many times before on the
>> mailing list and I don't want to get into it again.
>>
>>
>> Can't compiler technology (coupled with type inference) also be
>> applied, in the image, to stack allocation/pretenuring/automatic pool
>> allocation... to simplify the garbage collector and potentially
>> obviate the need for a performant one in the VM?
>>
>
> I doubt it. It is already used to, e.g., create clean blocks. This
> certainly isn't enough of a win to make living with a poor GC
> acceptable.
>
>
>>  If it can, why doesn't the same argument apply?
>>
>
> It can't, so the argument doesn't apply.
>
>
>> And why did you implement inline caches in the VM if they were better
>> done in the image?
>>
>
> Because they're not better done in the image. In fact, adaptive
> optimization is heavily dependent on inline caches. That's where it
> gets its type information from.
>
>  This doesn't feel like a productive conversation.
>
>
>
> Indeed.
> The garbage collector example was an afterthought and even a bit
> facetious - sorry about that. But the initial point still stands. It
> is the same kind of optimization as inline caches, not a different
> kind of adaptive optimization that is merely facilitated by them. If
> it is worth doing inline caches for method calls, it is worth doing
> them for block evaluations in loops as well.
>
>
> In other words, you see block invocations as (possibly only nearly)
> identical to message sends?
>
>  frank
>
>
> Well, yes. I think there are two aspects to consider. One is just the
> message send #value and its cousins. Here classes as types (for the
> receiver) are useless; it is more proper to consider the block itself
> as the block's type. In many cases only a limited number of blocks
> propagate to a specific call site. Now this does not work for loops,
> where, inside the implementation of #do:, the #value: invocation would
> be megamorphic. But fortunately for us, we know that #do: and its
> iterator cousins are usually called with a literal block, so the code
> that needs to be executed (within the block) does not need to be
> looked up. At those sites, instead of caching the #do: implementation,
> which in itself is uninteresting, we can cache a cloned #do:
> implementation with a bound #value: call. This avoids both the
> megamorphic call and the inlining of the block body (which is what I
> think Eliot was objecting to as not the right kind of optimization at
> the VM level).
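>
> For illustration, a minimal sketch of the two shapes involved (the
> clone is hand-written here only to show the idea, and its name is made
> up; a real clone would be produced by the optimizer). The generic
> implementation is essentially Squeak's do: in SequenceableCollection:
>
> do: aBlock
>     "The value: send below is a single call site shared by every
>      sender of #do: in the image, so its inline cache sees many
>      different closures and goes megamorphic."
>     1 to: self size do: [:index |
>         aBlock value: (self at: index)]
>
> A specialized clone for one literal block keeps the send but binds it:
>
> printAllDo
>     "Hypothetical clone of #do: for the literal block below. The
>      value: send now only ever sees this one closure, so it stays
>      monomorphic and can be dispatched directly, without inlining
>      the block body."
>     | boundBlock |
>     boundBlock := [:each | Transcript show: each printString; cr].
>     1 to: self size do: [:index |
>         boundBlock value: (self at: index)]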
>

I'm not objecting to doing this kind of optimization at the VM level in
principle, only in my own work. What you're talking about (block
inlining, or control-path splitting, with intent to eliminate block
activation overhead, or to apply other strength-reducing optimizations)
is indeed something that adaptive optimizers and tracing VMs do at the
VM level. But personally I *don't* want to do these kinds of
optimizations at the VM level. I want to do them at the
bytecode-to-bytecode level, where, I believe, they can be done equally
well.

But as you said yourself, Florin, now is not a good time for me, as I'm
focussed on Spur.

Now goodnight, and good luck.
-- 
best,
Eliot