[Vm-dev] A BlockClosure optimization

Igor Stasenko siguctua at gmail.com
Sat May 15 17:02:58 UTC 2010


On 15 May 2010 19:35, Eliot Miranda <eliot.miranda at gmail.com> wrote:
>
> Hi Igor,
> On Fri, May 14, 2010 at 10:14 PM, Igor Stasenko <siguctua at gmail.com> wrote:
>>
>> Hello,
>>
>> I just thought that we could optimize the closure-copy mechanism
>> to reuse a closure (a BlockClosure instance) which was created before
>> for the same context.
>
>     that's a good one.  Also good is precomputing closures for blocks that don't capture their dynamic environment (don't close over any variables and don't include an ^-return; VW parlance "clean blocks").  Another one, but this requires a new bytecode set/VM, is to not reify the current context for blocks that don't contain ^-returns (VW parlance "copying blocks").  But these last two should be preferences, since they affect debugging (within a block so optimized one can't discover its origin).
> (VW parlance for normal blocks is "full blocks"; all blocks in my closure compiler are full, so the current context must be reified.  That's not an issue in the non-Cog VMs as it's already there, but it is an issue in a faster VM, where it often means two allocations instead of one.)
>
>
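
To make the terminology above concrete, a rough sketch as I read it (the
method names are made up):

cleanBlockExample
	"No outer temps, no self, no ^-return: the closure could be
	 precomputed once and reused ('clean block')."
	^ #(1 2 3) collect: [:each | each * 2]

copyingBlockExample
	"Closes over the temp 'factor' but contains no ^-return, so it does
	 not need its home context reified ('copying block')."
	| factor |
	factor := 10.
	^ #(1 2 3) collect: [:each | each * factor]

fullBlockExample
	"Contains a ^-return, so the block needs the current context
	 reified ('full block')."
	#(1 2 3) do: [:each | each > 1 ifTrue: [ ^ each ]].
	^ 0
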
>>
>> The mechanism of the optimization can be illustrated by the following code.
>>
>> Suppose you have a method which uses a loop:
>>
>> myMethod
>>
>>  1 to: 100 do: [:i |
>>   dict at: i ifAbsent: [ foo bar ] ]
>>
>> The idea is to copy a closure from the method's literal frame just once,
>> store it into a temp,
>> and then reuse it like the following:
>>
>> myMethod
>> | closure |
>>  1 to: 100 do: [:i |
>>   dict at: i ifAbsent: (closure ifNil: [ closure := [ foo bar ] ] ) ]
>>
>> ----------
>>
>> A simple benchmark shows that we could gain from it:
>>
>> [ 1000000 timesRepeat: [ [ 1+1] value ] ] timeToRun
>> 670
>>
>> [
>> | closure |  closure := nil.
>> 1000000 timesRepeat: [
>>        (closure ifNil: [ closure := [ 1+1] ]) value ]
>> ] timeToRun
>> 518
>>
>> As you can see, even implemented in Smalltalk (without any changes to
>> the VM) it shows
>> a significant performance boost.
>
> That's what's nice about this optimization.  It doesn't require any VM modifications ;)
>

Yes, it doesn't.  The compiler can detect whether a closure-push bytecode appears
inside a loop (a bytecode block containing a backwards jump),
and then add a temp and emit the bytecodes to (re)use that temp.

A potential VM modification could be to add a bytecode like
"pushBlockClosure: litIndex orReuseTemp: tempIndex",
which would be equivalent to the snippet above: closure ifNil: [ closure := [ 1+1 ] ].
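
In Smalltalk-like pseudocode, the semantics of such a bytecode might look
roughly as follows (this assumes the closure template sits in the method's
literal frame as in my sketch above; context, method, litIndex, tempIndex
and the selectors are only illustrative):

	| closure |
	closure := context tempAt: tempIndex.
	closure ifNil: [
		closure := (method literalAt: litIndex) shallowCopy.
		closure outerContext: context.
		context tempAt: tempIndex put: closure ].
	context push: closure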

Too bad this technique can't be used for nested blocks, since they
have to use a different 'outerContext'
for each activation of the outer closure. :(
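
For example, a sketch:

myOuterMethod
	"The inner block [:x | x + i] depends on the outer block's argument i,
	 so a fresh closure (with a fresh outerContext) is needed on every
	 iteration of the outer block."
	^ (1 to: 3) collect: [:i |
		#(10 20) collect: [:x | x + i ] ]

A single cached copy of the inner closure would hand out a stale
outerContext, so the trick only applies to blocks defined directly in the
method body.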

>>
>> Of course, if we put something which loads the processor with real work,
>> instead of just [1+1],
>> the difference will be less significant.
>>
>> But apart from this, there is the fact that copying a closure object each
>> time means a memory allocation,
>> and hence leads to more frequent GCs.
>
> What real codes have you seen the costs in?  I think they're there (VisualWorks went to some effort to reduce costs using the two other optimizations I described) but how big?  In any case you should implement this and see whether any useful benchmarks (e.g. system recompilation) show measurable speedups.

I never got around to exploring the bytecode emission code in
Encoder / MethodNode / BytecodeAgnosticMethodNode.
It is hellishly complex and it's easy to get lost there. :)
But I could try.

>>
>> --
>> Best regards,
>> Igor Stasenko AKA sig.


-- 
Best regards,
Igor Stasenko AKA sig.

