[Vm-dev] A BlockClosure optimization
siguctua at gmail.com
Sat May 15 17:02:58 UTC 2010
On 15 May 2010 19:35, Eliot Miranda <eliot.miranda at gmail.com> wrote:
> Hi Igor,
> On Fri, May 14, 2010 at 10:14 PM, Igor Stasenko <siguctua at gmail.com> wrote:
>> I just thought that we could optimize the closure-copy mechanism
>> to reuse a closure (a BlockClosure instance) that was created earlier
>> for the same context.
> that's a good one. Also good is precomputing closures for blocks that don't capture their dynamic environment (don't close over any variables and don't include an ^-return; VW parlance "clean blocks"). Another one, though it requires a new bytecode set/VM, is to not reify the current context for blocks that don't contain ^-returns (VW parlance "copying blocks"). But these last two should be preferences, since they affect debugging (within a block so optimized one can't discover its origin).
> (VW parlance for normal blocks is "full blocks"; all blocks in my closure compiler are full, so the current context must be reified. Not an issue in the non-Cog VMs as it's already there, but it is an issue in a faster VM; it often means two allocations instead of one).
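A minimal sketch of the clean/full distinction, using hypothetical methods (the names and literals are illustrative, not from the thread):

```smalltalk
"Clean block: captures no outer temps, no self, and has no ^-return.
 In principle it could be created once and stored in the method's
 literal frame, avoiding a per-send allocation."
sortedExample
	^#(3 1 2) asSortedCollection: [:a :b | a < b]

"Full block: closes over the argument 'threshold' and contains a
 ^-return, so each activation must capture (reify) its defining
 context, and the closure cannot be precomputed."
firstAbove: threshold
	#(3 1 2) do: [:each | each > threshold ifTrue: [^each]].
	^nil
```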
>> The mechanism of the optimization can be illustrated by the following code.
>> Suppose you have a method which uses a loop:
>> 1 to: 100 do: [:i |
>> dict at: i ifAbsent: [ foo bar ] ]
>> The idea is to copy a closure from method's literal frame just once,
>> and store it into temp,
>> and then reuse it like following:
>> | closure |
>> 1 to: 100 do: [:i |
>> dict at: i ifAbsent: (closure ifNil: [ closure := [ foo bar ] ] ) ]
>> A simple benchmark shows that we could gain from it:
>> [ 1000000 timesRepeat: [ [ 1+1 ] value ] ] timeToRun.
>>
>> [ | closure | closure := nil.
>> 1000000 timesRepeat: [
>>     (closure ifNil: [ closure := [ 1+1 ] ]) value ]
>> ] timeToRun.
>> As you can see, even implemented in Smalltalk (without any changes to
>> the VM) it shows
>> a significant performance boost.
> That's what's nice about this optimization. It doesn't require any VM modifications ;)
Yes, it doesn't. The compiler can detect when a closure-push bytecode occurs
inside a loop (a bytecode range with a backward jump),
then add a temp and emit the bytecodes to (re)use that temp.
A potential VM modification could be to add a bytecode like
"pushBlockClosure: litIndex orReuseTemp: tempIndex",
which would be equivalent to the expression: closure ifNil: [ closure := [ 1+1 ] ].
Too bad this technique can't be used for nested blocks, since they
have to use a different 'outerContext'
for each activation of the outer closure. :(
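To illustrate why (a hypothetical snippet, not code from the thread): each activation of the outer block below must hand the inner block a fresh outerContext so that the inner block sees that activation's binding of 'i'; a single reused inner closure would share one stale context.

```smalltalk
"Each evaluation of the outer block [:i | [i * 10]] creates a new
 inner closure whose outerContext is that particular activation,
 so every inner block reads its own 'i' when evaluated later."
| blocks |
blocks := (1 to: 3) collect: [:i | [i * 10]].
blocks collect: [:b | b value]
```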
>> Of course, if we put something that loads the processor with real work,
>> instead of just [ 1+1 ],
>> the difference will be less significant.
>> But beyond this lies the fact that copying the closure object each
>> time means a memory allocation,
>> and hence leads to more frequent GCs.
> What real codes have you seen the costs in? I think they're there (VisualWorks went to some effort to reduce costs using the two other optimizations I described) but how big? In any case you should implement this and see whether any useful benchmarks (e.g. system recompilation) show measurable speedups.
I never got around to exploring the bytecode emission code in
Encoder / MethodNode / BytecodeAgnosticMethodNode.
It is hellishly complex and it's easy to get lost there. :)
But I could try.
>> Best regards,
>> Igor Stasenko AKA sig.
Igor Stasenko AKA sig.