[Vm-dev] A BlockClosure optimization

Eliot Miranda eliot.miranda at gmail.com
Sat May 15 17:12:02 UTC 2010


On Sat, May 15, 2010 at 10:02 AM, Igor Stasenko <siguctua at gmail.com> wrote:

>
> On 15 May 2010 19:35, Eliot Miranda <eliot.miranda at gmail.com> wrote:
> >
> > Hi Igor,
> > On Fri, May 14, 2010 at 10:14 PM, Igor Stasenko <siguctua at gmail.com>
> wrote:
> >>
> >> Hello,
> >>
> >> i just thought, that we could optimize a closure-copy mechanism
> >> to reuse a closure (BlockClosure instance), which were created before
> >> for same context.
> >
> >     that's a good one.  Also good is precomputing closures for blocks
> that don't capture their dynamic environment (don't close over any variables
> and don't include an ^-return; VW parlance "clean blocks").  Another one,
> but this requires a new bytecode set/vm is to not reify the current context
> for blocks that don't contain ^-returns (VW parlance "copying blocks").  But
> these last two should be preferences since they affect debugging (within a
> block so optimized one can't discover its origin).
> > (VW parlance for normal blocks is "full blocks"; all blocks in my closure
> compiler are full, so the current context must be reified, not an issue in
> the non-Cog VMs as its already there, but it is an issue in a faster VM, it
> often means two allocations instead of one).
> >
> >
> >>
> >> A mechanism of optimization can illustrated by following code.
> >>
> >> Suppose , you having a method, which using a loop:
> >>
> >> myMethod
> >>
> >>  1 to: 100 do: [:i |
> >>   dict at: i ifAbsent: [ foo bar ] ]
> >>
> >> The idea is to copy a closure from method's literal frame just once,
> >> and store it into temp,
> >> and then reuse it like following:
> >>
> >> myMethod
> >> | closure |
> >>  1 to: 100 do: [:i |
> >>   dict at: i ifAbsent: (closure ifNil: [ closure := [ foo bar ] ] ) ]
> >>
> >> ----------
> >>
> >> A simple benchmark shows that we could gain from it:
> >>
> >> [ 1000000 timesRepeat: [ [ 1+1] value ] ] timeToRun
> >> 670
> >>
> >> [
> >> | closure |  closure := nil.
> >> 1000000 timesRepeat: [
> >>        (closure ifNil: [ closure := [ 1+1] ]) value ]
> >> ] timeToRun
> >> 518
> >>
> >> As you can see, even implemented in smalltalk (without any changes to
> >> VM) it shows
> >> a significant performance boost.
> >
> > That's what's nice about this optimization.  It doesn't require any VM
> modifications ;)
> >
>
> Yes, it doesn't.  A compiler can detect, if a closure push bytecode found
> in a loop (a bytecode block which having a backwards jump)
> and then add a temp and emit the bytecodes to (re)use that temp.
>
> A potential VM modification could be to add a bytecode like
> "pushBlockClosure: litIndex orReuseTemp: tempIndex ",
> which should be equivalent to given piece: closure ifNil: [ closure := [
> 1+1] ].
>
> Too bad, this technique can't be used for a nested blocks, since they
> have to use different 'outerContext'
> per each activation of outer closure. :(
>

Is that true?  Isn't the real issue that if any niladic block contains an
^-return then that block any any blocks it is nested within cannot be
shared?  It would be good to try and state the invariants in formal English
before you implement.  The implementation will be much better as a result.


> >>
> >> Of course, if we put something, which loads processor by real work,
> >> instead of just [1+1],
> >> the difference will be less significant.
> >>
> >> But apart from this, lying the fact, that copying closure object each
> >> time means memory allocation,
> >> and hence leads to more frequent GC.
> >
> > What real codes have you seen the costs in?  I think they're there
> (VisualWorks went to some effort to reduce costs using the two other
> optimizations I described) but how big?  In any case you should implement
> this and see whether any useful benchmarks (e.g. system recompilation) show
> measurable speedups.
>
> I never went to exploring a bytecode emission code in
> Encoder / MethodNode / BytecodeAgnosticMethodNode.
> It is hellishly complex and its easy to get lost there. :)
> But i could try.
>

That's not what I meant.  I simply meant that you need to measure the effect
of your optimization (caching blocks used in loops in a temp) on real use
cases such as e.g. recompiling the entire system, which is a decent
benchmark.


> >>
> >> --
> >> Best regards,
> >> Igor Stasenko AKA sig.
>
>
> --
> Best regards,
> Igor Stasenko AKA sig.
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.squeakfoundation.org/pipermail/vm-dev/attachments/20100515/5848652d/attachment.htm


More information about the Vm-dev mailing list