[Vm-dev] A BlockClosure optimization

Sat May 15 16:35:26 UTC 2010

Hi Igor,

On Fri, May 14, 2010 at 10:14 PM, Igor Stasenko <siguctua at gmail.com> wrote:

>
> Hello,
>
> I just thought that we could optimize the closure-copy mechanism
> to reuse a closure (BlockClosure instance) that was created earlier
> for the same context.
>

That's a good one.  Also good is precomputing closures for blocks that
don't capture their dynamic environment (they don't close over any variables
and don't include an ^-return; VW parlance "clean blocks").  Another, which
requires a new bytecode set/VM, is not to reify the current context for
blocks that don't contain ^-returns (VW parlance "copying blocks").  But
these last two should be preferences, since they affect debugging (within a
block optimized this way, one can't discover its origin).

(VW parlance for normal blocks is "full blocks"; all blocks in my closure
compiler are full, so the current context must be reified.  That is not an
issue in the non-Cog VMs, since the context is already there, but it is an
issue in a faster VM, where it often means two allocations instead of one.)
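For illustration, here is a hypothetical Squeak workspace snippet (not part of the original thread) showing the three kinds of block as the terms are described above. The variable names are made up, and the classification in the comments follows the definitions given in this message, not the output of any particular compiler's analysis:

```smalltalk
| offset clean copying full |
offset := 10.
"Clean block: closes over no variables and has no ^-return,
 so in principle it could be created once and reused."
clean := [:x | x + 1].
"Copying block: reads the outer temp offset but has no ^-return,
 so it needs offset's value but not a reified outer context."
copying := [:x | x + offset].
"Full block: contains an ^-return, so it must refer to the
 enclosing context in order to know where to return to."
full := [:x | x > 100 ifTrue: [^x]. x].
(clean value: 1) + (copying value: 1) + (full value: 1)
```

Printing the last expression answers 14 (2 + 11 + 1); what matters here is which blocks could be precomputed or created without a reified context, not the arithmetic.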

> The optimization can be illustrated by the following code.
>
> Suppose you have a method that uses a loop:
>
> myMethod
>
>  1 to: 100 do: [:i |
>   dict at: i ifAbsent: [ foo bar ] ]
>
> The idea is to copy a closure from the method's literal frame just once,
> store it in a temp,
> and then reuse it like this:
>
> myMethod
> | closure |
>  1 to: 100 do: [:i |
>   dict at: i ifAbsent: (closure ifNil: [ closure := [ foo bar ] ] ) ]
>
> ----------
>
> A simple benchmark shows that we could gain from it:
>
> [ 1000000 timesRepeat: [ [ 1+1] value ] ] timeToRun
> 670
>
> [
> | closure |  closure := nil.
> 1000000 timesRepeat: [
>        (closure ifNil: [ closure := [ 1+1] ]) value ]
> ] timeToRun
> 518
>
> As you can see, even implemented in Smalltalk (without any changes to
> the VM), it shows a significant performance boost: 518 ms instead of
> 670 ms, roughly 23% less time.
>

That's what's nice about this optimization.  It doesn't require any VM
modifications ;)

> Of course, if we put something that keeps the processor busy with real
> work in place of just [1+1], the difference will be less significant.
>
> But apart from this, there is the fact that copying the closure object
> each time means a memory allocation, and hence leads to more frequent GC.

In what real code have you seen these costs?  I think they're there
(VisualWorks went to some effort to reduce them using the two other
optimizations I described), but how big are they?  In any case, you should
implement this and see whether any useful benchmarks (e.g. system
recompilation) show measurable speedups.

> --
> Best regards,
> Igor Stasenko AKA sig.
>