[Vm-dev] Eliot's BlockClosure model questions

Eliot Miranda eliot.miranda at gmail.com
Wed Jul 31 00:41:29 UTC 2013


On Tue, Jul 30, 2013 at 1:56 PM, Clément Bera <bera.clement at gmail.com> wrote:

>
> Thanks for the answer, it was very helpful. I understand it now.
>
> I had a look at the first posts of your blog (Closures I & II) when I was
> working on the Opal compiler. Today I was looking at "Under Cover Contexts
> and the Big Frame-Up" <http://www.mirandabanda.org/cogblog/2009/01/14/under-cover-contexts-and-the-big-frame-up/>,
> and I think I should read your whole blog.
>
> It is really nice that you wrote this blog; it is the main documentation
> about an efficient Smalltalk VM. I learnt mostly by looking at Cog's
> source. The VW VM source is closed, so... I will have a look at the
> Strongtalk implementation instead, since it seems to be open source.
>
> Why are VW's clean blocks much faster? Are they activated like
> methods? I didn't find it in your blog (probably because it is not in Cog).
> Is it possible to implement clean blocks in Pharo/Squeak? (I think that
> 53% of the blocks not optimized by the compiler are clean in Pharo 3.)
> Would it be worth it?
>

Clean blocks are faster because they don't access their outer environment
and hence their outer context does not have to be created.  So there is no
allocation associated with a clean block.  It exists already as a literal
and its outer context does not have to be reified.  Normal closures are
created when the point at which they are defined in method execution is
reached (the pushClosure bytecode), and if the current context does not yet
exist it must be instantiated too, so creating a closure usually takes
two allocations.
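As a rough illustration of the allocation argument, here is a toy Python model; it is not VM code, and `Frame`, `Heap`, `push_closure` and `push_clean_block` are hypothetical names. It simply counts heap allocations: a normal closure may force its outer context to be reified and then allocates itself, while a clean block is handed out as an existing literal.

```python
class Frame:
    """A method activation on the stack; its context object may not exist yet."""
    def __init__(self):
        self.context = None  # created lazily, only when first needed

class Heap:
    """Counts allocations so the cost of block creation is visible."""
    def __init__(self):
        self.allocations = 0

    def allocate(self, kind):
        self.allocations += 1
        return {"class": kind}

def push_closure(heap, frame):
    """Model of a normal closure created by the pushClosure bytecode."""
    if frame.context is None:
        frame.context = heap.allocate("MethodContext")  # reify outer context
    return heap.allocate("BlockClosure")                # the closure itself

def push_clean_block(literal):
    """A clean block already exists as a literal: nothing is allocated."""
    return literal

heap = Heap()
frame = Frame()
push_closure(heap, frame)        # two allocations: context + closure
push_closure(heap, frame)        # context now exists: one allocation
push_clean_block({"class": "BlockClosure"})  # zero allocations
print(heap.allocations)          # → 3
```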

Clean blocks are activated like blocks.  Block and method activation is
different in the first phase (the send side) but quite similar in the
second phase (frame building).  In VW for example, finding the machine code
method associated with a block involves a cache lookup which can be slow.
In Cog, it involves following a pointer in the method header (internally, the
VM replaces the header of a method with a pointer to its machine code) and
then jumping to a hard-coded binary search which jumps to the correct
block's entry-point depending on the closure's startpc.  If a method
contains a single block then this is a direct jump.  As a result, block
dispatch in Cog is typically faster than in VW.
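The Cog dispatch path above can be modeled in a few lines of Python. This is only a sketch under stated assumptions: Cog emits the search as machine code, whereas here `JittedMethod`, `block_entry_point` and the entry-point addresses are hypothetical stand-ins, and the closure is given a direct `method` field for brevity (in Squeak the method is reached through the closure's outerContext).

```python
import bisect

class JittedMethod:
    """Hypothetical model of a method's machine code with embedded blocks."""
    def __init__(self, block_entries):
        # block_entries maps each block's startpc to its entry-point address.
        self.startpcs = sorted(block_entries)
        self.entries = [block_entries[pc] for pc in self.startpcs]

    def block_entry_for(self, startpc):
        if len(self.startpcs) == 1:
            # Only one block: no search needed, a direct jump suffices.
            return self.entries[0]
        # Several blocks: binary search on the closure's startpc.
        i = bisect.bisect_left(self.startpcs, startpc)
        if i == len(self.startpcs) or self.startpcs[i] != startpc:
            raise KeyError(f"no block starts at pc {startpc}")
        return self.entries[i]

class CompiledMethod:
    def __init__(self, jitted):
        # Once jitted, the header slot points at the machine code.
        self.header = jitted

class BlockClosure:
    def __init__(self, method, startpc):
        self.method = method      # in Squeak, reached via outerContext
        self.startpc = startpc

def block_entry_point(closure):
    # Follow the pointer in the method header, then search on startpc.
    return closure.method.header.block_entry_for(closure.startpc)

method = CompiledMethod(JittedMethod({21: 0x1040, 37: 0x10A0, 58: 0x1120}))
print(hex(block_entry_point(BlockClosure(method, 37))))  # → 0x10a0
```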

Yes, it is possible to implement clean blocks.  It is only an issue to do
with the representation of closures.  Ideally they need a method inst var,
making the outerContext inst var optional (or at least nil in a clean
block).  But that would require a change to BlockClosure's class definition
and a VM change.  To avoid having to change the class definition of
BlockClosure and the VM, the compiler could create an empty context to hold
onto the method, and that would work fine.  So to implement clean blocks
the compiler would instantiate a BlockClosure literal for each clean block
and a MethodContext whose receiver was nil shared between all the clean
blocks in a method.  There are tricky issues such as setting breakpoints in
methods (toggle break on entry), or copying methods, which would require
scanning the literals for clean blocks and duplicating them and their
outerContext too.  But that's just detail.  Some time I must try this for
Squeak.  Let me know if you try it for Opal (and of course I'm very
happy to help with advice).
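A minimal sketch of the compiler-side bookkeeping for the workaround just described (no change to BlockClosure or the VM), written as a Python model rather than Smalltalk; `install_clean_blocks`, `copy_method` and the simplified classes are hypothetical. All clean blocks of a method share one MethodContext whose receiver is nil, and copying the method rebuilds both the block literals and that shared context.

```python
class MethodContext:
    def __init__(self, method, receiver=None):
        self.method = method
        self.receiver = receiver   # nil for the shared clean-block context

class BlockClosure:
    def __init__(self, outer_context, startpc, num_args):
        self.outer_context = outer_context
        self.startpc = startpc
        self.num_args = num_args

class CompiledMethod:
    def __init__(self, selector, literals=()):
        self.selector = selector
        self.literals = list(literals)

def install_clean_blocks(method, clean_blocks):
    """Add one BlockClosure literal per clean block (startpc, num_args).

    All of them share a single MethodContext whose receiver is nil; it
    exists only so that each closure can reach its method.
    """
    shared = MethodContext(method)
    for startpc, num_args in clean_blocks:
        method.literals.append(BlockClosure(shared, startpc, num_args))
    return method

def copy_method(method):
    """Copying must duplicate the clean-block literals and their context."""
    new = CompiledMethod(method.selector, method.literals)
    new_context = MethodContext(new)
    for i, literal in enumerate(new.literals):
        if isinstance(literal, BlockClosure):
            new.literals[i] = BlockClosure(new_context, literal.startpc,
                                           literal.num_args)
    return new

m = install_clean_blocks(CompiledMethod("example", ["#name"]), [(10, 2), (30, 1)])
print(m.literals[1].outer_context is m.literals[2].outer_context)  # → True
```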

I expect that in certain cases the speedup would be noticeable, but it is a
micro-optimization.  You'd of course only notice the difference in tight
loops that used clean blocks.


2013/7/30 Eliot Miranda <eliot.miranda at gmail.com>
>
>>
>> http://www.mirandabanda.org/cogblog/2008/06/07/closures-part-i/
>> Hi Clément,
>>
>> On Mon, Jul 29, 2013 at 1:54 AM, Clément Bera <bera.clement at gmail.com> wrote:
>>
>>>
>>> Hello guys,
>>>
>>> I was recently looking at Eliot's BlockClosure model in Pharo/Squeak
>>> and at the BlockClosure model of VisualWorks, and I have a few
>>> questions.
>>>
>>> - Why does Pharo/Squeak not have compiled blocks as in VW, keeping the
>>> block bytecode in the enclosing method instead? Is it to save memory?
>>> Would it be worth implementing CompiledBlock, in terms of speed and memory
>>> consumption?
>>>
>>
>> Squeak derives directly from the "blue book" Smalltalk-80 implementation
>> in which CompiledMethod is a hybrid object, half pointers (method header
>> and literals) and half bytes (bytecode and source pointer).  This format
>> was chosen to save space in the original 16-bit Smalltalk implementations
>> on the Xerox D machines (Alto & Dorado).  VisualWorks has a few extra steps
>> in between.  In ObjectWorks 2.4 and ObjectWorks 2.5, Peter Deutsch both
>> introduced closures and eliminated the hybrid CompiledMethod format,
>> introducing CompiledBlock.
>>
>> IMO adding CompiledBlock, while simplifying the VM a little, would not
>> improve performance, especially in the interpreter, essentially because
>> activating and returning from methods would then require an extra level of
>> indirection to get from the CompiledMethod object to its bytecodes in its
>> bytecode object.
>>
>> However, adding CompiledBlock (or rather eliminating the hybrid
>> CompiledMethod format) would definitely *not* save space.  The hybrid
>> format is more compact (one less object per method).  One can try and
>> improve this as in VisualWorks by encoding the bytecodes of certain methods
>> as SmallIntegers in the literal frame, but this is only feasible in a pure
>> JIT VM.  Squeak still has an interpreter, and Cog is a hybrid JIT and
>> Interpreter.  In an interpreter it is costly in performance to be able to
>> interpret this additional form of bytecodes.
>>
>> So IMO while the hybrid CompiledMethod isn't ideal it is acceptable,
>> having important advantages to go along with its disadvantages.
>>
>>> - Why do Pharo/Squeak contexts have this closureOrNil variable instead of
>>> having the closure in the receiver field as in VW? Is it an optimization
>>> because there are a lot of accesses to self and instance variables in
>>> blocks in Pharo/Squeak? Because, if I'm correct, it uses one more slot per
>>> stack frame.
>>>
>>
>> I did this because I think it's simpler and more direct.  I don't like
>> that in VW accessing the receiver and inst vars requires different bytecodes
>> within a block than within a method.  There are lots of complexities
>> resulting from this (e.g. in scanning code for inst var refs, the
>> decompiler, etc).
>>
>> But in fact there isn't really an additional stack slot, because the frame
>> format in the VM does not use the stacked receiver (the 0'th argument), since
>> accessing the receiver in this position requires knowing the method's
>> argument count.  So in both methods and blocks the receiver is pushed on
>> the stack immediately before allocating space for, and nilling, any
>> temporaries.  This puts the receiver in a known place relative to the frame
>> pointer, making it accessible to the bytecodes without having to know the
>> method's argument count.  So the receiver always occurs twice on the stack
>> in a method anyway.  In a block, the block is on the stack in the 0'th
>> argument position.  The actual receiver is pushed after the temps.
>>
>>> - Lastly, does VW have the tempVector optimization for escaping written
>>> temporaries in its BlockClosures? It seems it does not (I don't see any
>>> reference to it in VW 7). Did Pharo/Squeak blocks gain a lot of speed or
>>> memory with this optimization?
>>>
>>
>> Yes, VW has this same organization.  I implemented it in VisualWorks 5i
>> in ~2000.  It resulted in a significant increase in performance (for
>> example, factor-of-two improvements in block-intensive code such as
>> exception handling).  This is because of details in the context-to-stack
>> mapping machinery, which mean that if an activation of a closure can update
>> the temporaries of its outer contexts then keeping contexts and stack
>> frames in sync is much more complex and costly.  The 5i/Cog organization
>> (which in fact derives from some Lisp implementations) results in much
>> simpler context-to-stack mapping, such that no tests need be done when
>> returning from a method to keep frames and contexts in sync.
>>
>>
>>
>>> Thank you for any answer.
>>>
>>
>> You're most welcome.  Have you read my blog post on the design?  It is
>> "Under Cover Contexts and the Big Frame-Up"
>> <http://www.mirandabanda.org/cogblog/2009/01/14/under-cover-contexts-and-the-big-frame-up/>,
>> with additional information in "Closures Part I" & "Closures Part II –
>> the Bytecodes"
>> <http://www.mirandabanda.org/cogblog/2008/07/22/closures-part-ii-the-bytecodes/>.
>> --
>> best,
>> Eliot
>>
>>
>
>
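To illustrate the frame-layout point from the quoted reply (the receiver is stored again at a fixed offset from the frame pointer, so bytecodes need not know the argument count), here is a simplified Python model. The slot order is an assumption loosely based on that description; a real Cog frame grows downward and holds more detail, and `build_frame`, `frame_receiver` and `zeroth_slot` are hypothetical helpers.

```python
def build_frame(stack, zeroth, receiver, args, num_temps):
    """Push a simplified activation onto `stack` (a list that grows upward).

    zeroth is the receiver for a method, or the closure for a block.
    Returns the index of the frame pointer (the saved-fp slot).
    """
    stack.append(zeroth)               # the 0'th argument slot
    stack.extend(args)
    stack.append("caller's saved ip")
    stack.append("caller's saved fp")
    fp = len(stack) - 1
    stack.append("method")
    stack.append(None)                 # context slot: not yet reified
    stack.append(receiver)             # receiver, immediately before temps
    stack.extend([None] * num_temps)   # temporaries, nilled
    return fp

RECEIVER_OFFSET = 3  # fixed distance from fp, independent of argument count

def frame_receiver(stack, fp):
    return stack[fp + RECEIVER_OFFSET]

def zeroth_slot(stack, fp, num_args):
    # Reaching the stacked 0'th argument requires knowing num_args.
    return stack[fp - 2 - num_args]

method_stack = []
fp = build_frame(method_stack, "aPoint", "aPoint", ["x", "y"], num_temps=1)
print(method_stack.count("aPoint"))   # → 2 (the receiver occurs twice)

block_stack = []
bfp = build_frame(block_stack, "aBlock", "aPoint", ["each"], num_temps=0)
print(frame_receiver(block_stack, bfp), zeroth_slot(block_stack, bfp, 1))  # → aPoint aBlock
```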

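The temp-vector organization from the quoted reply can likewise be sketched in Python. The model assumes a Smalltalk method along the lines of `| total | total := 0. elements do: [:each | total := total + each + offset]. ^total`, where `offset` is an argument: because `total` is assigned inside the block it is moved into a one-slot heap array shared by the method and the closure, while `offset` is merely copied. The block never writes into its outer context's temporaries, which is the property that simplifies keeping contexts and frames in sync. `BlockClosure` here is a hypothetical stand-in, not Squeak's class.

```python
class BlockClosure:
    """A closure that holds copied values instead of its outer context."""
    def __init__(self, code, copied_values):
        self.code = code
        self.copied_values = tuple(copied_values)  # copied when created

    def value(self, *args):
        return self.code(*self.copied_values, *args)

def sum_plus_offset(elements, offset):
    # `offset` is never assigned, so its value is simply copied into the
    # closure.  `total` is assigned inside the block, so it lives in a
    # one-slot temp vector (a heap array) shared by the method and block.
    temp_vector = [0]

    def block_code(offset_copy, tv, each):
        tv[0] = tv[0] + each + offset_copy   # the write goes to the vector

    block = BlockClosure(block_code, [offset, temp_vector])
    for each in elements:
        block.value(each)
    return temp_vector[0]   # the method reads the same vector

print(sum_plus_offset([1, 2, 3], 10))  # → 36
```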

-- 
best,
Eliot