[Vm-dev] Eliot's BlockClosure model questions

Eliot Miranda eliot.miranda at gmail.com
Thu Aug 1 17:15:57 UTC 2013

On Thu, Aug 1, 2013 at 1:21 AM, Clément Bera <bera.clement at gmail.com> wrote:

> Hello Eliot,
> So I implemented clean blocks with Opal in Pharo 3. I didn't know where to
> put the byte code of the clean block, so I put it at the end of the method.
> ex:
> exampleCleanBlock
> ^ [ 1  + 2 ]
> 17 <20> pushConstant: [...]
> 18 <7C> returnTop
> 19 <76> pushConstant: 1
> 20 <77> pushConstant: 2
> 21 <B0> send: +
> 22 <7D> blockReturn
> having in the literal Array:
> [ 1 + 2 ]
> #exampleCleanBlock
> OCOpalExamples
> The startpc of the block is 19.
> Its outerContext is a context with nil as receiver and the method
> OCOpalExamples>>#exampleCleanBlock.
> Its numArgs is 0 and it has no copiedValues.
> But it does not work with the JIT.
> If I run:
> OCOpalExamples new exampleCleanBlock value
> I get 3 every time, which is fine. But
> 1 to: 5 do: [ :i |
> OCOpalExamples new exampleCleanBlock value ]
> works on the Stack VM, but crashes the Cog VM. I don't know why (I don't
> have enough knowledge about the Cog JIT).
> Do you have any clue ?

No.  Can you send me an image?

> 2013/7/31 Eliot Miranda <eliot.miranda at gmail.com>
>> On Tue, Jul 30, 2013 at 1:56 PM, Clément Bera <bera.clement at gmail.com> wrote:
>>> Thanks for the answer, it was very helpful. I get it now.
>>> I had a look at the first posts of your blog (Closures I & II) when I
>>> was working on the Opal compiler. Today I was looking at Under Cover
>>> Contexts and the Big Frame-Up
>>> <http://www.mirandabanda.org/cogblog/2009/01/14/under-cover-contexts-and-the-big-frame-up/>
>>> and I think I should read your whole blog.
>>> It is really nice that you wrote this blog; it is the main
>>> documentation about an efficient Smalltalk VM. I learnt mostly by looking
>>> at Cog's source. The VW VM source is closed, so I will have a look at the
>>> Strongtalk implementation instead, since it seems to be open source.
>>> Why are the clean blocks of VW much faster? Are they activated like
>>> methods? I didn't find it in your blog (probably because it is not in Cog).
>>> Is it possible to implement clean blocks in Pharo/Squeak? (I think that
>>> 53% of the blocks not optimized by the compiler are clean in Pharo 3.)
>>> Would it be worth it?
>> Clean blocks are faster because they don't access their outer environment
>> and hence their outer context does not have to be created.  So there is no
>> allocation associated with a clean block.  It exists already as a literal
>> and its outer context does not have to be reified.  Normal closures are
>> created when the point at which they are defined in method execution is
>> reached (the pushClosure bytecode) and if the current context does not yet
>> exist that must be instantiated too, so creating a closure usually takes
>> two allocations.
>> Clean blocks are activated like blocks.  Block and method activation is
>> different in the first phase (the send side) but quite similar in the
>> second phase (frame building).  In VW for example, finding the machine code
>> method associated with a block involves a cache lookup which can be slow.
>> In Cog, it involves following a pointer in the method header (internally, the
>> VM replaces the header of a method with a pointer to its machine code) and
>> then jumping to a hard-coded binary search which jumps to the correct
>> block's entry-point depending on the closure's startpc.  If a method
>> contains a single block then this is a direct jump.  As a result, block
>> dispatch in Cog is typically faster than in VW.
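
The dispatch step described above can be sketched as a toy model. This is Python, not VM code: in Cog the search is emitted as machine-code compares and jumps, whereas here a sorted table stands in for it. The name make_block_dispatch and the entry-point labels are hypothetical.

```python
from bisect import bisect_left


def make_block_dispatch(entries):
    """Build a dispatcher from a {startpc: entry_point} table.

    Toy model only: 'entry_point' is any label standing in for the
    machine-code address of a block's entry.
    """
    startpcs = sorted(entries)
    targets = [entries[pc] for pc in startpcs]

    if len(startpcs) == 1:
        # A method with a single block needs no search: direct jump.
        only_target = targets[0]
        return lambda startpc: only_target

    def dispatch(startpc):
        # Binary search over the sorted startpcs of the method's blocks.
        i = bisect_left(startpcs, startpc)
        if i == len(startpcs) or startpcs[i] != startpc:
            raise ValueError(f"no block starts at pc {startpc}")
        return targets[i]

    return dispatch


dispatch = make_block_dispatch({19: "block@19", 42: "block@42", 77: "block@77"})
print(dispatch(42))
```

The search cost grows with the logarithm of the number of blocks in the method, and disappears entirely in the single-block case.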
>> Yes, it is possible to implement clean blocks.  It is only an issue to do
>> with the representation of closures.  Ideally they need a method inst var,
>> making the outerContext inst var optional (or at least nil in a clean
>> block).  But that would require a change to BlockClosure's class definition
>> and a VM change.  To avoid having to change the class definition of
>> BlockClosure and the VM, the compiler could create an empty context to hold
>> onto the method, and that would work fine.  So to implement clean blocks
>> the compiler would instantiate a BlockClosure literal for each clean block
>> and a MethodContext whose receiver was nil shared between all the clean
>> blocks in a method.  There are tricky issues such as setting breakpoints in
>> methods (toggle break on entry), or copying methods, which would require
>> scanning the literals for clean blocks and duplicating them and their
>> outerContext too.  But that's just detail.  Some time I must try this for
>> Squeak.  Let me know if you try it for Opal.  (and of course I'm very
>> happy to help with advice).
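
A minimal sketch of the scheme just described, as a Python model rather than compiler code. The classes mirror the Smalltalk names but are stand-ins, and install_clean_blocks and copy_method are hypothetical helpers. It shows one closure literal per clean block, one nil-receiver context shared by all of them, and the duplication needed when a method is copied.

```python
class Context:
    def __init__(self, method, receiver):
        self.method = method
        self.receiver = receiver


class BlockClosure:
    def __init__(self, outer_context, startpc, num_args):
        self.outer_context = outer_context
        self.startpc = startpc
        self.num_args = num_args
        self.copied_values = ()  # a clean block copies nothing


class CompiledMethod:
    def __init__(self, selector):
        self.selector = selector
        self.literals = []


def install_clean_blocks(method, blocks):
    """Compile time: add one closure literal per (startpc, num_args) pair."""
    shared = Context(method, receiver=None)  # exists only to hold the method
    for startpc, num_args in blocks:
        method.literals.append(BlockClosure(shared, startpc, num_args))
    return method


def copy_method(method):
    """Copy a method, duplicating clean blocks and their shared context."""
    copy = CompiledMethod(method.selector)
    new_context = None
    for literal in method.literals:
        if isinstance(literal, BlockClosure):
            if new_context is None:
                new_context = Context(copy, receiver=None)
            literal = BlockClosure(new_context, literal.startpc, literal.num_args)
        copy.literals.append(literal)
    return copy


method = install_clean_blocks(CompiledMethod("exampleCleanBlock"), [(19, 0), (30, 1)])
clone = copy_method(method)
print(clone.literals[0].outer_context.method is clone)
```

Pushing a clean block at run time is then just pushing a literal, so no closure or context is allocated on that path.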
>> I expect that in certain cases the speedup would be noticeable, but it is
>> a micro-optimization.  You'd of course only notice the difference in tight
>> loops that used clean blocks.
>> 2013/7/30 Eliot Miranda <eliot.miranda at gmail.com>
>>>> http://www.mirandabanda.org/cogblog/2008/06/07/closures-part-i/
>>>> Hi Clément,
>>>> On Mon, Jul 29, 2013 at 1:54 AM, Clément Bera <bera.clement at gmail.com> wrote:
>>>>> Hello guys,
>>>>> I was looking recently at the blockClosure model of Eliot in
>>>>> Pharo/Squeak and the blockClosure model of VisualWorks and I have a few
>>>>> questions.
>>>>> - Why does Pharo/Squeak not have compiled blocks as in VW, and instead
>>>>> keep the block bytecode in the enclosing method? Is it to save memory?
>>>>> Would it be worth implementing CompiledBlock in terms of speed and memory
>>>>> consumption?
>>>> Squeak derives directly from the "blue book" Smalltalk-80
>>>> implementation in which CompiledMethod is a hybrid object, half pointers
>>>> (method header and literals) and half bytes (bytecode and source pointer).
>>>> This format was chosen to save space in the original 16-bit Smalltalk
>>>> implementations on the Xerox D machines (Alto & Dorado).  VisualWorks has a
>>>> few extra steps in between.  In ObjectWorks 2.4 and ObjectWorks 2.5 Peter
>>>> Deutsch both introduced closures and eliminated the hybrid CompiledMethod
>>>> format, introducing CompiledBlock.
>>>> IMO adding CompiledBlock, while simplifying the VM a little, would not
>>>> improve performance, especially in the interpreter, essentially because
>>>> activating and returning from methods now requires an extra level of
>>>> indirection to get from the CompiledMethod object to its bytecodes in its
>>>> bytecode object.
>>>> Moreover, adding CompiledBlock (or rather eliminating the hybrid
>>>> CompiledMethod format) would definitely *not* save space.  The hybrid
>>>> format is more compact (one less object per method).  One can try and
>>>> improve this as in VisualWorks by encoding the bytecodes of certain methods
>>>> as SmallIntegers in the literal frame, but this is only feasible in a pure
>>>> JIT VM.  Squeak still has an interpreter, and Cog is a hybrid JIT and
>>>> Interpreter.  In an interpreter it is costly in performance to be able to
>>>> interpret this additional form of bytecodes.
>>>> So IMO while the hybrid CompiledMethod isn't ideal it is acceptable,
>>>> having important advantages to go along with its disadvantages.
>>>>> - Why do Pharo/Squeak contexts have this variable closureOrNil instead of
>>>>> having the closure in the receiver field as in VW? Is it an optimization
>>>>> because there are a lot of accesses to self and instance variables in the
>>>>> blocks in Pharo/Squeak? Because if I'm correct it uses 1 more slot per
>>>>> stack frame to have this.
>>>> I did this because I think it's simpler and more direct.  I don't like
>>>> VW's access to the receiver and inst vars having to use different bytecodes
>>>> within a block than within a method.  There are lots of complexities
>>>> resulting from this (e.g. in scanning code for inst var refs, the
>>>> decompiler, etc).
>>>> But in fact there isn't really an additional stack slot because the
>>>> frame format in the VM does not use the stacked receiver (the 0'th
>>>> argument) as accessing the receiver in this position requires knowing the
>>>> method's argument count.  So in both methods and blocks the receiver is
>>>> pushed on the stack immediately before allocating space for, and nilling,
>>>> any temporaries.  This puts the receiver in a known place relative to the
>>>> frame pointer, making it accessible to the bytecodes without having to know
>>>> the method's argument count.  So the receiver always occurs twice on the
>>>> stack in a method anyway.  In a block, the block is on the stack in the
>>>> 0'th argument position.  The actual receiver is pushed after the temps.
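
A simplified model of that layout, in Python. It is an assumption-laden sketch, not the VM's real frame format: the frame is a list, the fixed slots between the arguments and the receiver are reduced to three, and names such as build_frame and RECEIVER_OFFSET are hypothetical.

```python
RECEIVER_OFFSET = 2  # in this model: saved fp at fp, method at fp+1, receiver at fp+2


def build_frame(stacked_receiver, args, method, receiver, num_temps):
    """Return (stack, fp) for one activation; the stack grows to the right."""
    stack = [stacked_receiver, *args]    # pushed by the sender
    stack.append("caller-saved-ip")
    stack.append("caller-saved-fp")
    fp = len(stack) - 1                  # frame pointer indexes the saved fp
    stack.append(method)
    stack.append(receiver)               # fixed distance from fp
    stack.extend([None] * num_temps)     # temporaries, nilled
    return stack, fp


def receiver_via_frame(stack, fp):
    """Needs no knowledge of the argument count."""
    return stack[fp + RECEIVER_OFFSET]


def slot_zero_via_args(stack, fp, num_args):
    """Reaching the 0'th argument position requires the argument count."""
    return stack[fp - 2 - num_args]


# Method activation: the receiver appears twice on the stack.
stack, fp = build_frame("aPoint", ["arg1", "arg2"], "Point>>#foo:bar:", "aPoint", 1)
print(receiver_via_frame(stack, fp), slot_zero_via_args(stack, fp, 2))

# Block activation: the closure sits in the 0'th argument position instead.
stack, fp = build_frame("aClosure", ["each"], "Point>>#foo:bar:", "aPoint", 0)
print(receiver_via_frame(stack, fp), slot_zero_via_args(stack, fp, 1))
```

The point of the model is only that receiver_via_frame works identically for methods and blocks, whatever the argument count.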
>>>>> - Lastly, does VW have the tempVector optimization for escaping written
>>>>> temporaries in its BlockClosure? It seems it does not (I don't see any
>>>>> reference to it in VW 7). Did Pharo/Squeak blocks gain a lot of speed or
>>>>> memory with this optimization?
>>>> Yes, VW has this same organization.  I implemented it in VisualWorks 5i
>>>> in ~ 2000.  It resulted in a significant increase in performance (for
>>>> example, factors of two improvement in block-intensive code such as
>>>> exception handling).  This is because of details in the context-to-stack
>>>> mapping machinery which mean that if an activation of a closure can update
>>>> the temporaries of its outer contexts then keeping contexts and stack
>>>> frames in sync is much more complex and costly.  The 5i/Cog organization
>>>> (which in fact derives from some Lisp implementations) results in much
>>>> simpler context-to-stack mapping such that no tests need be done when
>>>> returning from a method to keep frames and contexts in sync.
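
The idea can be illustrated with a small Python model. It assumes a closure that copies values when it is created and never writes into its outer activation, which is a simplification of the organization described above. The Closure class and both counting functions are hypothetical.

```python
class Closure:
    """A closure that copies the values it needs at creation time."""

    def __init__(self, code, copied_values):
        self.code = code
        self.copied_values = list(copied_values)

    def value(self, *args):
        return self.code(self.copied_values, *args)


def naive_body(copied, _each):
    # Updates only the closure's private copy of 'count'.
    copied[0] = copied[0] + 1


def count_naive(items):
    count = 0
    block = Closure(naive_body, [count])
    for each in items:
        block.value(each)
    return count  # the outer temporary never sees the writes


def vector_body(copied, _each):
    temp_vector = copied[0]
    temp_vector[0] = temp_vector[0] + 1  # write goes through the shared vector


def count_with_temp_vector(items):
    temp_vector = [0]  # heap-allocated holder for the written temporary
    block = Closure(vector_body, [temp_vector])
    for each in items:
        block.value(each)
    return temp_vector[0]


print(count_naive([1, 2, 3]), count_with_temp_vector([1, 2, 3]))
```

Because the block and its outer activation share only the vector, an activation of the block never has to reach into the outer frame's temporaries.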
>>>>> Thank you for any answer.
>>>> You're most welcome.  Have you read my blog post on the design?  It is
>>>> "Under Cover Contexts and the Big Frame-Up"
>>>> <http://www.mirandabanda.org/cogblog/2009/01/14/under-cover-contexts-and-the-big-frame-up/>,
>>>> with additional information in "Closures Part I" & "Closures Part II –
>>>> the Bytecodes"
>>>> <http://www.mirandabanda.org/cogblog/2008/07/22/closures-part-ii-the-bytecodes/>.
>>>> --
>>>> best,
>>>> Eliot
>> --
>> best,
>> Eliot
