[Vm-dev] Eliot's BlockClosure model questions

Eliot Miranda eliot.miranda at gmail.com
Mon Jul 29 23:05:21 UTC 2013

Hi Clément,

On Mon, Jul 29, 2013 at 1:54 AM, Clément Bera <bera.clement at gmail.com> wrote:

> Hello guys,
> I was looking recently at Eliot's BlockClosure model in Pharo/Squeak
> and the BlockClosure model of VisualWorks, and I have a few questions.
> - Why does Pharo/Squeak not have compiled blocks as in VW, and instead keep
> the block bytecode in the enclosing method? Is it to save memory? Would it
> be worth implementing CompiledBlock in terms of speed and memory consumption?

Squeak derives directly from the "blue book" Smalltalk-80 implementation in
which CompiledMethod is a hybrid object, half pointers (method header and
literals) and half bytes (bytecode and source pointer).  This format was
chosen to save space in the original 16-bit Smalltalk implementations on
the Xerox D machines (Alto & Dorado).  VisualWorks has a few extra steps in
between.  In ObjectWorks 2.4 and ObjectWorks 2.5 Peter Deutsch both
introduced closures and eliminated the hybrid CompiledMethod format,
introducing CompiledBlock.

IMO adding CompiledBlock, while simplifying the VM a little, would not
improve performance, especially in the interpreter, essentially because
activating and returning from methods would then require an extra level of
indirection to get from the CompiledMethod object to its bytecodes in its
bytecode object.

Moreover, adding CompiledBlock (or rather eliminating the hybrid
CompiledMethod format) would definitely *not* save space.  The hybrid
format is more compact (one less object per method).  One can try to
improve this as in VisualWorks by encoding the bytecodes of certain methods
as SmallIntegers in the literal frame, but this is only feasible in a pure
JIT VM.  Squeak still has an interpreter, and Cog is a hybrid JIT and
interpreter.  In an interpreter, having to interpret this additional form
of bytecodes is costly in performance.

So IMO while the hybrid CompiledMethod isn't ideal it is acceptable, having
important advantages to go along with its disadvantages.
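
To make the space argument concrete, here is a rough sketch in Python (not VM code).  The header size, word size and class names are assumptions chosen for illustration, not the real Squeak or VisualWorks object formats.  It only shows that splitting the bytecodes into their own object costs one extra object header plus one extra pointer slot per method.

```python
from dataclasses import dataclass
from typing import List

HEADER_WORDS = 2  # assumed per-object header size in words (illustrative only)
WORD_BYTES = 4    # assumed 32-bit words


def words_for_bytes(n: int) -> int:
    """Round a byte count up to whole words."""
    return (n + WORD_BYTES - 1) // WORD_BYTES


@dataclass
class HybridMethod:
    """One object: method header word, literal pointers, then raw bytecodes."""
    literals: List[object]
    bytecodes: bytes

    def object_count(self) -> int:
        return 1

    def size_in_words(self) -> int:
        # object header + method header word + literals + bytecodes
        return (HEADER_WORDS + 1 + len(self.literals)
                + words_for_bytes(len(self.bytecodes)))


@dataclass
class SplitMethod:
    """Two objects: a pointer object that refers to a separate bytes object."""
    literals: List[object]
    bytecodes: bytes

    def object_count(self) -> int:
        return 2

    def size_in_words(self) -> int:
        # pointer object: header + method header word + literals
        # + one slot pointing at the bytecode object
        pointer_part = HEADER_WORDS + 1 + len(self.literals) + 1
        # bytecode object: its own header + the bytes
        byte_part = HEADER_WORDS + words_for_bytes(len(self.bytecodes))
        return pointer_part + byte_part


hybrid = HybridMethod(literals=["#foo", "#bar"], bytecodes=bytes(10))
split = SplitMethod(literals=["#foo", "#bar"], bytecodes=bytes(10))
overhead = split.size_in_words() - hybrid.size_in_words()
```

Under these assumptions the overhead is always HEADER_WORDS + 1 words per method, regardless of how many literals or bytecodes the method has, and reaching the bytecodes takes one more pointer dereference.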

> - Why do Pharo/Squeak contexts have this variable closureOrNil instead of
> having the closure in the receiver field as in VW? Is it an optimization
> because there are a lot of accesses to self and instance variables in
> blocks in Pharo/Squeak? If I'm correct, this uses 1 more slot per
> stack frame.

I did this because I think it's simpler and more direct.  I don't like that
in VW access to the receiver and inst vars has to use different bytecodes
within a block than within a method.  There are lots of complexities
resulting from this (e.g. in scanning code for inst var refs, the
decompiler, etc).

But in fact there isn't really an additional stack slot because the frame
format in the VM does not use the stacked receiver (the 0'th argument) as
accessing the receiver in this position requires knowing the method's
argument count.  So in both methods and blocks the receiver is pushed on
the stack immediately before allocating space for, and nilling, any
temporaries.  This puts the receiver in a known place relative to the frame
pointer, making it accessible to the bytecodes without having to know the
method's argument count.  So the receiver always occurs twice on the stack
in a method anyway.  In a block, the block is on the stack in the 0'th
argument position.  The actual receiver is pushed after the temps.
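
The idea can be sketched as follows in Python.  This is a deliberately simplified model: the real Cog frame has more fields, the real stack grows downwards, and the function names and the exact slot order here are assumptions for illustration.  The point is only that the receiver pushed after the frame pointer is found at a fixed offset, while the stacked (0'th argument) receiver needs the argument count.

```python
RECEIVER_OFFSET = 1  # assumed fixed distance from the frame pointer


def build_frame(stack, receiver_or_closure, args, receiver, num_temps, caller_fp):
    """Push one activation and return its frame pointer (an index into stack).

    Simplified layout (an assumption, not the actual Cog layout):
        stacked receiver (or the closure, for a block activation)
        arg 1 .. arg N
        saved caller frame pointer   <- fp
        receiver                     <- fp + RECEIVER_OFFSET, always
        temp 1 .. temp M (nilled)
    """
    stack.append(receiver_or_closure)
    stack.extend(args)
    fp = len(stack)
    stack.append(caller_fp)
    stack.append(receiver)
    stack.extend([None] * num_temps)
    return fp


def frame_receiver(stack, fp):
    """Fetch the receiver without knowing the argument count."""
    return stack[fp + RECEIVER_OFFSET]


def stacked_receiver(stack, fp, num_args):
    """Fetch the 0'th argument slot; this needs the argument count."""
    return stack[fp - num_args - 1]


stack = []
# A method activation: the receiver appears twice on the stack.
method_fp = build_frame(stack, "rcvr", ["a", "b"], "rcvr", 1, None)
# A block activation: the closure sits in the 0'th argument position.
block_fp = build_frame(stack, "aClosure", ["x"], "rcvr", 0, method_fp)
```

In both activations frame_receiver uses the same offset, so the bytecodes that access self and inst vars can be identical in methods and blocks.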

> - Lastly, does VW have the tempVector optimization for escaping written
> temporaries in its BlockClosure? It seems it does not (I don't see any
> reference to it in VW 7). Did Pharo/Squeak blocks gain a lot of speed or
> memory with this optimization?

Yes, VW has this same organization.  I implemented it in VisualWorks 5i in
~2000.  It resulted in a significant increase in performance (for example,
a factor-of-two improvement in block-intensive code such as exception
handling).  This is because of details in the context-to-stack mapping
machinery which mean that if an activation of a closure can update the
temporaries of its outer contexts then keeping contexts and stack frames in
sync is much more complex and costly.  The 5i/Cog organization (which in
fact derives from some Lisp implementations) results in much simpler
context-to-stack mapping such that no tests need be done when returning
from a method to keep frames and contexts in sync.
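
As a rough illustration of the temp vector organization, here is a Python sketch of what a compiler might effectively produce for an inject:into:-style method.  The Closure class and the function names are hypothetical; this models the idea, not the actual bytecode.  A temporary that is written after the block is created lives in a small heap-allocated vector shared by the method and the block, and the block only holds copied values, so it never needs to reach into its outer activation.

```python
class Closure:
    """A block with copied values only; it has no access to the outer frame."""

    def __init__(self, code, copied):
        self.code = code
        self.copied = tuple(copied)  # copied values are never written back

    def value(self, *args):
        return self.code(self.copied, *args)


def inject_into(collection, this_value, binary_block):
    # Smalltalk source being modelled:
    #   | nextValue |
    #   nextValue := thisValue.
    #   self do: [:each | nextValue := binaryBlock value: nextValue value: each].
    #   ^nextValue
    #
    # nextValue is assigned inside the block, so it is moved into a
    # heap-allocated temp vector instead of living in the method's frame.
    temp_vector = [None]
    temp_vector[0] = this_value

    def block_body(copied, each):
        # binaryBlock is only read, so it is copied by value; the temp
        # vector is copied by reference and updated in place.
        binary_block_copy, shared = copied
        shared[0] = binary_block_copy(shared[0], each)

    block = Closure(block_body, [binary_block, temp_vector])
    for each in collection:
        block.value(each)
    return temp_vector[0]


total = inject_into([1, 2, 3, 4], 0, lambda a, b: a + b)
```

Because the only shared mutable state is the vector on the heap, returning from the method needs no check for closures that might still want to write to its frame.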

> Thank you for any answer.

You're most welcome.  Have you read my blog post on the design?  It is "Under
Cover Contexts and the Big
with additional information in "Closures Part I"
(http://www.mirandabanda.org/cogblog/2008/06/07/closures-part-i/) & "Closures Part II – the