Comments on Smalltalk block closure designs, part 1

Wed Apr 25 12:05:35 UTC 2001

Yes, please, continue. I (for one) would like to read more...

On 25/04/01 8:21, "Allen Wirfs-Brock" <Allen_Wirfs-Brock at Instantiations.com>
wrote:

> (This is a response to a design for Squeak block closures developed by
> Stephan Rudlof, "Draft for Block Closure Semantics for Squeak
> v0.6.0".  With any luck this will be the first of a serval part response)
> 
> Allen_Wirfs-Brock at instantiations.com
> 
> 
> 
> Stephan,
> 
> I though I would share some of my thoughts based upon my experiences having
> designed the block closure implementations for several Smalltalks,
> including Smalltalk/V.  Hopefully, you may find some useful ideas that you
> can use in your project.
> 
> Instead of simply critiquing your design, I want to step back and give you
> my "big picture" view of block closure issues. My experience is that there
> are several fairly orthogonal architectural issues which require solutions
> in order to successfully implement block closures. A subset set of these
> issues are addressed in the "classic" Smalltalk-80
> MethodContext/BlockContext design but the factoring of the design solutions
> to these issues in the Smalltalk-80 design is not necessary optimal for
> supporting block closures. For this reason, I have generally found it
> useful to start a design by "going back to basics" rather than focusing on
> how to extend MethodContext/BlockContext.
> 
> So what are the principal architectural elements required to support
> Smalltalk method evaluation and block closure semantics?  Here's my list
> and some relevant comments:
> 
> 1) Activation Records -- An activation record records the transient state
> of a "function" activation.  In Smalltalk this means a method activation or
> a block evaluation. The transient state includes control information
> (return address, etc.), temporary variables and arguments, etc. In
> principal, the control information and the data portions of an activation
> could be separated but in practice this usually isn't necessary.  There is
> little significant difference between the activation record requirements of
> a Smalltalk method and a Smalltalk block. It is quite possible to design an
> activation record format that will work for either.
> 
> In the classic Smalltalk-80 design, MethodContexts and BlockContexts both
> serve as activation records.  MethodContexts are a relatively "pure"
> implementation of the activation record abstraction encapsulating control
> information, arguments, temporary variables. BlockContexts are hybrids that
> encapsulate the control data of an activation record but depend upon an
> associated method context for storing arguments and temporary
> variables.  BlockContexts also serve as the "function" object that
> corresponds to an evaluable block.  This leads to various problems in the
> classic design which we will discuss in a moment.
> 
> Traditional block structured languages stack allocated activation
> records.  This is primarily done for efficiency of allocation and
> deallocation and is possible because the semantics of such languages
> typically restrict variable accesses to following the stack like model. The
> classic Smalltalk-80 model uses heap allocated activation records
> (MethodContexts/BlockContexts) however, high performance interpreters and
> particularly JITs try to avoid creating heap allocated activation records
> and instead typically try to use fairly conventional stack allocated
> activation records. This is possible because most activation records are
> never directly referenced (via an oop) as an object. If a program actually
> needs to capture an object reference to an activation record (for example,
> via a thiscontext reference) then the activation record (and typically all
> preceding activation records) must be "reified" as heap allocated
> objects.  This can be a very expensive operation and any design that
> requires frequent reification of the stack will probably not perform well.
> Fortunately, it is possible to have a Smalltalk activation/block closure
> architecture that NEVER needs to reify that stack! (you'll have to keep
> reading if you want to know the trick that enables this)
> 
> 2) Environments -- You can think of an "environment" as the data portion of
> an activation record. The associated method is evaluated in the context of
> an environment and multiple activations (e.g. recursions) of a given method
> may be simultaneously active and bound to different environments. The lisp
> community uses terminology (coming from the Lambda calculus??) that says
> that when a "function" is associated with a particular environment the
> function is "closed" with respect to that environment and that any free
> (alternatively open) variables in the function are bound to their
> corresponding entries in the environment. Hence the origin of the term
> "closure". (a free variable is one whose definition is not part of the
> function in question) A closure is a function, bound to an environment. In
> Smalltalk, the act of evaluating an expression in square brackets creates a
> closure by binding the code (in particularly any free variables) within the
> brackets (the "function") to an environment (that of the enclosing block or
> method). An alternative terminology that is sometimes used would say that
> the environment (and its variables) are "captured" by the closure.
> 
> [A terminology note..."closure" is a highly technical term that has little
> meaning to most programmers. As closure semantics were added to
> Smalltalk-80 the term block closure was introduced and made manifest.  It
> think it was a mistake to force this terminology on Smalltalk programmers
> who aren't computer scientists. I prefer to call square brackets enclosing
> some Smalltalk statements a "block constructor" and to use the term "block"
> (or "block object") to identify the type of object that is created by
> evaluating a block constructor. I think this is simpler terminology and
> most closely matches how programmers think about the constructs. This is
> the terminology we used in writing the ANSI Smalltalk specification.]
> 
> Because blocks are first class Smalltalk objects they can be stored in any
> field or variable including those that have a longer lifetime then the
> activation record that is associated with the environment that is bound to
> the block. This means that an environment that can be potentially captured
> using a block constructor can not be managed via stack allocation.  The
> simplest way around this problem would be to allocate all arguments and
> temporary variables of every activation record in a heap allocated
> environment object. (Each activation record would have a field that points
> to its associated environment object)  This, however, would have
> performance characteristics very similar to exclusively using heap
> allocated activation records. However, it is easy to observe that the
> majority of arguments and temporary variables used in a program can never
> be captured by a block constructor. Any variable which we (or a compiler)
> can prove is only referenced by its defining method or block (and hence are
> not referenced from an nested block constructors) can be safely stack
> allocated in the activation record. In practice, most method contain no
> block constructors (for this purpose we don't consider in-lined blocks as
> true block constructors) and hence all their arguments and temporaries can
> be safely stack allocated.  Typically, for those methods that do contain
> real block constructors,  only a small subset of the arguments and
> variables are actually subject to capture and only those variables need to
> placed in a heap allocated environment object.  The number of environment
> allocated variables can be further reduced by paying attention to "read
> only variables".  Any variable that can never be assigned during the
> lifetime of a blocks,for example, arguments of enclosing methods and blocks
> need not be placed in the environment if instead a copy of the variable is
> placed in the closure when it is created (is this true of Squeak?? Are
> arguments read only? They should be for this reason)
> 
> 3)Closures (or preferably, "block objects") -- A Block object captures an
> association between some executable code and an environment. A block object
> can be evaluated much like a function.  This results in the creation of an
> activation record for the evaluation. The classic Smalltalk-80 design used
> the same object (a BlockContext) for both the "closure" and the activation
> record.  Because of this, Smalltalk-80 blocks were not reentrant (and hence
> could not be recursively invoked) as there was only a single "return
> address slot", stack, etc. that gets over-written by each activation of the
> block.  Blocks (closures) and activation records have distinct roles and
> need to be represented  by distinct objects.
> 
> In its simplest form, a block object needs only two slots, a reference to
> the code and a reference to the captured environment. If the environment
> that is capture by a block is represented directly in an activation record
> (this is essentially what happens via the home context reference of a
> Smalltalk-80 block context) then the activation record (and usually the
> entire stack) has to be reified. The undesirability of stack reification is
> a strong incentive to explicitly represent environments that are subject to
> capture as heap objects. In addition to the environment and code slots,
> blocks that contain up arrow returns may (depending upon specifics of the
> overall design) need a continuation slot. (If the captured continuation
> slot isn't in the block object then it probably needs to be in an
> environment object.)
> 
> 4) Continuations - You can think of a continuation as the portion of an
> activation record that contains the return address and associated
> state.  It specifies where execution will "continue" when control exits the
> code associated with the activation record. All methods and most blocks
> continue execution, upon return, in the activation that directly called
> them.  In this case, control flow follows a stack discipline and the
> continuation linkage between activation record can be modeled implicitly
> with an activation record stack. The exception to this rule in Smalltalk is
> the up arrow return from a block. An up arrow return uses the continuation
> of the method activation that created (perhaps indirectly via intermediate
> block activations) the block object associated with the current activation
> rather than the current (block) activation record's continuation.
> 
> [Some languages, most notably Scheme support "first class
> continuations".  In such languages, continuations are "real objects" that
> can be directly manipulated.  In particular, a Scheme continuation can be
> "invoked" multiple times.  This essentially means that control can
> redirected to return from a particular function activation again, even if
> that activation has already been returned from.  This permits various novel
> control paradigms but significantly complicates the implementation of
> continuations.  Smalltalk-80 does not have this characteristic.  A
> Smalltalk method continuation can be used only once and most Smalltalk
> implementation do not allow the use of continuations that cross process
> boundaries.  These restrictions allow for optimizations that would
> otherwise be difficult to achieve.]
> 
> As with a block's environment record, the method continuation must be
> captured when the block object is created. If the captured continuation is
> represented as a reference to the method's activation record, this will
> also cause reification of the activation record and the stack which we
> would like to avoid.
> 
> I've now covered enough to explain how Smalltalk blocks with full closure
> semantics can be implemented an a way that never forces reification of the
> activation record stack:
> 
> Allocate all arguments and temps that are not "captured" by blocks within
> stack allocated activation records.
> All variables that are subject to capture by blocks are explicitly
> allocated in heap allocated environment objects
> (or, if "read-only", copied into the block object).
> (only activations that have captured variables need allocate environment
> objects, except for the following rule)
> The activation of a method that contains blocks with up arrow returns
> always allocates a heap environment object.
> (it may be empty, in the sense that it contains no variable, but it must
> be allocated)
> Activation records reference ("point to") environment objects but
> environment objects never point to activation records.
> Environment objects and block objects may reference other environment objects.
> A block (a closure) object references its enclosing environment(s) (but
> never the enclosing activation record(s))
> A block object that implements an up arrow return captures a reference to
> the environment associated with its method activation.
> (This is used as the continuation marker)
> An up arrow return is implemented by searching up the activation stack for
> an activation record whose environment reference is the same as the
> continuation marker of the returning bock.  The activation stack is
> trimmed to that activation record and a return is forced.
> (If no activation record with that environment is found we have a
> #cannotReturn: situation)
> 
> Enough for part #1, with luck I'll still be motivated to keep writing on
> part #2 tomorrow
> 
> Allen        
> 
> 

--
Roel Wuyts                           Software Composition Group
Roel.Wuyts at iam.unibe.ch         University of Bern, Switzerland
Board member of the European Smalltalk User Group: www.esug.org