Comments on Smalltalk block closure designs, part 1
Allen Wirfs-Brock
Allen_Wirfs-Brock at Instantiations.com
Wed Apr 25 06:21:27 UTC 2001
(This is a response to a design for Squeak block closures developed by
Stephan Rudlof, "Draft for Block Closure Semantics for Squeak
v0.6.0". With any luck this will be the first of a serval part response)
Allen_Wirfs-Brock at instantiations.com
Stephan,
I though I would share some of my thoughts based upon my experiences having
designed the block closure implementations for several Smalltalks,
including Smalltalk/V. Hopefully, you may find some useful ideas that you
can use in your project.
Instead of simply critiquing your design, I want to step back and give you
my "big picture" view of block closure issues. My experience is that there
are several fairly orthogonal architectural issues which require solutions
in order to successfully implement block closures. A subset set of these
issues are addressed in the "classic" Smalltalk-80
MethodContext/BlockContext design but the factoring of the design solutions
to these issues in the Smalltalk-80 design is not necessary optimal for
supporting block closures. For this reason, I have generally found it
useful to start a design by "going back to basics" rather than focusing on
how to extend MethodContext/BlockContext.
So what are the principal architectural elements required to support
Smalltalk method evaluation and block closure semantics? Here's my list
and some relevant comments:
1) Activation Records -- An activation record records the transient state
of a "function" activation. In Smalltalk this means a method activation or
a block evaluation. The transient state includes control information
(return address, etc.), temporary variables and arguments, etc. In
principal, the control information and the data portions of an activation
could be separated but in practice this usually isn't necessary. There is
little significant difference between the activation record requirements of
a Smalltalk method and a Smalltalk block. It is quite possible to design an
activation record format that will work for either.
In the classic Smalltalk-80 design, MethodContexts and BlockContexts both
serve as activation records. MethodContexts are a relatively "pure"
implementation of the activation record abstraction encapsulating control
information, arguments, temporary variables. BlockContexts are hybrids that
encapsulate the control data of an activation record but depend upon an
associated method context for storing arguments and temporary
variables. BlockContexts also serve as the "function" object that
corresponds to an evaluable block. This leads to various problems in the
classic design which we will discuss in a moment.
Traditional block structured languages stack allocated activation
records. This is primarily done for efficiency of allocation and
deallocation and is possible because the semantics of such languages
typically restrict variable accesses to following the stack like model. The
classic Smalltalk-80 model uses heap allocated activation records
(MethodContexts/BlockContexts) however, high performance interpreters and
particularly JITs try to avoid creating heap allocated activation records
and instead typically try to use fairly conventional stack allocated
activation records. This is possible because most activation records are
never directly referenced (via an oop) as an object. If a program actually
needs to capture an object reference to an activation record (for example,
via a thiscontext reference) then the activation record (and typically all
preceding activation records) must be "reified" as heap allocated
objects. This can be a very expensive operation and any design that
requires frequent reification of the stack will probably not perform well.
Fortunately, it is possible to have a Smalltalk activation/block closure
architecture that NEVER needs to reify that stack! (you'll have to keep
reading if you want to know the trick that enables this)
2) Environments -- You can think of an "environment" as the data portion of
an activation record. The associated method is evaluated in the context of
an environment and multiple activations (e.g. recursions) of a given method
may be simultaneously active and bound to different environments. The lisp
community uses terminology (coming from the Lambda calculus??) that says
that when a "function" is associated with a particular environment the
function is "closed" with respect to that environment and that any free
(alternatively open) variables in the function are bound to their
corresponding entries in the environment. Hence the origin of the term
"closure". (a free variable is one whose definition is not part of the
function in question) A closure is a function, bound to an environment. In
Smalltalk, the act of evaluating an expression in square brackets creates a
closure by binding the code (in particularly any free variables) within the
brackets (the "function") to an environment (that of the enclosing block or
method). An alternative terminology that is sometimes used would say that
the environment (and its variables) are "captured" by the closure.
[A terminology note..."closure" is a highly technical term that has little
meaning to most programmers. As closure semantics were added to
Smalltalk-80 the term block closure was introduced and made manifest. It
think it was a mistake to force this terminology on Smalltalk programmers
who aren't computer scientists. I prefer to call square brackets enclosing
some Smalltalk statements a "block constructor" and to use the term "block"
(or "block object") to identify the type of object that is created by
evaluating a block constructor. I think this is simpler terminology and
most closely matches how programmers think about the constructs. This is
the terminology we used in writing the ANSI Smalltalk specification.]
Because blocks are first class Smalltalk objects they can be stored in any
field or variable including those that have a longer lifetime then the
activation record that is associated with the environment that is bound to
the block. This means that an environment that can be potentially captured
using a block constructor can not be managed via stack allocation. The
simplest way around this problem would be to allocate all arguments and
temporary variables of every activation record in a heap allocated
environment object. (Each activation record would have a field that points
to its associated environment object) This, however, would have
performance characteristics very similar to exclusively using heap
allocated activation records. However, it is easy to observe that the
majority of arguments and temporary variables used in a program can never
be captured by a block constructor. Any variable which we (or a compiler)
can prove is only referenced by its defining method or block (and hence are
not referenced from an nested block constructors) can be safely stack
allocated in the activation record. In practice, most method contain no
block constructors (for this purpose we don't consider in-lined blocks as
true block constructors) and hence all their arguments and temporaries can
be safely stack allocated. Typically, for those methods that do contain
real block constructors, only a small subset of the arguments and
variables are actually subject to capture and only those variables need to
placed in a heap allocated environment object. The number of environment
allocated variables can be further reduced by paying attention to "read
only variables". Any variable that can never be assigned during the
lifetime of a blocks,for example, arguments of enclosing methods and blocks
need not be placed in the environment if instead a copy of the variable is
placed in the closure when it is created (is this true of Squeak?? Are
arguments read only? They should be for this reason)
3)Closures (or preferably, "block objects") -- A Block object captures an
association between some executable code and an environment. A block object
can be evaluated much like a function. This results in the creation of an
activation record for the evaluation. The classic Smalltalk-80 design used
the same object (a BlockContext) for both the "closure" and the activation
record. Because of this, Smalltalk-80 blocks were not reentrant (and hence
could not be recursively invoked) as there was only a single "return
address slot", stack, etc. that gets over-written by each activation of the
block. Blocks (closures) and activation records have distinct roles and
need to be represented by distinct objects.
In its simplest form, a block object needs only two slots, a reference to
the code and a reference to the captured environment. If the environment
that is capture by a block is represented directly in an activation record
(this is essentially what happens via the home context reference of a
Smalltalk-80 block context) then the activation record (and usually the
entire stack) has to be reified. The undesirability of stack reification is
a strong incentive to explicitly represent environments that are subject to
capture as heap objects. In addition to the environment and code slots,
blocks that contain up arrow returns may (depending upon specifics of the
overall design) need a continuation slot. (If the captured continuation
slot isn't in the block object then it probably needs to be in an
environment object.)
4) Continuations - You can think of a continuation as the portion of an
activation record that contains the return address and associated
state. It specifies where execution will "continue" when control exits the
code associated with the activation record. All methods and most blocks
continue execution, upon return, in the activation that directly called
them. In this case, control flow follows a stack discipline and the
continuation linkage between activation record can be modeled implicitly
with an activation record stack. The exception to this rule in Smalltalk is
the up arrow return from a block. An up arrow return uses the continuation
of the method activation that created (perhaps indirectly via intermediate
block activations) the block object associated with the current activation
rather than the current (block) activation record's continuation.
[Some languages, most notably Scheme support "first class
continuations". In such languages, continuations are "real objects" that
can be directly manipulated. In particular, a Scheme continuation can be
"invoked" multiple times. This essentially means that control can
redirected to return from a particular function activation again, even if
that activation has already been returned from. This permits various novel
control paradigms but significantly complicates the implementation of
continuations. Smalltalk-80 does not have this characteristic. A
Smalltalk method continuation can be used only once and most Smalltalk
implementation do not allow the use of continuations that cross process
boundaries. These restrictions allow for optimizations that would
otherwise be difficult to achieve.]
As with a block's environment record, the method continuation must be
captured when the block object is created. If the captured continuation is
represented as a reference to the method's activation record, this will
also cause reification of the activation record and the stack which we
would like to avoid.
I've now covered enough to explain how Smalltalk blocks with full closure
semantics can be implemented an a way that never forces reification of the
activation record stack:
Allocate all arguments and temps that are not "captured" by blocks within
stack allocated activation records.
All variables that are subject to capture by blocks are explicitly
allocated in heap allocated environment objects
(or, if "read-only", copied into the block object).
(only activations that have captured variables need allocate environment
objects, except for the following rule)
The activation of a method that contains blocks with up arrow returns
always allocates a heap environment object.
(it may be empty, in the sense that it contains no variable, but it must
be allocated)
Activation records reference ("point to") environment objects but
environment objects never point to activation records.
Environment objects and block objects may reference other environment objects.
A block (a closure) object references its enclosing environment(s) (but
never the enclosing activation record(s))
A block object that implements an up arrow return captures a reference to
the environment associated with its method activation.
(This is used as the continuation marker)
An up arrow return is implemented by searching up the activation stack for
an activation record whose environment reference is the same as the
continuation marker of the returning bock. The activation stack is
trimmed to that activation record and a return is forced.
(If no activation record with that environment is found we have a
#cannotReturn: situation)
Enough for part #1, with luck I'll still be motivated to keep writing on
part #2 tomorrow
Allen
More information about the Squeak-dev
mailing list
|