[squeak-dev] Two new curious Context/primitive questions :-)
vanessa at codefrau.net
Wed Jan 6 02:05:20 UTC 2021
This was a very interesting read - thank you Eliot! That should go in some
Another overview is in the class comment for
MaybeContextInstanceVariableNode. Which I found by trying to figure out how
the heck the compiler knows to use the long form for inst var refs in
On Fri, Jan 1, 2021 at 1:23 PM Eliot Miranda <eliot.miranda at gmail.com>
> Hi Christoph,
> happy new year!!
> On Fri, Jan 1, 2021 at 11:28 AM Thiede, Christoph <
> Christoph.Thiede at student.hpi.uni-potsdam.de> wrote:
>> Hi all, hi Eliot,
>> new year, new simulation fun, and I have collected two new questions
>> about the Context implementation which I'd love to get answered here:
>> First, I was confused by the following:
>> (BlockClosure >> #numArgs) primitive. "266"
>> (Context >> #pc) primitive. "0"
>> What makes Context so special that it cannot be compiled with quick
>> accessor methods?
> The gory details are here:
> But that's full of implementation detail. Let me try and explain at a
> higher level.
> Contexts are wonderful but very expensive. Any VM that actually creates
> contexts for every method or block activation will inevitably run slowly.
> There are two overheads here. One is the basic overhead of allocating and
> reclaiming the contexts that are created. The other is the cost of moving
> the outgoing receiver and arguments from the sending context to the new
> context. A fast VM must avoid these overheads, and the key solution is to
> use a stack frame organization as is used in conventional language
> implementations. The key here is that with stack frames the receiver and
> incoming arguments do not have to be moved; they stay where they are and a
> new frame is built underneath them which includes them as the incoming
> receiver and arguments; stack frames overlap outgoing and incoming
> But crucially, in addition the VM must still create contexts if and when
> they are needed. If not, this would no longer be a Smalltalk-80
> implementation. Now one could imagine a VM which converted stack frames to
> contexts whenever a method or block activation was accessed using
> thisContext (or thisContext sender etc which access contexts by following
> the sdender field). But that would be very slow. Instead, the fastest
> approach is to try and keep stack frames as stack frames for as long as
> possible. For this to work contexts have to be able to function as "proxy
> objects" for stack frames. The VM does indeed create contexts when one
> uses thisContext, or even when one creates a block, but internally the
> context is in a very special state, a state in which it points at a stack
> Some terminology is useful. Peter Deutsch used "volatile", "hybrid" and
> "stable". I like "single", "married", "widowed" and "divorced" (Peter's
> scheme didn't have "widowed" which is the key to why my scheme is faster,
> and why the closure implementation is as it is).
> When a stack frame is created it is "single". It has a slot within it to
> refer to a context object if it is needed, but the slot is nil.
> When a stack frame needs a context object one is created and the two are
> When a method or block returns its stack frame is discarded; the stack
> frame has died and if it was married its context becomes "widowed". (*)
> When space is needed for new stack frames all reclaimed stack frames are
> "divorced" from their contexts. The stack frame state is written to the
> context object and the context object no longer acts as a proxy; it is a
> fully fledged normal stack object. (*)
> (*) Stack frames exist in a small memory zone organized as stack pages,
> the stack zone. The size of the stack zone is set at startup, and its size
> is either the VM's defaut oir set by a value that persists in the image
> header (see vmParameterAt: 43). Since the system may run out of stack
> pages (a deep call stack, many active processes), the VM makes new pages
> available by purging the least recently used pages to the heap in the form
> of context objects. All the frames in the least recently used stack page
> are divorced at the same time. A context must be created for each frame
> that is not yet married.
> (*) The reason why Peter's original scheme is slower is that at return
> time the VM has to check for being married and copy state to the hybrid
> context, making it stable. This has to be done because a closure's outer
> temporary variables are stored in the lexically enclosing stack frame, and
> so have to be made to persist in the stable context or their values will be
> stale. In my (lisp-derived) closure implementation there are no such
> references to outer temporary variables; values are either copied if they
> do not change, or stored in an Array (the "indirection vector"), which can
> then be copied.
> Note that divorce is a process. Once divorced a context is single and
> free to function as a context does. Indeed when the system starts up it
> loads an image which only contains single contexts. All contexts are
> divorced on snapshot.
> So now, finally, we can answer your question. Why is (Context>>#pc), a
> method that simply answers an instance variable, not compiled as a quick
> method with a primitive that answers that variable? "married" and
> "widowed" contexts are proxies for stack frames. Only the instance
> variables they contain which do not change can be fetched from the
> context object itself; these are the receiver, arguments, closureOrNil, and
> method. sender, pc, stackp and other stack contents may change as execution
> proceeds. These mutable values must be fetched from the stack frame itself,
> if it still exists.
> All accesses to sender, pc, stackp, and stack contents
> (Context>>at:[put:]) must be mediated. A married or widowed context has
> its sender field set to the frame pointer of its stack frame, encoded as a
> SmallInteger. You can never see this from the image (except via a special
> xRay primitive). By following the frame pointer encoded in the sender field
> the VM can detect if the frame is still live. If so, the relevant
> values are synthesized or fetched from the stack frame itself. If not, the
> context is widowed and "mourns"; it is changed into a context with a nil
> sender field, a nil pc, and a stackp that points at the last argument.
> How is this mediation done? One design could be to require that all image
> level accesses to Context inst vars are through primitives. This feels
> contrived to me. Instead, inst var access is via inst var access
> bytcodes. But if we do this naively we would have to check in all inst var
> access bytecodes whose index is the same as sender (0), pc (1) & stackp
> (2). This would slow down inst var access enormously. Instead we observe
> that inst var access bytecodes pushReceiverVariable:,
> popIntoReceiverVariable: have short forms for the most frequently used
> (indexes 0 to 15 for push, indexes 0 to 7 for popInto). By arranging that
> we only use the long forms for accessing inst vars in Context, and
> superclasses and subclasses, we only have to check in the long forms and
> the check is cheap. We check that the index is <= 2, because all other
> inst var accesses with these indices will use the short-form bytecodes.
> By arranging that Context's inst vars are *not* accessed by quick
> primitives we do not have to duplicate this machinery in quick primitive
> dispatch, which would also slow things down. Hence Context>>pc is compiled
> as a vanilla method and its inst var access bytecode is the long form:
> 25 <E2 01> pushRcvr: 1
> 27 <5C> returnTop
> 'Nuff said
>> Second, Context >> #isPrimFailToken: attracted my attention multiple
>> times when looking at different #timeProfile results of simulation
>> sessions. In the expression [100000 factorial] it takes up more than 44% of
>> the whole execution time!
>> I have no idea why it could be so slow, but this message is sent really
>> often, which is at least 2 times per simulation of a special message send.
>> Would there be an easy change to resolve this bottleneck and speed
>> up the simulation by 40% or more?
>> Would it be possible to provide a primitive for this method? Or do you
>> see any other way to optimize it?
> It's not slow, so much as it has to be used all the time to check whether
> a send has invoked a primitive which has failed. Its function is to
> support primitive error codes. Without it we can simply check for nil.
> But with it we need a container that both reliably distinguishes a
> primitive failure result from all other objects in the system, and holds
> onto the primitive failure code. It may indeed be able to reduce the
> overhead by only testing when absolutely necessary. It may be that
> reimplementing the scheme is faster.
> For example, one could imagine a class whose instances are the only
> objects to answer #isPrimFailToken, which would be faster than a one
> argument message send. But beware, because if one was debugging the
> debugger one mustn't confuse a PrimFailToken created in the simulation for
> a PrimFailToken created by the simulation.
> Remember to distinguish between price and value. Implementing the
> context-to-stack mapping scheme outlined above was very costly, but its
> value is very high. It allows us to have contexts, with all their utility,
> while reducing their costs significantly. Similarly, and closely related,
> being able to simulate execution is costly but hugely valuable. If the
> price is 40% of simulated execution then it is a price worth paying.
> Happy New Year!
> best, Eliot
-------------- next part --------------
An HTML attachment was scrubbed...
More information about the Squeak-dev