[squeak-dev] Two new curious Context/primitive questions :-)

Eliot Miranda eliot.miranda at gmail.com
Sat Jan 2 01:45:53 UTC 2021


Hi Christoph,

On Fri, Jan 1, 2021 at 4:18 PM Thiede, Christoph <
Christoph.Thiede at student.hpi.uni-potsdam.de> wrote:

> Hi Eliot,
>
>
> once again I am very pleasantly surprised by your professional and
> detailed answer which was a joy to read!
> <http://www.hpi.de/>
>
> > 'Nuff said
>
> No further questions. This was very interesting!
> It also gives me a possible explanation for why one cannot execute a
> context subinstance without any problems. That's because the marriage
> between stack frame and context objects is hard-coded based on the Context
> class (specialObjects at: 11), isn't it?
>

Indeed. One may be able to go the other way, create a stack of contexts
whose class is different.  These should retain their class because the
associated stack frames will be married to them.  But going the other way
is not supported.

> > But beware, because if one was debugging the debugger one mustn't
> confuse a PrimFailToken created in the simulation for a PrimFailToken
> created by the simulation.
>
> Hm, aren't we already in a very similar situation? I had planned to write
> a separate message for this issue, but here is my case:
>
> {Context primitiveFailTokenFor: nil} at: 1. "{an Object . nil}"
> Context runSimulated: [{Context primitiveFailTokenFor: nil} at: 1].
> "⚡ Error: subscript is out of bounds: 1"
>
> Now one could argue that this is not a very likely situation to occur ever
> in "production", but the same could be said for a hypothetical #isPrimFailToken
> selector, and above all, it precludes an ideal, i.e. completely transparent
> simulation machinery ...
> So I would like to suggest to find a different solution for this problem
> anyway, provided that you support this idea in general.
>
> An alternative I could imagine would be the SetElement pattern from
> Collections which is completely transparent. But this would probably be too
> expensive for simulation purposes (because we would need to instantiate one
> object per simulated primitive call), right?
>

Not sure.  If instantiation can be deferred until failure then things
should be OK.

> Why not encode a failing primitive via a changed control flow? Couldn't we
> pass an additional failBlock argument to #
> doPrimitive:method:receiver:args:, #tryPrimitive:withArgs:/#tryNamedPrimitiveIn:for:withArgs:,
> and #simulateValueWithArguments:caller: (and analogously to
> ExternalFunction >> #tryInvokeWithArguments: and the critsect primitive
> methods on Mutex)?
>

We can, but remember that passing a block is in fact hiding an allocation,
maybe two.  Mentioning a block in code causes it to be allocated and if not
already done so, causes allocation of the lexically enclosing context.

> Then we could rewrite #send:to:with:lookupIn: like this (pseudo):
>
> send: selector to: rcvr with: arguments lookupIn: lookupClass
>     "Simulate the action of sending a message with selector and arguments
> to rcvr. The argument, lookupClass, is the class in which to lookup the
> message. This is the receiver's class for normal messages, but for super
> messages it will be some specific class related to the source method."
>
>     | meth primFailCode val ctxt |
>     (meth := lookupClass lookupSelector: selector) ifNil:
>         [selector == #doesNotUnderstand: ifTrue:
>             [self error: 'Recursive message not understood!' translated].
>         ^self send: #doesNotUnderstand:
>                 to: rcvr
>                 with: {(Message selector: selector arguments: arguments)
> lookupClass: lookupClass}
>                 lookupIn: lookupClass].
>
>     meth isCompiledMethod ifFalse:
>         ["Object as Methods (OaM) protocol: 'The contract is that, when
> the VM encounters an ordinary object (rather than a compiled method) in the
> method dictionary during lookup, it sends it the special selector
> #run:with:in: providing the original selector, arguments, and receiver.'.
> DOI: 10.1145/2991041.2991062."
>         ^self send: #run:with:in:
>             to: meth
>             with: {selector. arguments. rcvr}].
>
>     meth numArgs = arguments size ifFalse:
>         [^ self error: ('Wrong number of arguments in simulated message
> {1}' translated format: {selector})].
>     (primIndex := meth primitive) > 0 ifTrue:
> -         [val := self doPrimitive: primIndex method: meth receiver: rcvr
> args: arguments.
> -         (self isPrimFailToken: val) ifFalse:
> -             [^val]].
> +         [val := self doPrimitive: primIndex method: meth receiver: rcvr
> args: arguments ifFail:
> +             [:code | primFailCode := code].
> +         primFailCode ifNil: [^ val]].
>
>     (selector == #doesNotUnderstand: and: [lookupClass == ProtoObject])
> ifTrue:
>         [^self error: ('Simulated message {1} not understood' translated
> format: {arguments first selector})].
>
>     ctxt := Context sender: self receiver: rcvr method: meth arguments:
> arguments.
> -     (primIndex isInteger and: [primIndex > 0]) ifTrue:
> -         [ctxt failPrimitiveWith: val].
> +   primFailCode ifNotNil:
> +       [ctxt failPrimitiveWith: primFailCode].
>
>     ^ctxt
>
>
> Current senders of #primitiveFailTokenFor: could then evaluate the
> failBlock if the primitive fails.
> Do you think this approach would be worth a try? :-)
>

Definitely.  It might eliminate multiple tests.  I think you should
definitely try it.  It looks nice to me.  Thank you!

>
> Best,
> Christoph
> ------------------------------
> *Von:* Squeak-dev <squeak-dev-bounces at lists.squeakfoundation.org> im
> Auftrag von Eliot Miranda <eliot.miranda at gmail.com>
> *Gesendet:* Freitag, 1. Januar 2021 22:34:28
> *An:* The general-purpose Squeak developers list
> *Betreff:* Re: [squeak-dev] Two new curious Context/primitive questions
> :-)
>
> errata:
>
> On Fri, Jan 1, 2021 at 1:22 PM Eliot Miranda <eliot.miranda at gmail.com>
> wrote:
> Hi Christoph,
>
>     happy new year!!
>
> On Fri, Jan 1, 2021 at 11:28 AM Thiede, Christoph <
> Christoph.Thiede at student.hpi.uni-potsdam.de> wrote:
>
>> Hi all, hi Eliot,
>>
>> new year, new simulation fun, and I have collected two new questions
>> about the Context implementation which I'd love to get answered here:
>>
>> First, I was confused by the following:
>> (BlockClosure >> #numArgs) primitive. "266"
>> (Context >> #pc) primitive. "0"
>>
>> What makes Context so special that it cannot be compiled with quick
>> accessor methods?
>>
>
> The gory details are here:
> http://www.mirandabanda.org/cogblog/2009/01/14/under-cover-contexts-and-the-big-frame-up/
>
> But that's full of implementation detail.  Let me try and explain at a
> higher level.
>
> Contexts are wonderful but very expensive.  Any VM that actually creates
> contexts for every method or block activation will inevitably run slowly.
> There are two overheads here.  One is the basic overhead of allocating and
> reclaiming the contexts that are created.  The other is the cost of moving
> the outgoing receiver and arguments from the sending context to the new
> context.  A fast VM must avoid these overheads, and the key solution is to
> use a stack frame organization as is used in conventional language
> implementations.  The key here is that with stack frames the receiver and
> incoming arguments do not have to be moved; they stay where they are and a
> new frame is built underneath them which includes them as the incoming
> receiver and arguments; stack frames overlap outgoing and incoming
> arguments.
>
> But crucially, in addition the VM must still create contexts if and when
> they are needed.  If not, this would no longer be a Smalltalk-80
> implementation.  Now one could imagine a VM which converted stack frames to
> contexts whenever a method or block activation was accessed using
> thisContext (or thisContext sender etc which access contexts by following
> the sender field).  But that would be very slow.  Instead, the fastest
> approach is to try and keep stack frames as stack frames for as long as
> possible.  For this to work contexts have to be able to function as "proxy
> objects" for stack frames.  The VM does indeed create contexts when one
> uses thisContext, or even when one creates a block, but internally the
> context is in a very special state, a state in which it points at a stack
> frame.
>
> Some terminology is useful.  Peter Deutsch used "volatile", "hybrid" and
> "stable".  I like "single", "married", "widowed" and "divorced" (Peter's
> scheme didn't have "widowed" which is the key to why my scheme is faster,
> and why the closure implementation is as it is).
>
> When a stack frame is created it is "single".  It has a slot within it to
> refer to a context object if it is needed, but the slot is nil.
> When a stack frame needs a context object one is created and the two are
> "married".
> When a method or block returns its stack frame is discarded; the stack
> frame has died and if it was married its context becomes "widowed". (*)
> When space is needed for new stack frames all reclaimed stack frames are
> "divorced" from their contexts.  The stack frame state is written to the
> context object and the context object no longer acts as a proxy; it is a
> fully fledged normal stack object. (*)
>
> (*) Stack frames exist in a small memory zone organized as stack pages,
> the stack zone.  The size of the stack zone is set at startup, and its size
> is either the VM's default or set by a value that persists in the image
> header (see vmParameterAt: 43).  Since the system may run out of stack
> pages (a deep call stack, many active processes), the VM makes new pages
> available by purging the least recently used pages to the heap in the form
> of context objects.  All the frames in the least recently used stack page
> are divorced at the same time. A context must be created for each frame
> that is not yet married. A page is big enough to hold on average about 40
> method activations; since activations are of different sizes and change
> size as execution proceeds, the number of activations per full page varies.
>
> (*) The reason why Peter's original scheme is slower is that in HPS at
> return time the VM has to check for a frame being hybrid and copy state
> to the hybrid context, making it stable.  This has to be done because a
> closure's outer temporary variables are stored in the lexically enclosing
> stack frame, and so have to be made to persist in the stable context or
> their values will be stale.  In my (lisp-derived) closure implementation
> there are no such references to outer temporary variables; values are
> either copied if they do not change, or stored in an Array (the
> "indirection vector"), which can then be copied.
>
> Note that divorce is a process.  Once divorced a context is single and
> free to function as a context does.  Indeed when the system starts up it
> loads an image which only contains single contexts.  All contexts are
> divorced on snapshot.  Divorced contexts are amnesiacs.
>
>
> So now, finally, we can answer your question.  Why is (Context>>#pc), a
> method that simply answers an instance variable, not compiled as a quick
> method with a primitive that answers that variable?  "married" and
> "widowed" contexts are proxies for stack frames.  Only the instance
> variables they contain which do not change can be fetched from the
> context object itself; these are the receiver, arguments, closureOrNil, and
> method. sender, pc, stackp and other stack contents may change as execution
> proceeds. These mutable values must be fetched from the stack frame itself,
> if it still exists.
>
> All accesses to sender, pc, stackp, and stack contents
> (Context>>at:[put:]) must be mediated.  A married or widowed context has
> its sender field set to the frame pointer of its stack frame, encoded as a
> SmallInteger.  You can never see this from the image (except via a special
> xRay primitive). By following the frame pointer encoded in the sender field
> the VM can detect if the frame is still live.  If so, the relevant
> values are synthesized or fetched from the stack frame itself.  If not, the
> context is widowed and "mourns"; it is changed into a context with a nil
> sender field, a nil pc, and a stackp that points at the last argument.
>
> How is this mediation done?  One design could be to require that all image
> level accesses to Context inst vars are through primitives.  This feels
> contrived to me.  Instead, inst var access is via inst var access
> bytcodes.  But if we do this naively we would have to check in all inst var
> access bytecodes whose index is the same as sender (0), pc (1) & stackp
> (2).  This would slow down inst var access enormously.  Instead we observe
> that inst var access bytecodes pushReceiverVariable:,
> popIntoReceiverVariable: have short forms for the most frequently used
> (indexes 0 to 15 for push, indexes 0 to 7 for popInto).  By arranging that
> we only use the long forms for accessing inst vars in Context, and
> superclasses and subclasses, we only have to check in the long forms and
> the check is cheap.  We check that the index is <= 2, because all other
> inst var accesses with these indices will use the short-form bytecodes.
>
> By arranging that Context's inst vars are *not* accessed by quick
> primitives we do not have to duplicate this machinery in quick primitive
> dispatch, which would also slow things down.  Hence Context>>pc is compiled
> as a vanilla method and its inst var access bytecode is the long form:
>
> 25 <E2 01> pushRcvr: 1
> 27 <5C> returnTop
>
> 'Nuff said
>
> Second, Context >> #isPrimFailToken: attracted my attention multiple
>> times when looking at different #timeProfile results of simulation
>> sessions. In the expression [100000 factorial] it takes up more than 44% of
>> the whole execution time!
>> I have no idea why it could be so slow, but this message is sent really
>> often, which is at least 2 times per simulation of a special message send.
>> Would there be an easy change to resolve this bottleneck and speed
>> up the simulation by 40% or more?
>> Would it be possible to provide a primitive for this method? Or do you
>> see any other way to optimize it?
>>
>
> It's not slow so much as that it has to be used all the time to check
> whether a send has invoked a primitive which has failed.  Its function is
> to support primitive error codes.  Before primitive error codes (IIRC) we
> simply checked for nil.  But with primitive error codes  we need a
> container that both reliably distinguishes a primitive failure result from
> all other objects in the system, and holds onto the primitive failure
> code.  It may indeed be able to reduce the overhead by only testing when
> absolutely necessary.  It may be that reimplementing the scheme is faster.
>
> For example, one could imagine a class whose instances are the only
> objects to answer #isPrimFailToken, which would be faster than a one
> argument message send.  But beware, because if one was debugging the
> debugger one mustn't confuse a PrimFailToken created in the simulation for
> a PrimFailToken created by the simulation.
> Remember to distinguish between price and value.  Implementing the
> context-to-stack mapping scheme outlined above was very costly, but its
> value is very high.  It allows us to have contexts, with all their utility,
> while reducing their costs significantly.  Similarly, and closely related,
> being able to simulate execution is costly but hugely valuable.  If the
> price is 40% of simulated execution then it is a price worth paying.
>
> Best,
>> Christoph
>>
>
> Happy New Year!
>
> _,,,^..^,,,_
> best, Eliot
>
>

-- 
_,,,^..^,,,_
best, Eliot
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.squeakfoundation.org/pipermail/squeak-dev/attachments/20210101/5e489bc8/attachment.html>


More information about the Squeak-dev mailing list