[squeak-dev] Two new curious Context/primitive questions :-)

Frank Lesser frank-lesser at lesser-software.com
Wed Jan 6 06:28:29 UTC 2021

Hi Christoph,

yes, this optimization is possible -

IMO Pharo/Squeak has a CleanBlockClosure -

I used the term red & green BlockClosure (demonstrated at a customer in 
2010)

- implemented in LSWGVM for more than a decade.

On 1/6/2021 01:16, Thiede, Christoph wrote:
> Hi Eliot,
> thanks for the feedback! This is an exciting task and I'll put it onto 
> my list, I'm looking forward to tackling it ... :-)
> > We can, but remember that passing a block is in fact hiding an 
> allocation, maybe two. Mentioning a block in code causes it to be 
> allocated and if not already done so, causes allocation of the 
> lexically enclosing context.
> That's correct, but I believe it could be possible to avoid any 
> closure variables and to use only block arguments, block returns, and 
> method returns. This should be relatively fast, shouldn't it?
> Best,
> Christoph
> ------------------------------------------------------------------------
> *From:* Squeak-dev <squeak-dev-bounces at lists.squeakfoundation.org> on 
> behalf of Eliot Miranda <eliot.miranda at gmail.com>
> *Sent:* Saturday, 2 January 2021 02:45:53
> *To:* The general-purpose Squeak developers list
> *Subject:* Re: [squeak-dev] Two new curious Context/primitive 
> questions :-)
> Hi Christoph,
> On Fri, Jan 1, 2021 at 4:18 PM Thiede, Christoph 
> <Christoph.Thiede at student.hpi.uni-potsdam.de 
> <mailto:Christoph.Thiede at student.hpi.uni-potsdam.de>> wrote:
>     Hi Eliot,
>     once again I am very pleasantly surprised by your professional and
>     detailed answer which was a joy to read!
>     > 'Nuff said
>     No further questions. This was very interesting!
>     It also gives me a possible explanation for why one cannot
>     execute a context subinstance without any problems. That's because
>     the marriage between stack frame and context objects is hard-coded
>     based on the Context class (specialObjects at: 11), isn't it?
> Indeed. One may be able to go the other way: create a stack of 
> contexts whose class is different.  These should retain their class 
> because the associated stack frames will be married to them.  But 
> going in the direction you describe is not supported.
>     > But beware, because if one was debugging the debugger one mustn't
>     confuse a PrimFailToken created in the simulation for a
>     PrimFailToken created by the simulation.
>     Hm, aren't we already in a very similar situation? I had planned
>     to write a separate message for this issue, but here is my case:
>     {Context primitiveFailTokenFor: nil} at: 1. "{an Object . nil}"
>     Context runSimulated: [{Context primitiveFailTokenFor: nil} at:
>     1]. "⚡ Error: subscript is out of bounds: 1"
>     Now one could argue that this is not a very likely situation to
>     occur ever in "production", but the same could be said for a
>     hypothetical #isPrimFailToken selector, and above all, it
>     precludes an ideal, i.e. completely transparent simulation
>     machinery ...
>     So I would like to suggest to find a different solution for this
>     problem anyway, provided that you support this idea in general.
>     An alternative I could imagine would be the SetElement pattern
>     from Collections which is completely transparent. But this would
>     probably be too expensive for simulation purposes (because we
>     would need to instantiate one object per
>     simulated primitive call), right?
> Not sure.  If instantiation can be deferred until failure then things 
> should be OK.
>     Why not encode a failing primitive via a changed control
>     flow? Couldn't we pass an additional failBlock argument to
>     #doPrimitive:method:receiver:args:, #tryPrimitive:withArgs:/#tryNamedPrimitiveIn:for:withArgs:,
>     and #simulateValueWithArguments:caller: (and analogously to
>     ExternalFunction >> #tryInvokeWithArguments: and the critsect
>     primitive methods on Mutex)?
> We can, but remember that passing a block is in fact hiding an 
> allocation, maybe two.  Mentioning a block in code causes it to be 
> allocated and if not already done so, causes allocation of the 
> lexically enclosing context.
>     Then we could rewrite #send:to:with:lookupIn: like this (pseudo):
>         send: selector to: rcvr with: arguments lookupIn: lookupClass
>             "Simulate the action of sending a message with selector
>         and arguments to rcvr. The argument, lookupClass, is the class
>         in which to lookup the message. This is the receiver's class
>         for normal messages, but for super messages it will be some
>         specific class related to the source method."
>             | meth primIndex primFailCode val ctxt |
>             (meth := lookupClass lookupSelector: selector) ifNil:
>                 [selector == #doesNotUnderstand: ifTrue:
>                     [self error: 'Recursive message not understood!'
>         translated].
>                 ^self send: #doesNotUnderstand:
>                         to: rcvr
>                         with: {(Message selector: selector arguments:
>         arguments) lookupClass: lookupClass}
>                         lookupIn: lookupClass].
>             meth isCompiledMethod ifFalse:
>                 ["Object as Methods (OaM) protocol: 'The contract is
>         that, when the VM encounters an ordinary object (rather than a
>         compiled method) in the method dictionary during lookup, it
>         sends it the special selector #run:with:in: providing the
>         original selector, arguments, and receiver.'. DOI:
>         10.1145/2991041.2991062."
>                 ^self send: #run:with:in:
>                     to: meth
>                     with: {selector. arguments. rcvr}].
>             meth numArgs = arguments size ifFalse:
>                 [^ self error: ('Wrong number of arguments in
>         simulated message {1}' translated format: {selector})].
>             (primIndex := meth primitive) > 0 ifTrue:
>         -        [val := self doPrimitive: primIndex method: meth
>         -            receiver: rcvr args: arguments.
>         -        (self isPrimFailToken: val) ifFalse:
>         -            [^val]].
>         +        [val := self doPrimitive: primIndex method: meth
>         +            receiver: rcvr args: arguments ifFail:
>         +                [:code | primFailCode := code].
>         +        primFailCode ifNil: [^val]].
>             (selector == #doesNotUnderstand: and: [lookupClass ==
>         ProtoObject]) ifTrue:
>                 [^self error: ('Simulated message {1} not understood'
>         translated format: {arguments first selector})].
>             ctxt := Context sender: self receiver: rcvr method: meth
>         arguments: arguments.
>         -    (primIndex isInteger and: [primIndex > 0]) ifTrue:
>         -        [ctxt failPrimitiveWith: val].
>         +    primFailCode ifNotNil:
>         +        [ctxt failPrimitiveWith: primFailCode].
>             ^ctxt
>     Current senders of #primitiveFailTokenFor: could then evaluate the
>     failBlock if the primitive fails.
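>     One possible shape for the new hook (a sketch only; that the
>     token carries its error code as its last element is an assumption
>     about the current implementation):
>         doPrimitive: primIndex method: meth receiver: rcvr args: arguments ifFail: failBlock
>             "Run the primitive as today, but report failure through
>             failBlock instead of returning a PrimFailToken."
>             | result |
>             result := self doPrimitive: primIndex method: meth receiver: rcvr args: arguments.
>             (self isPrimFailToken: result) ifTrue:
>                 [^failBlock value: result last].  "pass the primitive error code on"
>             ^result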
>     Do you think this approach would be worth a try? :-)
> Definitely.  It might eliminate multiple tests.  I think you should 
> definitely try it.  It looks nice to me.  Thank you!
>     Best,
>     Christoph
>     ------------------------------------------------------------------------
>     *From:* Squeak-dev <squeak-dev-bounces at lists.squeakfoundation.org
>     <mailto:squeak-dev-bounces at lists.squeakfoundation.org>> on behalf
>     of Eliot Miranda <eliot.miranda at gmail.com
>     <mailto:eliot.miranda at gmail.com>>
>     *Sent:* Friday, 1 January 2021 22:34:28
>     *To:* The general-purpose Squeak developers list
>     *Subject:* Re: [squeak-dev] Two new curious Context/primitive
>     questions :-)
>     errata:
>     On Fri, Jan 1, 2021 at 1:22 PM Eliot Miranda
>     <eliot.miranda at gmail.com <mailto:eliot.miranda at gmail.com>> wrote:
>     Hi Christoph,
>         happy new year!!
>     On Fri, Jan 1, 2021 at 11:28 AM Thiede, Christoph
>     <Christoph.Thiede at student.hpi.uni-potsdam.de
>     <mailto:Christoph.Thiede at student.hpi.uni-potsdam.de>> wrote:
>         Hi all, hi Eliot,
>         new year, new simulation fun, and I have collected two new
>         questions about the Context implementation which I'd love to
>         get answered here:
>         First, I was confused by the following:
>         (BlockClosure >> #numArgs) primitive. "266"
>         (Context >> #pc) primitive. "0"
>         What makes Context so special that it cannot be compiled with
>         quick accessor methods?
>     The gory details are here:
>     http://www.mirandabanda.org/cogblog/2009/01/14/under-cover-contexts-and-the-big-frame-up/
>     <http://www.mirandabanda.org/cogblog/2009/01/14/under-cover-contexts-and-the-big-frame-up/>
>     But that's full of implementation detail.  Let me try and explain
>     at a higher level.
>     Contexts are wonderful but very expensive.  Any VM that actually
>     creates contexts for every method or block activation will
>     inevitably run slowly.  There are two overheads here. One is the
>     basic overhead of allocating and reclaiming the contexts that are
>     created. The other is the cost of moving the outgoing receiver and
>     arguments from the sending context to the new context.  A fast VM
>     must avoid these overheads, and the key solution is to use a stack
>     frame organization as is used in conventional language
>     implementations.  The key here is that with stack frames the
>     receiver and incoming arguments do not have to be moved; they stay
>     where they are and a new frame is built underneath them which
>     includes them as the incoming receiver and arguments; stack frames
>     overlap outgoing and incoming arguments.
>     But crucially, in addition the VM must still create contexts if
>     and when they are needed.  If not, this would no longer be a
>     Smalltalk-80 implementation.  Now one could imagine a VM which
>     converted stack frames to contexts whenever a method or block
>     activation was accessed using thisContext (or thisContext sender
>     etc which access contexts by following the sender field). But that
>     would be very slow.  Instead, the fastest approach is to try and
>     keep stack frames as stack frames for as long as possible.  For
>     this to work contexts have to be able to function as "proxy
>     objects" for stack frames.  The VM does indeed create contexts
>     when one uses thisContext, or even when one creates a block, but
>     internally the context is in a very special state, a state in
>     which it points at a stack frame.
>     Some terminology is useful.  Peter Deutsch used "volatile",
>     "hybrid" and "stable".  I like "single", "married", "widowed" and
>     "divorced" (Peter's scheme didn't have "widowed" which is the key
>     to why my scheme is faster, and why the closure implementation is
>     as it is).
>     When a stack frame is created it is "single".  It has a slot
>     within it to refer to a context object if it is needed, but the
>     slot is nil.
>     When a stack frame needs a context object one is created and the
>     two are "married".
>     When a method or block returns its stack frame is discarded; the
>     stack frame has died and if it was married its context becomes
>     "widowed". (*)
>     When space is needed for new stack frames all reclaimed stack
>     frames are "divorced" from their contexts.  The stack frame state
>     is written to the context object and the context object no longer
>     acts as a proxy; it is a fully fledged normal stack object. (*)
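>     One can watch the marriage happen from a workspace (a harmless
>     sketch to evaluate):
>         | ctx |
>         ctx := thisContext.  "forces the current single frame to marry a context"
>         ctx sender           "following the sender field marries further frames as needed"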
>     (*) Stack frames exist in a small memory zone organized as stack
>     pages, the stack zone. The size of the stack zone is set at
>     startup, and its size is either the VM's default or set by a value
>     that persists in the image header (see vmParameterAt: 43).  Since
>     the system may run out of stack pages (a deep call stack, many
>     active processes), the VM makes new pages available by purging the
>     least recently used pages to the heap in the form of context
>     objects.  All the frames in the least recently used stack page are
>     divorced at the same time. A context must be created for each
>     frame that is not yet married. A page is big enough to hold on
>     average about 40 method activations; since activations are of
>     different sizes and change size as execution proceeds, the number
>     of activations per full page varies.
>     (*) The reason why Peter's original scheme is slower is that in
>     HPS at return time the VM has to check for a frame being
>     hybrid and copy state to the hybrid context, making it stable. 
>     This has to be done because a closure's outer temporary variables
>     are stored in the lexically enclosing stack frame, and so have to
>     be made to persist in the stable context or their values will be
>     stale.  In my (lisp-derived) closure implementation there are no
>     such references to outer temporary variables; values are either
>     copied if they do not change, or stored in an Array (the
>     "indirection vector"), which can then be copied.
>     Note that divorce is a process.  Once divorced a context is single
>     and free to function as a context does.  Indeed when the system
>     starts up it loads an image which only contains single contexts. 
>     All contexts are divorced on snapshot. Divorced contexts are
>     amnesiacs.
>     So now, finally, we can answer your question.  Why is
>     (Context>>#pc), a method that simply answers an instance variable,
>     not compiled as a quick method with a primitive that answers that
>     variable?  "married" and "widowed" contexts are proxies for stack
>     frames.  Only the instance variables they contain which do not
>     change can be fetched from the context object itself; these are
>     the receiver, arguments, closureOrNil, and method. sender, pc,
>     stackp and other stack contents may change as execution proceeds.
>     These mutable values must be fetched from the stack frame itself,
>     if it still exists.
>     All accesses to sender, pc, stackp, and stack contents
>     (Context>>at:[put:]) must be mediated.  A married or widowed
>     context has its sender field set to the frame pointer of its stack
>     frame, encoded as a SmallInteger. You can never see this from the
>     image (except via a special xRay primitive). By following the
>     frame pointer encoded in the sender field the VM can detect if the
>     frame is still live.  If so, the relevant values are synthesized
>     or fetched from the stack frame itself.  If not, the context is
>     widowed and "mourns"; it is changed into a context with a nil
>     sender field, a nil pc, and a stackp that points at the last argument.
>     How is this mediation done?  One design could be to require that
>     all image level accesses to Context inst vars are through
>     primitives. This feels contrived to me.  Instead, inst var access
>     is via inst var access bytecodes. But if we do this naively we
>     would have to check in all inst var access bytecodes whose index
>     is the same as sender (0), pc (1) & stackp (2).  This would slow
>     down inst var access enormously.  Instead we observe that inst var
>     access bytecodes pushReceiverVariable:, popIntoReceiverVariable:
>     have short forms for the most frequently used (indexes 0 to 15 for
>     push, indexes 0 to 7 for popInto). By arranging that we only use
>     the long forms for accessing inst vars in Context, and
>     superclasses and subclasses, we only have to check in the long
>     forms and the check is cheap.  We check that the index is <= 2,
>     because all other inst var accesses with these indices will use
>     the short-form bytecodes.
>     By arranging that Context's inst vars are *not* accessed by quick
>     primitives we do not have to duplicate this machinery in quick
>     primitive dispatch, which would also slow things down.  Hence
>     Context>>pc is compiled as a vanilla method and its inst var
>     access bytecode is the long form:
>     25 <E2 01> pushRcvr: 1
>     27 <5C> returnTop
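>     This is easy to verify from the image (#symbolic prints a
>     method's bytecodes):
>         (BlockClosure >> #numArgs) primitive.  "266: a quick instVar-returning primitive"
>         (Context >> #pc) primitive.            "0: compiled as a vanilla method"
>         (Context >> #pc) symbolic              "shows the long-form pushRcvr: 1"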
>     'Nuff said
>         Second, Context >> #isPrimFailToken: attracted my attention
>         multiple times when looking at different #timeProfile results
>         of simulation sessions. In the expression [100000 factorial]
>         it takes up more than 44% of the whole execution time!
>         I have no idea why it could be so slow, but this message is
>         sent really often, which is at least 2 times per simulation of
>         a special message send.
>         Would there be an easy change to resolve this bottleneck and
>         speed up the simulation by 40% or more?
>         Would it be possible to provide a primitive for this method?
>         Or do you see any other way to optimize it?
>     It's not slow so much as that it has to be used all the time to
>     check whether a send has invoked a primitive which has failed. 
>     Its function is to support primitive error codes. Before primitive
>     error codes (IIRC) we simply checked for nil. But with primitive
>     error codes we need a container that both reliably distinguishes a
>     primitive failure result from all other objects in the system, and
>     holds onto the primitive failure code.  It may indeed be able to
>     reduce the overhead by only testing when absolutely necessary.  It
>     may be that reimplementing the scheme is faster.
>     For example, one could imagine a class whose instances are the
>     only objects to answer #isPrimFailToken, which would be faster
>     than a one argument message send.  But beware, because if one was
>     debugging the debugger one mustn't confuse a PrimFailToken created
>     in the simulation for a PrimFailToken created by the simulation.
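>     Such a scheme might look like this (a hypothetical sketch; the
>     class name SimulationPrimFailToken is invented, not existing
>     code):
>         Object >> isPrimFailToken
>             ^false
>         SimulationPrimFailToken >> isPrimFailToken
>             ^true
>     A unary send to the result would then be cheaper than the current
>     one-argument #isPrimFailToken: test.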
>     Remember to distinguish between price and value. Implementing the
>     context-to-stack mapping scheme outlined above was very costly,
>     but its value is very high.  It allows us to have contexts, with
>     all their utility, while reducing their costs significantly.
>     Similarly, and closely related, being able to simulate execution
>     is costly but hugely valuable.  If the price is 40% of simulated
>     execution then it is a price worth paying.
>         Best,
>         Christoph
>     Happy New Year!
>     _,,,^..^,,,_
>     best, Eliot
> -- 
> _,,,^..^,,,_
> best, Eliot