[squeak-dev] Two new curious Context/primitive questions :-)
Frank Lesser
frank-lesser at lesser-software.com
Wed Jan 6 06:28:29 UTC 2021
Hi Christoph,
yes this optimization is possible -
IMO Pharo/Squeal has a CleanBlockClosure -
I used the term red & green BlockClosure ( demonstrated at a customer in
2010 )
- implemented in LSWGVM since more than a decade.
Frank
On 1/6/2021 01:16, Thiede, Christoph wrote:
>
> Hi Eliot,
>
>
> thanks for the feedback! This is an exciting task and I'll put it onto
> my list, I'm looking forward to tackling it ... :-)
>
>
> > We can, but remember that passing a block is in fact hiding an
> allocation, maybe two. Mentioning a block in code causes it to be
> allocated and if not already done so, causes allocation of the
> lexically enclosing context.
>
>
> <http://www.hpi.de/>
> That's correct, but I believe that it could be possible to waive any
> closure variables but only use block arguments, block returns, and
> method returns. This should be relatively fast, shouldn't it?
>
> Best,
> Christoph
> ------------------------------------------------------------------------
> *Von:* Squeak-dev <squeak-dev-bounces at lists.squeakfoundation.org> im
> Auftrag von Eliot Miranda <eliot.miranda at gmail.com>
> *Gesendet:* Samstag, 2. Januar 2021 02:45:53
> *An:* The general-purpose Squeak developers list
> *Betreff:* Re: [squeak-dev] Two new curious Context/primitive
> questions :-)
> Hi Christoph,
>
> On Fri, Jan 1, 2021 at 4:18 PM Thiede, Christoph
> <Christoph.Thiede at student.hpi.uni-potsdam.de
> <mailto:Christoph.Thiede at student.hpi.uni-potsdam.de>> wrote:
>
> Hi Eliot,
>
>
> once again I am very pleasantly surprised by your professional and
> detailed answer which was a joy to read!
>
> <http://www.hpi.de/>
>
> > 'Nuff said
>
> No further questions. This was very interesting!
> It also gives me a possible explanation for why one cannot
> execute a context subinstance without any problems. That's because
> the marriage between stack frame and context objects is hard-coded
> based on the Context class (specialObjects at: 11), isn't it?
>
>
> Indeed. One may be able to go the other way, create a stack of
> contexts whose class is different. These should retain their class
> because the associated stack frames will be married to them. But
> going the other way is not supported.
>
> > But beware, because if one was debugging the debugger one mustn't
> confuse a PrimFailToken created in the simulation for a
> PrimFailToken created by the simulation.
>
> Hm, aren't we already in a very similar situation? I had planned
> to write a separate message for this issue, but here is my case:
>
> {Context primitiveFailTokenFor: nil} at: 1. "{an Object . nil}"
> Context runSimulated: [{Context primitiveFailTokenFor: nil} at:
> 1]. "⚡ Error: subscript is out of bounds: 1"
>
> Now one could argue that this is not a very likely situation to
> occur ever in "production", but the same could be said for a
> hypothetical #isPrimFailToken selector, and above all, it
> precludes an ideal, i.e. completely transparent simulation
> machinery ...
> So I would like to suggest to find a different solution for this
> problem anyway, provided that you support this idea in general.
>
> An alternative I could imagine would be the SetElement pattern
> from Collections which is completely transparent. But this would
> probably be too expensive for simulation purposes (because we
> would need to instantiate one object per
> simulated primitive call), right?
>
>
> Not sure. If instantiation can be deferred until failure then things
> should be OK.
>
> Why not encode a failing primitive via a changed control
> flow? Couldn't we pass an additional failBlock argument to
> #doPrimitive:method:receiver:args:, #tryPrimitive:withArgs:/#tryNamedPrimitiveIn:for:withArgs:,
> and #simulateValueWithArguments:caller: (and analogously to
> ExternalFunction >> #tryInvokeWithArguments: and the critsect
> primitive methods on Mutex)?
>
>
> We can, but remember that passing a block is in fact hiding an
> allocation, maybe two. Mentioning a block in code causes it to be
> allocated and if not already done so, causes allocation of the
> lexically enclosing context.
>
> Then we could rewrite #send:to:with:lookupIn: like this (pseudo):
>
> send: selector to: rcvr with: arguments lookupIn: lookupClass
> "Simulate the action of sending a message with selector
> and arguments to rcvr. The argument, lookupClass, is the class
> in which to lookup the message. This is the receiver's class
> for normal messages, but for super messages it will be some
> specific class related to the source method."
>
> | meth primFailCodeval ctxt |
> (meth := lookupClass lookupSelector: selector) ifNil:
> [selector == #doesNotUnderstand: ifTrue:
> [self error: 'Recursive message not understood!'
> translated].
> ^self send: #doesNotUnderstand:
> to: rcvr
> with: {(Message selector: selector arguments:
> arguments) lookupClass: lookupClass}
> lookupIn: lookupClass].
>
> meth isCompiledMethod ifFalse:
> ["Object as Methods (OaM) protocol: 'The contract is
> that, when the VM encounters an ordinary object (rather than a
> compiled method) in the method dictionary during lookup, it
> sends it the special selector #run:with:in: providing the
> original selector, arguments, and receiver.'. DOI:
> 10.1145/2991041.2991062."
> ^self send: #run:with:in:
> to: meth
> with: {selector. arguments. rcvr}].
>
> meth numArgs = arguments size ifFalse:
> [^ self error: ('Wrong number of arguments in
> simulated message {1}' translated format: {selector})].
> (primIndex := meth primitive) > 0 ifTrue:
> - [val := self doPrimitive: primIndex method: meth
> receiver: rcvr args: arguments.
> - (self isPrimFailToken: val) ifFalse:
> - [^val]].
> + [val := self doPrimitive: primIndex method: meth
> receiver: rcvr args: arguments ifFail:
> + [:code | primFailCode := code].
> + primFailCode ifNil: [^ val]].
>
> (selector == #doesNotUnderstand: and: [lookupClass ==
> ProtoObject]) ifTrue:
> [^self error: ('Simulated message {1} not understood'
> translated format: {arguments first selector})].
>
> ctxt := Context sender: self receiver: rcvr method: meth
> arguments: arguments.
> - (primIndex isInteger and: [primIndex > 0]) ifTrue:
> - [ctxt failPrimitiveWith: val].
> + primFailCode ifNotNil:
> + [ctxt failPrimitiveWith: primFailCode].
>
> ^ctxt
>
>
> Current senders of #primitiveFailTokenFor: could then evaluate the
> failBlock if the primitive fails.
> Do you think this approach would be worth a try? :-)
>
>
> Definitely. It might eliminate multiple tests. I think you should
> definitely try it. It looks nice to me. Thank you!
>
>
> Best,
> Christoph
> ------------------------------------------------------------------------
> *Von:* Squeak-dev <squeak-dev-bounces at lists.squeakfoundation.org
> <mailto:squeak-dev-bounces at lists.squeakfoundation.org>> im Auftrag
> von Eliot Miranda <eliot.miranda at gmail.com
> <mailto:eliot.miranda at gmail.com>>
> *Gesendet:* Freitag, 1. Januar 2021 22:34:28
> *An:* The general-purpose Squeak developers list
> *Betreff:* Re: [squeak-dev] Two new curious Context/primitive
> questions :-)
> errata:
>
> On Fri, Jan 1, 2021 at 1:22 PM Eliot Miranda
> <eliot.miranda at gmail.com <mailto:eliot.miranda at gmail.com>> wrote:
> Hi Christoph,
>
> happy new year!!
>
> On Fri, Jan 1, 2021 at 11:28 AM Thiede, Christoph
> <Christoph.Thiede at student.hpi.uni-potsdam.de
> <mailto:Christoph.Thiede at student.hpi.uni-potsdam.de>> wrote:
>
> Hi all, hi Eliot,
>
> new year, new simulation fun, and I have collected two new
> questions about the Context implementation which I'd love to
> get answered here:
>
> First, I was confused by the following:
> (BlockClosure >> #numArgs) primitive. "266"
> (Context >> #pc) primitive. "0"
> What makes Context so special that it cannot be compiled with
> quick accessor methods?
>
>
> The gory details are here:
> http://www.mirandabanda.org/cogblog/2009/01/14/under-cover-contexts-and-the-big-frame-up/
> <http://www.mirandabanda.org/cogblog/2009/01/14/under-cover-contexts-and-the-big-frame-up/>
>
> But that's full of implementation detail. Let me try and explain
> at a higher level.
>
> Contexts are wonderful but very expensive. Any VM that actually
> creates contexts for every method or block activation will
> inevitably run slowly. There are two overheads here. One is the
> basic overhead of allocating and reclaiming the contexts that are
> created. The other is the cost of moving the outgoing receiver and
> arguments from the sending context to the new context. A fast VM
> must avoid these overheads, and the key solution is to use a stack
> frame organization as is used in conventional language
> implementations. The key here is that with stack frames the
> receiver and incoming arguments do not have to be moved; they stay
> where they are and a new frame is built underneath them which
> includes them as the incoming receiver and arguments; stack frames
> overlap outgoing and incoming arguments.
>
> But crucially, in addition the VM must still create contexts if
> and when they are needed. If not, this would no longer be a
> Smalltalk-80 implementation. Now one could imagine a VM which
> converted stack frames to contexts whenever a method or block
> activation was accessed using thisContext (or thisContext sender
> etc which access contexts by following the sender field). But that
> would be very slow. Instead, the fastest approach is to try and
> keep stack frames as stack frames for as long as possible. For
> this to work contexts have to be able to function as "proxy
> objects" for stack frames. The VM does indeed create contexts
> when one uses thisContext, or even when one creates a block, but
> internally the context is in a very special state, a state in
> which it points at a stack frame.
>
> Some terminology is useful. Peter Deutsch used "volatile",
> "hybrid" and "stable". I like "single", "married", "widowed" and
> "divorced" (Peter's scheme didn't have "widowed" which is the key
> to why my scheme is faster, and why the closure implementation is
> as it is).
>
> When a stack frame is created it is "single". It has a slot
> within it to refer to a context object if it is needed, but the
> slot is nil.
> When a stack frame needs a context object one is created and the
> two are "married".
> When a method or block returns its stack frame is discarded; the
> stack frame has died and if it was married its context becomes
> "widowed". (*)
> When space is needed for new stack frames all reclaimed stack
> frames are "divorced" from their contexts. The stack frame state
> is written to the context object and the context object no longer
> acts as a proxy; it is a fully fledged normal stack object. (*)
>
> (*) Stack frames exist in a small memory zone organized as stack
> pages, the stack zone. The size of the stack zone is set at
> startup, and its size is either the VM's default or set by a value
> that persists in the image header (see vmParameterAt: 43). Since
> the system may run out of stack pages (a deep call stack, many
> active processes), the VM makes new pages available by purging the
> least recently used pages to the heap in the form of context
> objects. All the frames in the least recently used stack page are
> divorced at the same time. A context must be created for each
> frame that is not yet married.A page is big enough to hold on
> average about 40 method activations; since activations are of
> different sizes and change size as execution proceeds, the number
> of activations per full page varies.
>
> (*) The reason why Peter's original scheme is slower is that in
> HPS at return time the VM has to check for a frame being
> hybrid and copy state to the hybrid context, making it stable.
> This has to be done because a closure's outer temporary variables
> are stored in the lexically enclosing stack frame, and so have to
> be made to persist in the stable context or their values will be
> stale. In my (lisp-derived) closure implementation there are no
> such references to outer temporary variables; values are either
> copied if they do not change, or stored in an Array (the
> "indirection vector"), which can then be copied.
>
> Note that divorce is a process. Once divorced a context is single
> and free to function as a context does. Indeed when the system
> starts up it loads an image which only contains single contexts.
> All contexts are divorced on snapshot. Divorced contexts are
> amnesiacs.
>
>
> So now, finally, we can answer your question. Why is
> (Context>>#pc), a method that simply answers an instance variable,
> not compiled as a quick method with a primitive that answers that
> variable? "married" and "widowed" contexts are proxies for stack
> frames. Only the instance variables they contain which do not
> change can be fetched from the context object itself; these are
> the receiver, arguments, closureOrNil, and method. sender, pc,
> stackp and other stack contents may change as execution proceeds.
> These mutable values must be fetched from the stack frame itself,
> if it still exists.
>
> All accesses to sender, pc, stackp, and stack contents
> (Context>>at:[put:]) must be mediated. A married or widowed
> context has its sender field set to the frame pointer of its stack
> frame, encoded as a SmallInteger. You can never see this from the
> image (except via a special xRay primitive). By following the
> frame pointer encoded in the sender field the VM can detect if the
> frame is still live. If so, the relevant values are synthesized
> or fetched from the stack frame itself. If not, the context is
> widowed and "mourns"; it is changed into a context with a nil
> sender field, a nil pc, and a stackp that points at the last argument.
>
> How is this mediation done? One design could be to require that
> all image level accesses to Context inst vars are through
> primitives. This feels contrived to me. Instead, inst var access
> is via inst var access bytcodes. But if we do this naively we
> would have to check in all inst var access bytecodes whose index
> is the same as sender (0), pc (1) & stackp (2). This would slow
> down inst var access enormously. Instead we observe that inst var
> access bytecodes pushReceiverVariable:, popIntoReceiverVariable:
> have short forms for the most frequently used (indexes 0 to 15 for
> push, indexes 0 to 7 for popInto). By arranging that we only use
> the long forms for accessing inst vars in Context, and
> superclasses and subclasses, we only have to check in the long
> forms and the check is cheap. We check that the index is <= 2,
> because all other inst var accesses with these indices will use
> the short-form bytecodes.
>
> By arranging that Context's inst vars are *not* accessed by quick
> primitives we do not have to duplicate this machinery in quick
> primitive dispatch, which would also slow things down. Hence
> Context>>pc is compiled as a vanilla method and its inst var
> access bytecode is the long form:
>
> 25 <E2 01> pushRcvr: 1
> 27 <5C> returnTop
>
> 'Nuff said
>
> Second, Context >> #isPrimFailToken: attracted my attention
> multiple times when looking at different #timeProfile results
> of simulation sessions. In the expression [100000 factorial]
> it takes up more than 44% of the whole execution time!
> I have no idea why it could be so slow, but this message is
> sent really often, which is at least 2 times per simulation of
> a special message send.
> Would there be an easy change to resolve this bottleneck and
> speed up the simulation by 40% or more?
> Would it be possible to provide a primitive for this method?
> Or do you see any other way to optimize it?
>
>
> It's not slow so much as that it has to be used all the time to
> check whether a send has invoked a primitive which has failed.
> Its function is to support primitive error codes. Before primitive
> error codes (IIRC) we simply checked for nil. But with primitive
> error codes we need a container that both reliably distinguishes a
> primitive failure result from all other objects in the system, and
> holds onto the primitive failure code. It may indeed be able to
> reduce the overhead by only testing when absolutely necessary. It
> may be that reimplementing the scheme is faster.
>
> For example, one could imagine a class whose instances are the
> only objects to answer #isPrimFailToken, which would be faster
> than a one argument message send. But beware, because if one was
> debugging the debugger one mustn't confuse a PrimFailToken created
> in the simulation for a PrimFailToken created by the simulation.
> Remember to distinguish between price and value. Implementing the
> context-to-stack mapping scheme outlined above was very costly,
> but its value is very high. It allows us to have contexts, with
> all their utility, while reducing their costs significantly.
> Similarly, and closely related, being able to simulate execution
> is costly but hugely valuable. If the price is 40% of simulated
> execution then it is a price worth paying.
>
> Best,
> Christoph
>
>
> Happy New Year!
>
> _,,,^..^,,,_
> best, Eliot
>
>
>
> --
> _,,,^..^,,,_
> best, Eliot
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.squeakfoundation.org/pipermail/squeak-dev/attachments/20210106/0cf1b11f/attachment.html>
More information about the Squeak-dev
mailing list
|