[Vm-dev] context change versus primitive failure return value (was: FFI exception failure support on Win64 (Win32 also?))

Eliot Miranda eliot.miranda at gmail.com
Wed Aug 29 03:08:24 UTC 2018

Hi Ben, (and hi all prospective VM hackers)
On Tue, Aug 28, 2018 at 12:22 PM Ben Coman <btc at openinworld.com> wrote:

> Hi Eliot, Thanks for the detailed response.
> On Tue, 28 Aug 2018 at 04:21, Eliot Miranda <eliot.miranda at gmail.com>
> wrote:
>> On Mon, Aug 27, 2018 at 11:36 AM Ben Coman <btc at openinworld.com> wrote:
>>> Back when I was having a go at new mutex primitives,
>>> when a process "A" failed to lock a mutex, I wanted to return
>>> a primitive-failed-code in addition to the usual context-change to
>>> different process "B"
>>> However what I observed was that because of the context change
>>> the primitive-failed-code incorrectly ended up returned on the stack of
>>> process "B".
>> Your description doesn't match how (I understand) the VM works.  The only
>> way that Process A can initiate a process switch, mutex lock, et al, is by
>> sending a message to some object (a process, mutex or semaphore).  So we're
>> talking about Process>>suspend & resume as well as the lock/unlock and
>> wait/signal primitives.  Primitive failure *always* delivers a primitive
>> error to the method that contains the primitive and, in this case,
>> initiated the process switch.  Primitives validate their arguments and then
>> succeed, or fail, leaving their arguments undisturbed (there is one
>> regrettable exception to this in the BttF/Cog VM which is the segment
>> loading primitive that leaves its input word array scrambled if a failure
>> occurs, rather than incur the cost of cloning the array).
>> So the only way that a primitive could fail and the error code end up on
>> the wrong process's stack would be if the primitive were mis-designed to
>> not validate before acting.
> Essentially it cannot fail and cause a side effect.
> This was my first foray into writing a primitive (still on my backlog to
> be completed).
> I was aware of validating the arguments and leaving them undisturbed for a
> failure,
> but wasn't paying attention to primitive failure being completely free
> from side effects.
>> Primitives should be side-effect free when they fail and hence if a
>> process switch primitive fails, it cannot yet have caused a process switch
>> and therefore the error code would have to be delivered to process A's
>> stack.
> That is kind of the outcome of what I was proposing.  I was trying for a
> mutex locking primitive that could fail without causing a side-effect "in
> the image".   Maybe it's not valid to distinguish between side-effects
> "inside" or "outside" the image, but I thought it might be reasonable for a
> flag hidden "in the VM" to be just another event checked by
> #checkForEventsMayContextSwitch: .  Effectively the "in image" effect
> happens outside the primitive, a bit like reaching nextWakeupUsecs, or like
> I imagine a callback might work.
> A side thought was that if context-changes occurred in a *single* location
> in #checkForEventsMayContextSwitch,
> it might be easier to make an "Idle VM"
>>> I'm stretching my memory so there is a reasonable chance this is
>>> misleading...
>>> but I believe I observed this happening in
>>> CoInterpreter>>internalExecuteNewMethod
>>> near this code..
>>>     "slowPrimitiveResponse may of course context-switch. ..."
>>>      succeeded := self slowPrimitiveResponse.
>>>      ...
>>>      succeeded ifTrue: [....
>> But internalExecuteNewMethod doesn't contain the switch code,
>> internalActivateNewMethod does, and it does the process switch *after*
>> delivering the primitive failure code, see reapAndResetErrorCodeTo:header:
>> in the following:
>> internalActivateNewMethod
>> ...
>> (self methodHeaderHasPrimitive: methodHeader) ifTrue:
>> ["Skip the CallPrimitive bytecode, if it's there, and store the error
>> code if the method starts
>>  with a long store temp.  Strictly no need to skip the store because it's
>> effectively a noop."
>> localIP := localIP + (self sizeOfCallPrimitiveBytecode: methodHeader).
>> primFailCode ~= 0 ifTrue:
>> [self reapAndResetErrorCodeTo: localSP header: methodHeader]].
>> self assert: (self frameNumArgs: localFP) == argumentCount.
>> self assert: (self frameIsBlockActivation: localFP) not.
>> self assert: (self frameHasContext: localFP) not.
>> "Now check for stack overflow or an event (interrupt, must scavenge,
>> etc)."
>> localSP < stackLimit ifTrue:
>> [self externalizeIPandSP.
>> switched := self handleStackOverflowOrEventAllowContextSwitch:
>> (self canContextSwitchIfActivating: newMethod header: methodHeader).
>> self returnToExecutive: true postContextSwitch: switched.
>> self internalizeIPandSP]
>>> Though I can't exactly put my finger on explaining why, my intuition is
>>> that
>>> changing threads "half way" through a bytecode is a bad thing.
>> Indeed it is, and the VM does not do this.  It is possible that the
>> execution simulation machinery in Context, InstructionStream et al could
>> have been written carelessly to allow this to occur, but it cannot and does
>> not occur in the VM proper.
>  I made a chart to understand this better. One thing first, I'm not sure
> I've correctly linked execution of the primitives into
> slowPrimitiveResponse.  I'm not at all clear about
> how internalExecuteNewMethod selects between internalQuickPrimitiveResponse
> and slowPrimitiveResponse, and what is the difference between them?

Well, the first thing to say is that this is a magnificent diagram; thank
you.  The problem is that the VM, and hence the diagram, is much more
complex than the blue book specification, essentially because the VM is a
highly optimized interpreter, whereas the specification is bare bones.  So
I would ask you, and anyone else who wants to understand a Smalltalk-80 VM
(a VM that provides Context objects for method activations, rather than a
Smalltalk that uses a more conventional stack model) to read the Blue Book
Specification: http://www.mirandabanda.org/bluebook/bluebook_chapter28.html
carefully
and fully. This is the last section of Smalltalk-80: The Language and its
Implementation, by Adele Goldberg and David Robson.  The specification is
well-written and clear and once digested serves as essential reference for
understanding a more complex production VM.

Now to your question: "I'm not at all clear about
how internalExecuteNewMethod selects
between internalQuickPrimitiveResponse and slowPrimitiveResponse, and what
is the difference between them?".

First, in a system that notionally allocates a context object to hold every
activation, leaf routines are extremely expensive if all they do is answer
a constant or an instance variable.  Dan Ingalls's optimization is to avoid
activations by providing a set of quick primitives that answer an instance
variable whose slot index is from 0 to 255, or self, nil, true, false, -1,
0, 1 & 2.  The self, nil, true, false, -1, 0, 1 & 2 constants are derived
from a static frequency analysis of literals and variable references in
Smalltalk code and are echoed in the original bytecode set, byte
codes 112-119 bering 01110iii Push (receiver, true, false, nil, -1, 0, 1,
2) [iii].  internalQuickPrimitiveResponse handles precisely these
primitives, and these primitives only.  These primitives have the property
that they can never fail, so they are invoked along a path that does not
reset the flag used to identify primitive failure, nor test it.
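The quick-primitive path can be sketched in C roughly as follows. This is an illustration only, not the VM source: the tagging scheme, the constant values, and the index layout (constants first, instance-variable slots above them) are all assumptions made for the sketch. The essential property it demonstrates is that a quick primitive answers its result directly and by construction can never fail, so no failure flag is cleared or tested on this path.

```c
#include <assert.h>

typedef long oop;   /* hypothetical tagged object pointer */

/* Illustrative indices for the failure-free quick primitives. */
enum { QUICK_SELF, QUICK_TRUE, QUICK_FALSE, QUICK_NIL,
       QUICK_MINUS_ONE, QUICK_ZERO, QUICK_ONE, QUICK_TWO };

/* Hypothetical tagged constants for the sketch. */
#define NIL_OOP   0
#define FALSE_OOP 2
#define TRUE_OOP  4
#define INT(n)    (((oop)(n) << 1) | 1)   /* SmallInteger tagging */

/* Answer the result of a quick primitive.  These can never fail, so the
 * caller neither resets nor tests any primitive-failure flag. */
oop quickPrimitiveResponse(int quickIndex, oop receiver, oop *instVars)
{
    switch (quickIndex) {
    case QUICK_SELF:      return receiver;
    case QUICK_TRUE:      return TRUE_OOP;
    case QUICK_FALSE:     return FALSE_OOP;
    case QUICK_NIL:       return NIL_OOP;
    case QUICK_MINUS_ONE: return INT(-1);
    case QUICK_ZERO:      return INT(0);
    case QUICK_ONE:       return INT(1);
    case QUICK_TWO:       return INT(2);
    default: /* indices >= 256 answer the inst var at slot quickIndex - 256 */
        return instVars[quickIndex - 256];
    }
}
```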

All other primitives are handled by slowPrimitiveResponse.  This requires
clearing primFailCode, calling the function implementing the primitive, and
then testing primFailCode before either continuing or building an
activation for the failing primitive. There is another important task of
internalExecuteNewMethod, which is to store the "internal" frame and stack
pointers (localFP & localSP) into the global interpreter frame and stack
pointers (framePointer and stackPointer) before invoking and then restoring
localFP & localSP from framePointer and stackPointer after having invoked
slowPrimitiveResponse.  In the Back-to-the-Future (BttF) VMs (the original
Squeak VM and the Stack and Cog VMs) primitives access their receiver and
arguments through framePointer and stackPointer.  But these, being global,
are slow and without compiler-specific hacks cannot be placed in
registers.  The Slang translator and the interpreter code collaborate to
inline much of the interpreter, including every method beginning with
internal, into the interpret routine in which localFP, localSP and localIP
are declared, hence allowing a C compiler to assign these variables to
registers.  So another reason slowPrimitiveResponse is slow is that it
writes and reads localFP, localSP & localIP to/from framePointer,
stackPointer & instructionPointer.  But because it does so, primitives that
change the execution context (process switch primitives that switch to
another "stack" of contexts, or eval primitives such as perform:with:* and
value:value* which build another frame) can be written and change the
execution context at a send point (primitive invocation is always at a send).
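The externalize/invoke/internalize dance around a slow primitive can be compressed into a sketch like the one below. The struct layout and names are assumptions for illustration, not the Slang source. The point is that the primitive runs against the global framePointer/stackPointer, so a process-switch or eval primitive may legitimately leave them pointing at a different stack, and the interpreter simply internalizes whatever they now hold and resumes there.

```c
#include <assert.h>

typedef struct {
    long *framePointer;    /* global interpreter state */
    long *stackPointer;
    int   primFailCode;    /* 0 = success, nonzero = failure */
} Interp;

typedef void (*PrimitiveFn)(Interp *);

/* Returns nonzero if the primitive succeeded.  localFP/localSP stand in for
 * the register-allocated locals of the inlined interpret routine. */
int slowPrimitiveResponse(Interp *vm, PrimitiveFn prim,
                          long **localFP, long **localSP)
{
    vm->framePointer = *localFP;        /* externalize the local state */
    vm->stackPointer = *localSP;
    vm->primFailCode = 0;               /* clear before invoking */
    prim(vm);                           /* may switch to another stack */
    *localFP = vm->framePointer;        /* internalize: pick up whatever */
    *localSP = vm->stackPointer;        /*  stack the VM now runs on */
    return vm->primFailCode == 0;
}
```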

Note that all the internal methods have non-internal duals that do the same
thing but use framePointer, stackPointer & instructionPointer, not localFP,
localSP & localIP.  These are used to implement the eval primitives
perform:* and withArgs:executeMethod: since these may also invoke
primitives.  And hence you might be able to get your head around my
favorite Smalltalk construction:
| array |
array := { #perform:withArguments:. nil }.
array at: 2 put: array.
array perform: array first withArguments: array

Given that primitives are always invoked at a send, we can see how elegant
Dan's invention of primitive failure is.  Primitives are essentially
transactional and atomic.  They validate their arguments and if validation
succeeds they carry out their action and answer a result as if from some
normal send.  But if validation fails, or if they are unimplemented, the
method containing the primitive reference (<primitive: 61>, <primitive:
'primitiveSocketAccept' module: 'SocketPlugin'>) is simply activated as if
the primitive didn't exist, or as if the method was a normal method.  Hence
primitives are optional, in that if the method body does what the primitive
does then no one can tell if the primitive is doing the work or Smalltalk
code, except by measuring performance.  Hence for example large integer
arithmetic and string display primitives are optional and serve to
accelerate the system.
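That transactional contract can be sketched as follows (names and the tagging scheme are hypothetical, chosen for the sketch): the primitive validates its arguments first, then either completes atomically or fails having touched nothing, and the send site falls back to the method body exactly as if the primitive were absent.

```c
#include <assert.h>

typedef long oop;
#define isSmallInt(x) ((x) & 1)
#define intVal(x)     ((x) >> 1)
#define toOop(n)      (((oop)(n) << 1) | 1)

static int primFailCode;   /* 0 = success */

/* Validate first; act only if validation succeeds.  On failure nothing has
 * been disturbed, so the caller can safely activate the method body. */
oop primitiveAdd(oop rcvr, oop arg)
{
    if (!(isSmallInt(rcvr) && isSmallInt(arg))) {
        primFailCode = 1;              /* fail with no side effect */
        return 0;
    }
    primFailCode = 0;
    return toOop(intVal(rcvr) + intVal(arg));
}

/* Stand-in for running the method body's Smalltalk code on failure. */
oop fallbackBody(oop rcvr, oop arg)
{
    (void)rcvr; (void)arg;
    return toOop(-1);                  /* sketch: a distinguishable result */
}

/* The send site: try the primitive, fall back to the method body. */
oop sendPlus(oop rcvr, oop arg)
{
    oop r = primitiveAdd(rcvr, arg);
    return primFailCode == 0 ? r : fallbackBody(rcvr, arg);
}
```

Because no caller can observe whether `primitiveAdd` or `fallbackBody` produced the answer (performance aside), the primitive is optional, which is exactly the property the paragraph above describes.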

There is one other route by which primitives are executed, also at the
point of send, and this is via the special selector bytecodes.  Another of
Dan's excellent optimizations, the special selectors both save space and
make the interpreter faster.  They save space by encoding the 32 most
frequently occurring sends as one-byte bytecodes, hence saving the 2
(16-bit Smalltalk-80), 4 (32-bit Squeak) or 8 (64-bit Squeak) bytes to
store the selector in a method's literal frame.  But some of them also
speed up the interpreter by statically predicting the receiver type.  i.e.
#+, #-, #/, #*, #<, #>, #<= et al are most often sent to integers, and hence
these bytecodes, as specified in the Blue Book, check for the top two
elements on the stack being SmallIntegers, and if so replace the two top
elements by the result of the operation, avoiding a send and primitive
dispatch.  Note that in the BttF interpreter this checking is extended both
to apply to Float, and to check for a following branch after the
conditionals #<, #<= et al, so that the result doesn't have to be reified
into a boolean that is tested immediately; effectively the following branch
gets folded into the relational special selector bytecode.  The JIT uses
this same technique, but is able to do a much better job because, for
example, it can know if a relational special selector send is followed by a
jump bytecode or not at JIT time.
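The statically-predicted fast path with branch folding can be sketched like this. The opcode value, the Boolean representation, and all names are invented for the sketch; only the shape of the optimization is taken from the description above: if both tops of stack are SmallIntegers and the next bytecode is a conditional jump, the comparison steers the jump directly and no Boolean object is ever reified.

```c
#include <assert.h>

typedef long oop;
#define isSmallInt(x) ((x) & 1)
#define intVal(x)     ((x) >> 1)
#define toOop(n)      (((oop)(n) << 1) | 1)

enum { BC_JUMP_IF_FALSE = 0x99 };   /* illustrative opcode value */

/* Try the fast path for the #< special-selector bytecode at code[*ip],
 * folding an immediately following jumpIfFalse.  Returns 0 when a real
 * send of #< (and possible primitive dispatch) is required instead. */
int specialLessThan(oop *stack, int *sp, const unsigned char *code, int *ip)
{
    oop arg  = stack[*sp];
    oop rcvr = stack[*sp - 1];
    if (!(isSmallInt(rcvr) && isSmallInt(arg)))
        return 0;                        /* fall back to a full send */
    *sp -= 2;                            /* pop receiver and argument */
    int lessThan = intVal(rcvr) < intVal(arg);
    if (code[*ip + 1] == BC_JUMP_IF_FALSE) {
        int offset = code[*ip + 2];      /* fold: steer the branch directly */
        *ip += lessThan ? 3 : 3 + offset;
    } else {
        stack[++*sp] = lessThan;         /* sketch Boolean: 1/0, not an oop */
        *ip += 1;
    }
    return 1;
}
```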

> [image: ContextChange-Existing.png]
> So I understand that checkForEventsMayContextSwitch: called at the end
> of internalActivateNewMethod
> occurs after bytecode execution has completed, so that context switches
> made there are done "between" bytecodes.
> However my perspective is that internalExecuteNewMethod is only half way
> through a bytecode execution when
> the primitives effect context changes.  So internalActivateNewMethod ends
> up working on a different Process than internalExecuteNewMethod started
> with.  The bottom three red lines in the chart are what I considered to be
> changing threads "half way" through a bytecode.

More accurately, checkForEventsMayContextSwitch: is called on activating a
method, after the send has occurred, but before the first bytecode has
executed.  Hence primitive sends are not suspension points unless the
primitive is a process switch or eval primitive.  Instead, just as a
non-primitive (or failing primitive) method is being activated
checkForEventsMayContextSwitch: is invoked.

> Ahh, I'm slowly coming to grips with this.  It was extremely confusing at
> the time why my failure code from the primitive was turning up in a
> different Process, though I then learnt a lot digging to discover why.   In
> summary, if the primitive succeeds it simply returns
> from internalExecuteNewMethod and internalActivateNewMethod never sees the
> new Process B.  My problem violating the primitive-failure side-effect rule
> was that internalActivateNewMethod, while trying to run Process A's
> in-image primitive-failure code, instead ran Process B's
> in-image primitive-failure code.

Right.  So that design requirement is key to the VM architecture.  That was
something I had to explain to Alistair when he did the first cut of the
FileAttributesPlugin, which used to return failure codes on error instead
of failing, leaving error recovery to clients.  And so it underscores the
importance of reading the blue book specification.  [We really should do an
up-to-date version that describes a simplified 32-bit implementation].

The design requirement that primitives validate their arguments and
complete atomically or fail without side-effects is also key to Spur.  Spur
speeds up become by using transparent forwarders (any object can become a
forwarder) but reduces the cost of transparent forwarders by arranging the
forwarders only have to be checked for during a send or during primitive
argument validation.  The receiver has to be accessed during a send anyway,
and so Spur is able to move the check for a forwarder to the lookup side of
a send, only checking for forwarders if the method cache probe failed,
which will always be the case for forwarders.  Likewise, primitive argument
validation always fails for forwarders, and hence in Spur
slowPrimitiveResponse includes, on failure, a check of a primitive's
"accessor depth", the depth to which the primitive's validation traverses
its argument graph.  If a
primitive has a non-negative accessor depth then on failure the arguments
are traversed to that depth, and any forwarders encountered are followed,
fixing up that part of the object graph, and the primitive retried.
Without Dan's primitive design, Spur could not work well.
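The accessor-depth retry can be sketched as below. The object layout, the fixed slot fan-out, and the function names are all invented for the sketch; Spur's real representation is different. What the sketch shows is the mechanism: on primitive failure, traverse the arguments to the declared depth, replace any forwarders found with the objects they forward to, and if anything was repaired, retry the primitive.

```c
#include <assert.h>

typedef struct Obj {
    struct Obj *forwardTo;     /* non-NULL if this object became another */
    struct Obj *slots[2];      /* fixed fan-out, for the sketch only */
    long        value;
} Obj;

/* Follow a forwarding chain to the live object. */
static Obj *follow(Obj *o)
{
    while (o && o->forwardTo)
        o = o->forwardTo;
    return o;
}

/* Replace forwarders in the argument graph down to 'depth'; return nonzero
 * if any slot was updated, i.e. if retrying the primitive is worthwhile. */
int fixForwardersToDepth(Obj *o, int depth)
{
    int fixed = 0;
    if (!o || depth < 0)
        return 0;
    for (int i = 0; i < 2; i++) {
        Obj *s = o->slots[i];
        if (s && s->forwardTo) {
            o->slots[i] = follow(s);   /* fix up this part of the graph */
            fixed = 1;
        }
        fixed |= fixForwardersToDepth(o->slots[i], depth - 1);
    }
    return fixed;
}
```

A primitive with a negative accessor depth would skip this traversal entirely, since its validation never reads into its arguments.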

>>> Interestingly the comment in #transferTo:from: says...
>>>      "Record a process to be awoken on the next interpreter cycle."
>>> which sounds like what I'd propose, but actually it doesn't wait for
>>> the next interpreter cycle and instead immediately changes context.
>> No it doesn't.  It effects the process change, but control continues with
>> the caller, allowing the caller to do other things before the process
>> resumes.
> In my case I believe "control continues with the caller" was not true
> since internalActivateNewMethod
> was trying to run the in-Image-primitive-failure code after the process
> changed.
> But that was because I violated the side-effect rule.
> For example, if in checkForEventsMayContextSwitch: more than one semaphore
>> is signaled (it could initiate any combination of  signals of the low space
>> semaphore, the input semaphore, external semaphores (associated with file
>> descriptors & sockets), and the delay semaphore) then only the highest
>> priority process would be runnable after the sequence, and several
>> transferTo:[from:]'s could have been initiated from these signals,
>> depending on process priority.  But  checkForEventsMayContextSwitch: will
>> not finish mid-sequence.  It will always complete all of its signals before
>> returning to code that can then resume the newly activated process.
> Just to summarise to check I understood this correctly, no bytecode is
> executed during checkForEventsMayContextSwitch:.
> That is, its multiple transferTo: calls don't re-enter the interpreter?
> Just which Process is set to run changes,
> until at the end of checkForEventsMayContextSwitch: it returns to the
> interpreter to pick up the next bytecode of the active process.

That's right.  transferTo: merely sets framePointer, stackPointer &
instructionPointer to those of the highest priority runnable process.
Execution resumes at that point once checkForEventsMayContextSwitch returns
to its caller.  For details to do with mixing JITted code and interpreted
code checkForEventsMayContextSwitch: answers a variable indicating if a
switch actually took place, and returnToExecutive:postContextSwitch: may or
may not have to longjmp back into the interpreter or jump into machine
code.  But that's just an optimization.  Execution could always resume in
the interpreter; the VM would simply be a little slower, and arguably a lot
less complicated.
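What "transferTo: merely sets the pointers" means can be sketched as below (struct and field names are illustrative, not the Slang source). No bytecode runs inside the switch: the running process's state is saved, the new process's state is adopted, and the interpreter resumes fetching from there once checkForEventsMayContextSwitch: returns.

```c
#include <assert.h>

typedef struct {
    long *framePointer, *stackPointer;
    const unsigned char *instructionPointer;
} Process;

typedef struct {
    Process *active;
    long *framePointer, *stackPointer;
    const unsigned char *instructionPointer;
} Interp;

/* Save the running process's execution state and adopt the new one's.
 * Returns 1 so the caller knows a switch took place (cf. the variable
 * checkForEventsMayContextSwitch: answers to its caller). */
int transferTo(Interp *vm, Process *newProc)
{
    Process *old = vm->active;
    old->framePointer       = vm->framePointer;      /* save old state */
    old->stackPointer       = vm->stackPointer;
    old->instructionPointer = vm->instructionPointer;
    vm->framePointer        = newProc->framePointer; /* adopt new state */
    vm->stackPointer        = newProc->stackPointer;
    vm->instructionPointer  = newProc->instructionPointer;
    vm->active = newProc;
    return 1;
}
```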

So the moral is, both the BttF and Cog VMs are complex, because they are
optimized, Cog adding an entirely new level of complexity over the BttF
interpreter VM.  If you want to see the wood for the trees first read the
Blue Book spec http://www.mirandabanda.org/bluebook/bluebook_chapter28.html.

> cheers -ben.
best, Eliot