[Vm-dev] context change versus primitive failure return value (was: FFI exception failure support on Win64 (Win32 also?))

Tue Aug 28 19:21:54 UTC 2018

Hi Eliot, Thanks for the detailed response.

On Tue, 28 Aug 2018 at 04:21, Eliot Miranda <eliot.miranda at gmail.com> wrote:

>
> On Mon, Aug 27, 2018 at 11:36 AM Ben Coman <btc at openinworld.com> wrote:
>
>> Back when I was having a go at new mutex primitives,
>> when a process "A" failed to lock a mutex, I wanted to return
>> a primitive-failed-code in addition to the usual context-change to
>> different process "B"
>> However what I observed was that because of the context change
>> the primitive-failed-code incorrectly ended up returned on the stack of
>> process "B".
>>
>
> Your description doesn't match how (I understand) the VM works.  The only
> way that Process A can initiate a process switch, mutex lock, et al, is by
> sending a message to some object (a process, mutex or semaphore).  So we're
> talking about Process>>suspend & resume as well as the lock/unlock and
> wait/signal primitives.  Primitive failure *always* delivers a primitive
> error to the method that contains the primitive and, in this case,
> initiated the process switch.  Primitives validate their arguments and then
> succeed, or fail, leaving their arguments undisturbed (there is one
> regrettable exception to this in the BttF/Cog VM which is the segment
> loading primitive that leaves its input word array scrambled if a failure
> occurs, rather than incur the cost of cloning the array).
>

> So the only way that a primitive could fail and the error code end up on
> the wrong process's stack would be if the primitive was mis-designed to not
> validate before occurring.
>

> Essentially it can not fail and cause a side effect.
>

This was my first foray into writing a primitive (still on my backlog to be
completed).
I was aware of validating the arguments and leaving them undisturbed for a
failure,
but wasn't paying attention to primitive failure being completely free from
side effects.
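
As a rough illustration of "validate first, then act", here is a toy Python model (not VM code). The names PrimitiveFailed and toy_replace_from are made up for this sketch; the only point is that every check happens before the first write, so a failure leaves the receiver untouched.

```python
class PrimitiveFailed(Exception):
    """Models a primitive failure that carries an error code."""

    def __init__(self, code):
        super().__init__(code)
        self.code = code


def toy_replace_from(dest, start, stop, source):
    """Toy 'primitive': overwrite dest[start:stop] with source.

    All validation happens before the first write, so a failure
    leaves dest exactly as it was.
    """
    if not (0 <= start <= stop <= len(dest)):
        raise PrimitiveFailed("bad index")
    if len(source) != stop - start:
        raise PrimitiveFailed("bad argument")
    dest[start:stop] = source  # the only side effect, after all checks
    return dest
```

A caller that catches PrimitiveFailed can run its fallback code knowing that dest is still in its original state.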

> Primitives should be side-effect free when they fail and hence if a process
> switch primitive fails, it cannot yet have caused a process switch and
> therefore the error code would have to be delivered to process A's stack.
>

That is kind of the outcome of what I was proposing.  I was trying for a
mutex locking primitive that could fail without causing a side-effect "in
the image".  Maybe it's not valid to distinguish between side-effects
"inside" or "outside" the image, but I thought it might be reasonable for a
flag hidden "in the VM" to be just another event checked by
#checkForEventsMayContextSwitch: .  Effectively the "in image" effect
happens outside the primitive, a bit like reaching nextWakeupUsecs, or like
I imagine a callback might work.
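
The "flag hidden in the VM" idea can be sketched as a toy Python model. This is not VM code: ToyVM, prim_try_lock, pending_switch and check_for_events are hypothetical names, and the model assumes just two processes, "A" and "B", with "B" already holding the mutex. The failing primitive only records a request; the process change happens later, in one place.

```python
class ToyVM:
    """Toy model: a failed lock requests a switch instead of performing it."""

    def __init__(self):
        self.active = "A"
        self.stacks = {"A": [], "B": []}
        self.mutex_owner = "B"      # assumption: B already holds the mutex
        self.pending_switch = None  # hypothetical flag hidden "in the VM"

    def prim_try_lock(self):
        """Return 0 on success or a nonzero failure code. Never switches."""
        if self.mutex_owner is None:
            self.mutex_owner = self.active
            return 0
        self.pending_switch = self.mutex_owner  # request only, no switch yet
        return 1

    def check_for_events(self):
        """The single place where the active process may change."""
        if self.pending_switch is None:
            return False
        self.active = self.pending_switch
        self.pending_switch = None
        return True

    def send_lock(self):
        """Model one send: primitive, failure delivery, then event check."""
        code = self.prim_try_lock()
        if code != 0:
            # Still in process A here, so A receives the failure code.
            self.stacks[self.active].append(code)
        return self.check_for_events()
```

In this model, calling send_lock() on a fresh ToyVM leaves the failure code on A's stack and only afterwards makes B the active process.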

A side thought was that if context-changes occurred in a *single* location
in #checkForEventsMayContextSwitch:,
it might be easier to make an "Idle VM".

>
>
>>
>> I'm stretching my memory so there is a reasonable chance this is
>> misleading...
>> but I believe I observed this happening in
>> CoInterpreter>>internalExecuteNewMethod
>> near this code..
>>
>>     "slowPrimitiveResponse may of course context-switch. ..."
>>      succeeded := self slowPrimitiveResponse.
>>      ...
>>      succeeded ifTrue: [....
>>
>
> But internalExecuteNewMethod doesn't contain the switch code,
> internalActivateNewMethod does, and it does the process switch *after*
> delivering the primitive failure code, see reapAndResetErrorCodeTo:header:
> in the following:
>
> internalActivateNewMethod
>     ...
>     (self methodHeaderHasPrimitive: methodHeader) ifTrue:
>         ["Skip the CallPrimitive bytecode, if it's there, and store the error code
>           if the method starts with a long store temp.  Strictly no need to skip
>           the store because it's effectively a noop."
>          localIP := localIP + (self sizeOfCallPrimitiveBytecode: methodHeader).
>          primFailCode ~= 0 ifTrue:
>             [self reapAndResetErrorCodeTo: localSP header: methodHeader]].
>
>     self assert: (self frameNumArgs: localFP) == argumentCount.
>     self assert: (self frameIsBlockActivation: localFP) not.
>     self assert: (self frameHasContext: localFP) not.
>
>     "Now check for stack overflow or an event (interrupt, must scavenge, etc)."
>     localSP < stackLimit ifTrue:
>         [self externalizeIPandSP.
>          switched := self handleStackOverflowOrEventAllowContextSwitch:
>                          (self canContextSwitchIfActivating: newMethod header: methodHeader).
>          self returnToExecutive: true postContextSwitch: switched.
>          self internalizeIPandSP]
>
>
>> Though I can't exactly put my finger on explaining why, my intuition is
>> that
>> changing threads "half way" through a bytecode is a bad thing.
>>
>
> Indeed it is, and the VM does not do this.  It is possible that the
> execution simulation machinery in Context, InstructionStream et al. could
> have been written carelessly to allow this to occur, but it cannot and does
> not occur in the VM proper.
>

I made a chart to understand this better. One thing first: I'm not sure
I've correctly linked execution of the primitives into
slowPrimitiveResponse.  I'm not at all clear about
how internalExecuteNewMethod selects between internalQuickPrimitiveResponse
and slowPrimitiveResponse, or what the difference between them is.

[image: ContextChange-Existing.png]

So I understand that checkForEventsMayContextSwitch: called at the end
of internalActivateNewMethod
occurs after bytecode execution has completed, so that context switches
made there are done "between" bytecodes.
However my perspective is that internalExecuteNewMethod is only half way
through a bytecode execution when
the primitives effect context changes.  So internalActivateNewMethod ends
up working on a different Process than internalExecuteNewMethod started
with.  The bottom three red lines in the chart are what I considered to be
changing threads "half way" through a bytecode.

Ahh, I'm slowly coming to grips with this.  It was extremely confusing at
the time why my failure code from the primitive was turning up in a
different Process, though I then learnt a lot digging to discover why.  In
summary, if the primitive succeeds it simply returns from
internalExecuteNewMethod, and internalActivateNewMethod never sees the new
Process B.  My problem, having violated the primitive-failure side-effect
rule, was that internalActivateNewMethod, while trying to run Process A's
in-image primitive-failure code, instead ran Process B's in-image
primitive-failure code.
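
To make the symptom concrete, here is a toy Python model (not VM code) of a failing primitive followed by the activation step. The function run_send and its flag switch_inside_primitive are invented for this sketch; the activation step is reduced to "push the failure code onto whichever process is active at that moment".

```python
def run_send(switch_inside_primitive):
    """Return the per-process stacks after process A fails to take a mutex."""
    state = {"active": "A", "stacks": {"A": [], "B": []}}

    def primitive():
        if switch_inside_primitive:
            state["active"] = "B"  # side effect made before reporting failure
        return 1                   # nonzero: the primitive failed

    fail_code = primitive()
    # Stand-in for the activation step: the code is delivered to
    # whichever process is active *now*, not the one that made the send.
    if fail_code != 0:
        state["stacks"][state["active"]].append(fail_code)
    return state["stacks"]
```

With switch_inside_primitive set to True the code lands on B's stack, which is what I observed; with False it lands on A's stack, as the side-effect rule requires.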

>
>> Interestingly the comment in #transferTo:from: says...
>>      "Record a process to be awoken on the next interpreter cycle."
>> which sounds like what I'd propose, but actually it doesn't wait for
>> the next interpreter cycle and instead immediately changes context.
>>
>
> No it doesn't.  It effects the process change, but control continues with
> the caller, allowing the caller to do other things before the process
> resumes.
>

In my case I believe "control continues with the caller" was not true since
internalActivateNewMethod
was trying to run the in-Image-primitive-failure code after the process
changed.
But that was because I violated the side-effect rule.

> For example, if in checkForEventsMayContextSwitch: more than one semaphore
> is signaled (it could initiate any combination of  signals of the low space
> semaphore, the input semaphore, external semaphores (associated with file
> descriptors & sockets), and the delay semaphore) then only the highest
> priority process would be runnable after the sequence, and several
> transferTo:[from:]'s could have been initiated from these signals,
> depending on process priority.  But  checkForEventsMayContextSwitch: will
> not finish mid-sequence.  It will always complete all of its signals before
> returning to code that can then resume the newly activated process.
>

Just to summarise and check I understood this correctly: no bytecode is
executed during checkForEventsMayContextSwitch:.
That is, its multiple transferTo: calls don't re-enter the interpreter?  It
is just that which Process is set to run changes,
until at the end checkForEventsMayContextSwitch: returns to the
interpreter to pick up the next bytecode of the active process.
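
My understanding can be written down as a toy Python model (not VM code). The function check_for_events, the (name, priority) tuples and the process names are all assumptions made for this sketch, and the preempted process going back onto a run queue is deliberately left out. Each signal may change which process is marked active, but nothing is executed until the function returns.

```python
def check_for_events(active, signalled_waiters):
    """Process a batch of semaphore signals without running any bytecode.

    active and each waiter are (name, priority) tuples.
    Returns (new_active, transfers), where transfers lists each
    (from_name, to_name) hand-over made along the way.
    """
    transfers = []
    for waiter in signalled_waiters:
        if waiter[1] > active[1]:
            transfers.append((active[0], waiter[0]))
            active = waiter  # bookkeeping only; nothing runs yet
    return active, transfers


active, transfers = check_for_events(
    ("ui", 40), [("lowSpace", 50), ("input", 60), ("timer", 80)]
)
print(active)     # ('timer', 80)
print(transfers)  # three hand-overs, ending at 'timer'
```

Only after the whole batch has been handled would the interpreter resume, with the highest priority process as the active one.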

cheers -ben.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ContextChange-Existing.png
Type: image/png
Size: 268468 bytes
Desc: not available
URL: <http://lists.squeakfoundation.org/pipermail/vm-dev/attachments/20180829/7d8ea4dd/attachment-0001.png>