[Vm-dev] context change versus primitive failure return value (was: FFI exception failure support on Win64 (Win32 also?))

Eliot Miranda eliot.miranda at gmail.com
Mon Aug 27 20:21:05 UTC 2018


Hi Ben,

On Mon, Aug 27, 2018 at 11:36 AM Ben Coman <btc at openinworld.com> wrote:

>
> I've changed the subject since I'm not sure if this is related,
> but the description dredged up a memory of a concern I had a while ago
> regarding the interaction of context-switching and
> primitive-failure-return-values.
>
> On Sun, 26 Aug 2018 at 05:23, Eliot Miranda <eliot.miranda at gmail.com>
> wrote:
>
>>
>> Hi Windows Experts,
>>
>>     I'm trying to have the FFI exception failure support work across the
>> board.  What I see now is that the 64-bit StackVM works, while the 64-bit
>> CogVM does not, even though they have exactly the same exception handling
>> machinery in place (the new squeakExceptionHandler machinery; thank you
>> whoever wrote it, it looks very nice), and the same essential architecture
>> for jumping back into the interpreter.
>>
>> I expect the issue is that the machinery for maintaining a chain through
>> the stack and/or the stack search used in exception delivery is broken by
>> my careless management of the C stack in the Cog VM, whereas in the StackVM
>> the C stack remains undisturbed.
>>
>
> Back when I was having a go at new mutex primitives,
> when a process "A" failed to lock a mutex, I wanted to return
> a primitive-failed-code in addition to the usual context-change to
> different process "B"
> However what I observed was that because of the context change
> the primitive-failed-code incorrectly ended up returned on the stack of
> process "B".
>

Your description doesn't match how (I understand) the VM works.  The only
way that Process A can initiate a process switch, mutex lock, et al, is by
sending a message to some object (a process, mutex or semaphore).  So we're
talking about Process>>suspend & resume as well as the lock/unlock and
wait/signal primitives.  Primitive failure *always* delivers a primitive
error to the method that contains the primitive and, in this case,
initiated the process switch.  Primitives validate their arguments and then
succeed, or fail, leaving their arguments undisturbed (there is one
regrettable exception to this in the BttF/Cog VM which is the segment
loading primitive that leaves its input word array scrambled if a failure
occurs, rather than incur the cost of cloning the array).

So the only way that a primitive could fail and the error code end up on
the wrong process's stack would be if the primitive was mis-designed so that
it did not validate before acting.  Essentially it cannot both fail and cause
a side effect.  Primitives should be side-effect free when they fail and hence if
a process switch primitive fails, it cannot yet have caused a process
switch and therefore the error code would have to be delivered to process
A's stack.
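
To make the argument concrete, here is a toy model in Python.  It is not VM
code: Process, Semaphore, Scheduler and primitive_wait are hypothetical names
invented for the sketch, and it assumes a single active process.  It only
illustrates that a primitive which validates first and acts second can
deliver a failure to no process other than the one that invoked it.

```python
class Process:
    def __init__(self, name):
        self.name = name
        self.stack = []          # stands in for the process's Smalltalk stack


class Semaphore:
    def __init__(self):
        self.excess_signals = 0
        self.waiting = []


class Scheduler:
    def __init__(self, active, runnable):
        self.active = active
        self.runnable = list(runnable)

    def primitive_wait(self, receiver):
        """Hypothetical wait primitive: validate, then act atomically."""
        caller = self.active
        # Phase 1: validation.  Nothing has been modified yet, so a failure
        # here is necessarily reported to the caller.
        if not isinstance(receiver, Semaphore):
            caller.stack.append(("primitive failed", "bad receiver"))
            return False
        # Phase 2: side effects, performed only once failure is impossible.
        if receiver.excess_signals > 0:
            receiver.excess_signals -= 1
        else:
            receiver.waiting.append(caller)
            self.active = self.runnable.pop(0)   # the process switch
        return True


a, b = Process("A"), Process("B")
sched = Scheduler(active=a, runnable=[b])

failed = not sched.primitive_wait("not a semaphore")
active_after_failure = sched.active.name

succeeded = sched.primitive_wait(Semaphore())
active_after_success = sched.active.name
```

In this model the failure record lands on A's stack while A is still the
active process; B's stack is only ever touched by B.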


>
> I'm stretching my memory so there is a reasonable chance this is
> misleading...
> but I believe I observed this happening in
> CoInterpreter>>internalExecuteNewMethod
> near this code..
>
>     "slowPrimitiveResponse may of course context-switch. ..."
>      succeeded := self slowPrimitiveResponse.
>      ...
>      succeeded ifTrue: [....
>

But internalExecuteNewMethod doesn't contain the switch code,
internalActivateNewMethod does, and it does the process switch *after*
delivering the primitive failure code, see reapAndResetErrorCodeTo:header:
in the following:

internalActivateNewMethod
    ...
    (self methodHeaderHasPrimitive: methodHeader) ifTrue:
        ["Skip the CallPrimitive bytecode, if it's there, and store the error code
          if the method starts with a long store temp.  Strictly no need to skip
          the store because it's effectively a noop."
         localIP := localIP + (self sizeOfCallPrimitiveBytecode: methodHeader).
         primFailCode ~= 0 ifTrue:
            [self reapAndResetErrorCodeTo: localSP header: methodHeader]].

    self assert: (self frameNumArgs: localFP) == argumentCount.
    self assert: (self frameIsBlockActivation: localFP) not.
    self assert: (self frameHasContext: localFP) not.

    "Now check for stack overflow or an event (interrupt, must scavenge, etc)."
    localSP < stackLimit ifTrue:
        [self externalizeIPandSP.
         switched := self handleStackOverflowOrEventAllowContextSwitch:
                        (self canContextSwitchIfActivating: newMethod header: methodHeader).
         self returnToExecutive: true postContextSwitch: switched.
         self internalizeIPandSP]
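
The ordering that matters in the method above can be modelled in a few lines
of Python.  This is a simplified sketch, not a translation of the VM source:
activate_method, the dict-based "processes" and the switch_pending flag are
all hypothetical.  It only shows that the error code is stored in the
activating process's frame before any switch can be taken.

```python
def activate_method(process, prim_fail_code, switch_pending, other):
    """Toy ordering of the two steps in the method activation above."""
    log = []
    # Step 1: the failure code is stored into the new frame of the process
    # that ran the primitive (cf. reapAndResetErrorCodeTo:header:).
    if prim_fail_code != 0:
        process["frame_temps"].append(prim_fail_code)
        log.append(("store error code", process["name"]))
    # Step 2: only afterwards is the stack-overflow/event check made, which
    # is the point where a context switch may happen.
    active = process
    if switch_pending:
        active = other
        log.append(("switch to", other["name"]))
    return active, log


proc_a = {"name": "A", "frame_temps": []}
proc_b = {"name": "B", "frame_temps": []}
active, log = activate_method(proc_a, prim_fail_code=3,
                              switch_pending=True, other=proc_b)
```

Even when a switch is pending, the log records the store into A's frame
first and the switch to B second.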


> Though I can't exactly put my finger on explaining why, my intuition is
> that
> changing threads "half way" through a bytecode is a bad thing.
>

Indeed it is, and the VM does not do this.  It is possible that the
execution simulation machinery in Context, InstructionStream et al could
have been written carelessly to allow this to occur, but it cannot and does
not occur in the VM proper.


> I started (but lost my way) to develop an idea to propose...
> that rather than any context-changing primitive (e.g. #primitiveWait)
> directly
> calling CoInterpreter>>transferTo:from:,   it would just flag for
> #transferTo:from:
> to be called at the next interpreter cycle, before the next bytecode is
> started.
> Thus #internalExecuteNewMethod gets to exit normally, placing any
> primitive-failure-code
> onto the correct process stack before the context-change.
>

Given the design constraint that primitives either fail without side
effects or complete atomically this architectural change isn't necessary.


> Interestingly the comment in #transferTo:from says...
>      "Record a process to be awoken on the next interpreter cycle."
> which sounds like what I'd propose, but actually it doesn't wait for
> the next interpreter cycle and instead immediately changes context.
>

No it doesn't.  It effects the process change, but control continues with
the caller, allowing the caller to do other things before the process
resumes.  For example, if in checkForEventsMayContextSwitch: more than one
semaphore is signaled (it could initiate any combination of signals of the
low space semaphore, the input semaphore, external semaphores (associated
with file descriptors & sockets), and the delay semaphore) then only the
highest priority process would be runnable after the sequence, and several
transferTo:[from:]'s could have been initiated from these signals,
depending on process priority.  But checkForEventsMayContextSwitch: will
not finish mid-sequence.  It will always complete all of its signals before
returning to code that can then resume the newly activated process.
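
A toy model of that sequence, in Python.  This is a sketch under loud
assumptions: the Scheduler class, its methods and the (name, priority) tuples
are hypothetical and much simpler than the VM's scheduler.  It shows several
transfers being recorded during one event check, with control staying in the
check until every signal has been delivered.

```python
class Scheduler:
    def __init__(self, active):
        self.active = active            # (name, priority)
        self.ready = []                 # runnable but not active
        self.transfers = []             # (from, to) bookkeeping

    def signal(self, waiter):
        """Make waiter runnable; preempt only if it outranks the active process."""
        if waiter[1] > self.active[1]:
            self.transfer_to(waiter)
        else:
            self.ready.append(waiter)

    def transfer_to(self, new):
        # Bookkeeping only: control returns to the caller afterwards.
        self.ready.append(self.active)
        self.transfers.append((self.active[0], new[0]))
        self.active = new

    def check_for_events(self, pending_waiters):
        # Every pending signal is delivered before anything is resumed.
        for waiter in pending_waiters:
            self.signal(waiter)
        return self.active


sched = Scheduler(active=("ui", 40))
resumed = sched.check_for_events([("input", 60), ("external", 50), ("delay", 80)])
```

Two transfers are recorded here (ui to input, then input to delay), but the
only process resumed after the check is the highest priority one.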


> Philosophically it seems cleaner for context changes to happen
> "between" bytecodes rather than "in the middle" of them,
> but I'm unclear on the practical implications.
>

That's right and that's what happens.  Context switches only occur at
suspension points and these are only between byte codes.  In fact, process
switches can only occur on sends (and a subset of sends that aren't
implemented as primitive, or are amongst the primitives in which process
switch is allowed, namely the process, mutex, semaphore, and eval
(perform, valueOfMethod:, BlockClosure>>value*) primitives) and backward jumps
at the end of loops (this last to prevent infinite loops preventing
response to interrupts).  checkForEventsMayContextSwitch: is invoked only
at these points.
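
As an illustration only, here is a toy bytecode loop in Python that polls for
events at exactly those two kinds of point.  The opcode names, the tuple
encoding and the run function are hypothetical; the real interpreter is far
more involved, and this sketch ignores which sends are primitives.

```python
def run(bytecodes, max_steps=8):
    """Return the pcs at which the toy interpreter would poll for events."""
    check_points = []
    pc = 0
    steps = 0
    while pc < len(bytecodes) and steps < max_steps:
        op, arg = bytecodes[pc]
        steps += 1
        if op == "jump":
            if arg < 0:
                # Backward jump at the end of a loop: a suspension point,
                # so an infinite loop cannot block interrupts.
                check_points.append(pc)
            pc = pc + 1 + arg
        elif op == "send":
            # A send: the other kind of suspension point.
            check_points.append(pc)
            pc += 1
        else:
            # Any other bytecode runs to completion with no event check.
            pc += 1
    return check_points


loop = [("push", None), ("send", "foo"), ("pop", None), ("jump", -4)]
points = run(loop)
```

The push and pop bytecodes never appear in the result: in this model a switch
can only be taken once a bytecode has finished.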


> Also, probably a year later for curiosity I was browsing the MT code
> and it seemed to do something like what I'd propose,
> however I can't remember that reference.
>

Again this is about scheduling a thread switch when a process is bound to
some other thread.  This is written as a two level scheduler where the
current process completes its event check in
checkForEventsMayContextSwitch: before a thread switch occurs, prior to
resuming the new process.  The MT code may be incorrect in that it's still
a work in progress.  I intend it to work as I describe, and if so I'd
expect to see a thread switch in
CoInterpreterMT>>returnToExecutive:postContextSwitch:, but I see no such
method.  In which case I have work to do ;-)

> hope that made some sense,
>

Yes, and I hope equally that I've allayed your fears.


> cheers -ben
>

_,,,^..^,,,_
best, Eliot