[Vm-dev] primitive retry across suspension for OwnedLock waitAcquire (was: Interpreter versus StackInterpreter hierarchy)

Fri Jun 3 04:52:09 UTC 2016

On Sun, May 22, 2016 at 2:15 AM, Eliot Miranda <eliot.miranda at gmail.com> wrote:
>
> There is a better way of solving this, and that is to use a pragma to identify a method that contains such a suspension point, and have the process terminate code look for the pragma and act accordingly.  For example, the pragma could have a terminate action, sent to the receiver with the context as argument, e.g.
>
> Mutex>>critical: mutuallyExcludedBlock
>     <onTerminate: #ensureMutexUnlockedInCritical:>
>     ^lock waitAcquire
>         ifNil: mutuallyExcludedBlock
>         ifNotNil:[ mutuallyExcludedBlock ensure: [lock release] ]
>
> (and here I'm guessing...)
>
> Mutex>> ensureMutexUnlockedInCritical: aContext
>     "long-winded comment explaining the corner case, referencing tests, etc, etc and how it is solved on terminate by this method"
>     (aContext pc = aContext initialPC
>      and: [self inTheCorner]) ifTrue:
>         [self doTheRightThingTM]
>
> So on terminate the stack is walked (it is anyway) looking for unwinds or onTerminate: markers.  Any onTerminate: markers are evaluated, and the corner case is solved.  The pragma approach also allows for visibility in the code.

I think this general <onTerminate:> pragma might be useful, but I'd
like to keep it in my back pocket for the moment while I explore
another idea: retrying the primitive after a process resumes.

I still have a concern that #primitiveOwnedLockWaitAcquire sleeps at
the bottom of the primitive.  If the sleeping process is resumed by
anything other than the lock being handed to it, it continues into the
critical section without having gained the mutex, which seems a bit
fragile.  Retrying #primitiveOwnedLockWaitAcquire immediately after
waking would effectively have the process sleep at the top of the
primitive, and *not* proceed until it *really* holds the lock.

So I'm thinking out loud here to formulate my thoughts, and in case
there is some major impediment you can help me fail fast...

One possibility is putting "self maybeRetryFailureAfterWaking"
in #slowPrimitiveResponse, similar to maybeRetryFailureDueToForwarding
and (guessing) maybeRetryFailureDueToLowMemory.  The difficulty seems
to be that a process's primFailCode doesn't survive across process
suspension(??).

As an aside, it seems fragile that, IIUC, primFailCode can be set by
one process and, if that process is then suspended, carry over and
fail a primitive in the newly active process.  Perhaps somewhere like
externalSetStackPageAndPointersForSuspendedContextOfProcess: should
zero primFailCode.
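
If that reading is right, the hazard can be shown with a toy model in
plain workspace Smalltalk (Squeak/Pharo).  This is only a sketch, not
VM code: the two switch blocks, the process names and the fail code 7
are all made up for illustration, and the only thing borrowed from the
VM is the idea of a single interpreter-wide primFailCode.

  | primFailCode active leakySwitch cleanSwitch |
  "Process switch that leaves the interpreter-wide fail code alone."
  leakySwitch := [:newProc | active := newProc].
  "Process switch that zeroes the fail code, as suggested above."
  cleanSwitch := [:newProc | active := newProc. primFailCode := 0].

  active := #procA.
  primFailCode := 7.    "procA fails a primitive with an arbitrary code"
  leakySwitch value: #procB.
  Transcript show: active printString, ' sees ', primFailCode printString; cr.

  active := #procA.
  primFailCode := 7.
  cleanSwitch value: #procB.
  Transcript show: active printString, ' sees ', primFailCode printString; cr.

The first line printed should be "#procB sees 7" (the stale code leaks
into procB) and the second "#procB sees 0".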

Anyway... I thought one way to retain primFailCode across process
suspension might be to push it onto the stack in #transferTo: and pop
it back off in
externalSetStackPageAndPointersForSuspendedContextOfProcess:
except that I see that method is called from a few places, so messing
with the stack there is probably a bad idea.

Another way might be for Process to get an additional instance
variable 'suspendedPrimitiveFailCode', which again could be set in
#transferTo: (or even more specifically only in
#primitiveOwnedLockWaitAcquire).  That is,

#transferTo: might have...

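  oldProc := objectMemory fetchPointer: ActiveProcessIndex ofObject: sched.
  "Save the outgoing process's fail code.  primFailCode is a raw integer,
   so it is boxed as a SmallInteger before being stored in the slot."
  objectMemory
      storePointer: SuspendedPrimitiveFailCodeIndex
      ofObject: oldProc
      withValue: (objectMemory integerObjectOf: primFailCode).
  ...
  "Restore the incoming process's fail code (assumes the slot always
   holds a SmallInteger, i.e. new Processes initialise it to 0)."
  primFailCode := objectMemory integerValueOf:
      (objectMemory
          fetchPointer: SuspendedPrimitiveFailCodeIndex
          ofObject: newProc).
  self externalSetStackPageAndPointersForSuspendedContextOfProcess: newProc.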

The additional advantage here might be that the saving and restoring
of primFailCode is localised to one method. I'm hoping that change
(plus similar in Cog) might be sufficient to facilitate behaviour like
this...

process 1
1.    invokes slowPrimitiveResponse
2.     dispatches to primitiveOwnedLockWaitAcquire
3.         calls primitiveFailFor: PrimErrRetryAfterWaking
4.         primFailCode saved into Process object
5.         process goes to sleep

6. later, after process 1 is woken
7.   primFailCode restored from Process object
8.   returns to slowPrimitiveResponse
9.   if PrimErrRetryAfterWaking
10.     dispatch to primitiveOwnedLockWaitAcquire (goto step 2)

Maybe there would need to be a step 9a to checkForInterrupts to avoid
too tight a loop locking the image, but maybe this is already done
somewhere in the suspend/resume process.
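
To make steps 1-10 concrete, here is a toy model in plain workspace
Smalltalk (Squeak/Pharo).  It is a sketch under assumptions, not VM
code: the acquire and wakeUp blocks, the savedFailCode temp (standing
in for the proposed Process slot) and the symbol #retryAfterWaking
(standing in for PrimErrRetryAfterWaking) are all made up for
illustration.  "Sleeping" is just returning from the block, and
release here simply clears the owner rather than handing the lock to
a waiter.

  | lockOwner primFailCode savedFailCode attempts acquire wakeUp |
  lockOwner := #process2.      "the lock starts out held by someone else"
  primFailCode := 0.
  attempts := 0.

  "Stand-in for primitiveOwnedLockWaitAcquire (steps 2-3)."
  acquire := [:proc |
      attempts := attempts + 1.
      lockOwner isNil
          ifTrue: [lockOwner := proc. primFailCode := 0]
          ifFalse: [primFailCode := #retryAfterWaking]].

  "Stand-in for steps 6-10: restore the fail code, retry if asked to,
   and save the fail code again in case the process goes back to sleep."
  wakeUp := [:proc |
      primFailCode := savedFailCode.
      primFailCode == #retryAfterWaking ifTrue: [acquire value: proc].
      savedFailCode := primFailCode].

  acquire value: #process1.        "steps 1-3: the lock is busy"
  savedFailCode := primFailCode.   "steps 4-5: save the code, go to sleep"
  wakeUp value: #process1.         "resumed while the lock is still held"
  lockOwner := nil.                "process2 releases the lock"
  wakeUp value: #process1.         "this retry succeeds"

  Transcript
      show: 'owner: ', lockOwner printString;
      show: ' attempts: ', attempts printString;
      cr.

Evaluating this should print "owner: #process1 attempts: 3".  The
first resume happens while process2 still owns the lock, so the retry
fails and process 1 goes back to sleep instead of entering the
critical section; only the resume after the release gets through.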

So I'm now going to try coding the second way in the StackVM, and then
look at how it might be done in Cog.  All feedback appreciated.

cheers -ben