[Vm-dev] Re: primitive retry across suspension for OwnedLock waitAcquire (was: Interpreter versus StackInterpreter hierarchy)

Fri Jun 3 07:08:04 UTC 2016

On Fri, Jun 3, 2016 at 12:52 PM, Ben Coman <btc at openinworld.com> wrote:
> On Sun, May 22, 2016 at 2:15 AM, Eliot Miranda <eliot.miranda at gmail.com> wrote:
>>
>> On Fri, May 20, 2016 at 7:52 PM, Ben Coman <btc at openinworld.com> wrote:
>>>
>>> On Sat, May 21, 2016 at 4:36 AM, Clément Bera <bera.clement at gmail.com> wrote:
>>> >
>>> > On Fri, May 20, 2016 at 7:51 PM, Ben Coman <btc at openinworld.com> wrote:
>>> >>
>>> >> On Fri, May 20, 2016 at 9:25 PM, Clément Bera <bera.clement at gmail.com> wrote:
>>> >> > On Thu, May 19, 2016 at 3:42 PM, Ben Coman <btc at openinworld.com> wrote:
>>>
>> There is a better way of solving this, and that is to use a pragma to identify a method that contains such a suspension point, and have the process terminate code look for the pragma and act accordingly.  For example, the pragma could have a terminate action, sent to the receiver with the context as argument, e.g.
>>
>> Mutex>>critical: mutuallyExcludedBlock
>>     <onTerminate: #ensureMutexUnlockedInCritical:>
>>     ^lock waitAcquire
>>         ifNil: mutuallyExcludedBlock
>>         ifNotNil:[ mutuallyExcludedBlock ensure: [lock release] ]
>>
>> (and here I'm guessing...)
>>
>> Mutex>>ensureMutexUnlockedInCritical: aContext
>>     "long-winded comment explaining the corner case, referencing tests, etc, etc and how it is solved on terminate by this method"
>>     (aContext pc = aContext initialPC
>>      and: [self inTheCorner]) ifTrue:
>>         [self doTheRightThingTM]
>>
>> So on terminate the stack is walked (it is anyway) looking for unwinds or onTerminate: markers.  Any onTerminate: markers are evaluated, and the corner case is solved.  The pragma approach also allows for visibility in the code.
>
> I think this general < onTerminate: > pragma might be useful, but I'd
> like to keep it in the back pocket for the moment while I explore
> another idea for primitive retry after a process resumes.
>
>
> I still have a concern that  #primitiveOwnedLockWaitAcquire sleeps at
> the bottom of the primitive, thus if the sleeping process is resumed,
> it continues into the critical section without having gained the
> mutex, which seems a bit fragile.  Retrying
> #primitiveOwnedLockWaitAcquire immediately after waking would
> effectively have the process sleep at the top of the primitive, and
> *not* proceed until it *really* holds the lock.
>
> So I'm thinking out loud here to formulate my thoughts, and in case
> there is some major impediment you can help me fail fast...
>
> One possibility is putting "self maybeRetryFailureAfterWaking"
> in #slowPrimitiveResponse, similar to maybeRetryFailureDueToForwarding
> and (guessing) maybeRetryFailureDueToLowMemory.  The difficulty seems
> to be that a process's primFailCode doesn't persist across process
> suspension(??).
>
> As an aside, IIUC it seems fragile that primFailCode can be set by one
> process and, if that process is then suspended, carry over to fail the
> newly active process.  Perhaps somewhere like
> externalSetStackPageAndPointersForSuspendedContextOfProcess: should
> zero primFailCode.
>
> Anyway... I thought one way to retain primFailCode across process
> suspension might be to push it onto the stack in #transferTo: and pop
> it back off in
> externalSetStackPageAndPointersForSuspendedContextOfProcess:,
> except that I see that method is called from a few places, so messing
> with the stack there is probably a bad idea.
>
> Another way might be for Process to get an additional instance
> variable 'suspendedPrimitiveFailCode' which again could be set in
> #transferTo:  (or even more specifically only in
> #primitiveOwnedLockWaitAcquire).  That is,
>
> #transferTo:  might have...
>
>   oldProc := objectMemory fetchPointer: ActiveProcessIndex ofObject: sched.
>   objectMemory
>       storePointer: SuspendedPrimitiveFailCodeIndex
>       ofObject: oldProc
>       withValue: primFailCode
>   ...
>     primFailCode := objectMemory
>         fetchPointer: SuspendedPrimitiveFailCodeIndex
>         ofObject: newProc.
>    self externalSetStackPageAndPointersForSuspendedContextOfProcess: newProc.
>
>

Whoops, for a start, that needed to be...
    objectMemory
        storeInteger: SuspendedPrimitiveFailCodeIndex
        ofObject: oldProc
        withValue: primFailCode.
    primFailCode := objectMemory
        fetchInteger: SuspendedPrimitiveFailCodeIndex
        ofObject: newProc.


> The additional advantage here might be that the saving and restoring
> of primFailCode is localised to one method. I'm hoping that change
> (plus similar in Cog) might be sufficient to facilitate behaviour like
> this...
>
> process 1
> 1.    invokes slowPrimitiveResponse
> 2.     dispatches to primitiveOwnedLockWaitAcquire
> 3.         calls primitiveFailFor: PrimErrRetryAfterWaking
> 4.         primFailCode saved into Process object
> 5.         process goes to sleep
>
> 6. later after process 1 woken
> 7.   primFailCode restored from Process object
> 8.   returns to slowPrimitiveResponse
> 9.   if PrimErrRetryAfterWaking
> 10.     dispatch to primitiveOwnedLockWaitAcquire (goto step 2)
>
> Maybe there would need to be a step 9a to checkForInterrupts to avoid
> too tight a loop locking the image, but maybe this is already done
> somewhere in the suspend/resume process.
>
> So I'm now going to try coding the second way in the StackVM, and then
> look at how it might be done in Cog.  All feedback appreciated.
>
> cheers -ben
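
The numbered steps above can be modelled as a retry loop. Again this is a toy Python sketch rather than VM code, and every name in it is an assumption: OwnedLock, primitive_wait_acquire and slow_primitive_response are hypothetical stand-ins, the numeric value of PRIM_ERR_RETRY_AFTER_WAKING is made up, and "sleeping" is simulated by running a callable that represents whatever other processes did before this one was woken.

```python
PRIM_ERR_RETRY_AFTER_WAKING = 99  # hypothetical value, for illustration only


class OwnedLock:
    def __init__(self):
        self.owner = None


def primitive_wait_acquire(lock, proc):
    """Return 0 if proc now owns the lock, else the retry fail code."""
    if lock.owner is None:
        lock.owner = proc
        return 0
    return PRIM_ERR_RETRY_AFTER_WAKING


def slow_primitive_response(lock, proc, wake_events):
    """Dispatch the primitive, re-dispatching after every wake-up.

    wake_events is an iterable of callables; each one runs while proc is
    'asleep' and models what happened before proc was resumed.
    Returns the number of times the primitive was dispatched.
    """
    events = iter(wake_events)
    attempts = 0
    while True:
        attempts += 1
        fail_code = primitive_wait_acquire(lock, proc)   # steps 2-3
        if fail_code != PRIM_ERR_RETRY_AFTER_WAKING:     # step 9
            return attempts
        event = next(events, None)                       # steps 5-6
        if event is None:
            raise RuntimeError("process was never woken")
        event()


lock = OwnedLock()
lock.owner = "p2"


def spurious_resume():
    pass  # p1 is resumed, but p2 still holds the lock


def release():
    lock.owner = None  # p2 releases the lock before p1 wakes


attempts = slow_primitive_response(lock, "p1", [spurious_resume, release])
assert attempts == 3
assert lock.owner == "p1"
```

The point of the model is only that a plain resume while the lock is still held sends the process back through the primitive instead of letting it into the critical section. It does not model the possible step 9a (checkForInterrupts).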