[Vm-dev] [squeak-dev] Solving termination of critical sections in the context of priority inversion was: SemaphoreTest fails in trunk, is a fix needed for the 5.2 release?

Juan Vuletich JuanVuletich at zoho.com
Sat Jul 28 14:43:18 UTC 2018


Hi Eliot,

The V3 code in Cuis was last updated on 2010, but that was just the 
comment. I went back in time, and this (except for the comment) 
identical version of the code is 10/5/2007 by Andreas.

critical: mutuallyExcludedBlock
     "Evaluate mutuallyExcludedBlock only if the receiver is not 
currently in
     the process of running the critical: message. If the receiver is, 
evaluate
     mutuallyExcludedBlock after the other critical: message is finished."
     | blockValue caught |
     caught := false.
     [
         caught := true.
         self wait.
         blockValue := mutuallyExcludedBlock value
     ] ensure: [caught ifTrue: [self signal]].
     ^blockValue

If you are interested, in Cuis, unzip /PreviousUpdates and in file list 
set this filter: *-ar.cs.st;*-ar.?.cs.st;*-ar.??.cs.st
This will show all the change sets in Cuis history directly based on 
Andreas' contributions. In this case it is 0104-SemaCritical-ar.cs.st

You know this stuff better than anyone, and I'm happy to follow your 
lead here, Eliot.

Just a question. You are using primitives #primitiveExitCriticalSection  
#primitiveEnterCriticalSection 
#primitiveTestAndSetOwnershipOfCriticalSection (185, 186, 187), that 
Cuis hasn't used so far. Are they supported by SqueakJS and the classic 
interpreter?

Thanks,

-- 
Juan Vuletich
www.cuis-smalltalk.org
https://github.com/Cuis-Smalltalk/Cuis-Smalltalk-Dev
https://github.com/jvuletich
https://www.linkedin.com/in/juan-vuletich-75611b3
@JuanVuletich


On 7/27/2018 2:40 AM, Eliot Miranda wrote:
>
> Ha!
>
> On Thu, Jul 26, 2018 at 9:31 PM, Eliot Miranda 
> <eliot.miranda at gmail.com <mailto:eliot.miranda at gmail.com>> wrote:
>
>     Hi David, Hi Bert, Clément, Juan, Levente and Marcus, Hi Anyone
>     else with strong experience in the VM with processes,
>
>     On Mon, Jul 23, 2018 at 7:38 PM, David T. Lewis
>     <lewis at mail.msen.com <mailto:lewis at mail.msen.com>> wrote:
>
>         Semaphore seems like a rather basic thing that should work
>         correctly in
>         any Squeak image. The tests do not pass in trunk any more.
>
>         Specifically, SemaphoreTest>>testSemaInCriticalWait fails in
>         trunk, but passes
>         in the earlier Squeak 4.6 / 5.0 images.
>
>         Is this a real problem? Does it need to be fixed for the 5.2
>         release?
>
>
>     Yes.  Yes.  And it needs to be fixed in Pharo too.  I know this
>     message will strike you as TL;DR, but please, especially if you're
>     Bert, Clément, Juan, Levente or Marcus, read this carefully.  It's
>     quite important.  And below I'll present the Squeak code but will
>     work with Clément and Marcus to implement semantically equivalent
>     Pharo code asap.
>
>     And apologies in advance for the repetitious nature of this
>     message.  It is better that I am precise than I am brief and
>     anyone miss anything.  This is an old problem and it will be nice
>     if I've fixed it, but I could easily have missed something; this
>     problem having been around for decades.  OK...
>
>
>     This is an old problem which boiled down to there being no way to
>     determine by looking at a process's suspendedContext whether a
>     process is either waiting on a Semaphore or Mutex or is no longer
>     waiting, but has made no progress because it is at a lower
>     priority than runnable processes and so has not got a chance to
>     run yet.
>
>     So in
>         | s |
>         s := Semaphore new.
>         ...
>         s wait
>         ...
>
>     if we look at the context containing the wait its pc will be the
>     same whether the process is blocked, waiting on the semaphore, or
>     whether the semaphore has been signalled but the process has not
>     been able to proceed because it is of lower priority than runnable
>     processes and so can make no progress.  This caused problems for
>     code such as this:
>
>     Semaphore>>critical: mutuallyExcludedBlock
>     self wait.
>     ^mutuallyExcludedBlock ensure: [self signal]
>
>     because the ensure: won't be entered if higher priority runnable
>     processes are preventing it from running.
>
>     And for code such as this:
>
>     Semaphore>>critical: mutuallyExcludedBlock
>     ^[self wait.
>     mutuallyExcludedBlock value]
>     ensure: [self signal]
>
>     because if the process is terminated when the semaphore has not
>     been signalled (i.e. the process is blocked in the wait),
>     Process>>terminate will run the ensure: block anyway, resulting in
>     the Semaphore getting an extra signal.
>
>     This occupied Andreas and I at Qwaq, and we didn't solve it.  We
>     developed Mutex as a more efficient version of Monitor, but this
>     is also subject so the same problem.  We did change the definition
>     of ensure: so that it is not a suspension point, by adding
>     valueNoContextSwitch[:]
>
>     BlockClosure>>ensure: aBlock
>     "Evaluate a termination block after evaluating the receiver,
>     regardless of
>     whether the receiver's evaluation completes.  N.B.  This method is
>     *not*
>     implemented as a primitive.  Primitive 198 always fails.  The VM
>     uses prim
>     198 in a context's method as the mark for an ensure:/ifCurtailed:
>     activation."
>
>     | complete returnValue |
>     <primitive: 198>
>     returnValue := self valueNoContextSwitch.
>     complete ifNil:[
>     complete := true.
>     aBlock value.
>     ].
>     ^ returnValue
>
>     This means that we don't have to deal with suspensions here
>     (marked with !!!)
>
>     I now understand how to distinguish between the two cases, between
>     blocking and not blocked but no progress.  Process>>suspend
>     answers the list the Process was on when it was suspended.  If the
>     process is already suspended Process>>suspend answers nil.  If the
>     process is waiting on a Semaphore or a Mutex, Process>>suspend
>     answers the Semaphore or Mutex. And if the process is runnable
>     then Process>>suspend answers the process's run list (a LinkedList
>     in ProcessorScheduler's quiescentProcessLists array corresponding
>     to the process's priority).
>
>     So Process>>#terminate can distinguish between #wait or
>     #primitiveEnterCriticalSection or
>     #primitiveTestAndSetOwnershipOfCriticalSection being blocked, or
>     being unblocked but having made no progress due to too low a
>     priority.  We do so by testing the class of the result of
>     suspending the process.  If it is a LinkedList, the process has
>     past the #wait or #primitiveEnterCriticalSection but has made no
>     progress due to too low a priority.
>
>     The version of Process>>#terminate I'm about to commit deals with
>     several cases.  Let me present the cases first.  There are three
>     versions of Semaphore>>#critical: to handle, and one version of
>     Mutex>>critical: and Mutex>>#critical:ifLocked:.
>
>     The two basic versions of Semaphore>>critical: are
>
>     V1
>     critical: mutuallyExcludedBlock
>     "Evaluate mutuallyExcludedBlock only if the receiver is not
>     currently in
>     the process of running the critical: message. If the receiver is,
>     evaluate
>     mutuallyExcludedBlock after the other critical: message is finished."
>     <criticalSection>
>     self wait.
>     ^mutuallyExcludedBlock ensure: [self signal]
>
>     V2
>     critical: mutuallyExcludedBlock
>     "Evaluate mutuallyExcludedBlock only if the receiver is not
>     currently in
>     the process of running the critical: message. If the receiver is,
>     evaluate
>     mutuallyExcludedBlock after the other critical: message is finished."
>     <criticalSection>
>     ^[self wait.
>       mutuallyExcludedBlock value]
>     ensure: [self signal]
>
>     and Juan's safer version is (after I added the criticalSection pragma)
>
>     V3
>     critical: mutuallyExcludedBlock
>     "Evaluate mutuallyExcludedBlock only if the receiver is not
>     currently in
>     the process of running the critical: message. If the receiver is,
>     evaluate
>     mutuallyExcludedBlock after the other critical: message is finished."
>     <criticalSection>
>     | caught |
>     "We need to catch eventual interruptions very carefully.
>     The naive approach of just doing, e.g.,:
>     self wait.
>     aBlock ensure:[self signal].
>     will fail if the active process gets terminated while in the wait.
>     However, the equally naive:
>     [self wait.
>     aBlock value] ensure:[self signal].
>     will fail too, since the active process may get interrupted while
>     entering the ensured block and leave the semaphore signaled twice.
>     To avoid both problems we make use of the fact that interrupts only
>     occur on sends (or backward jumps) and use an assignment (bytecode)
>     right before we go into the wait primitive (which is not a real
>     send and
>     therefore not interruptable either)."
>
>     caught := false.
>     ^[
>     caught := true.
>     self wait.
>     mutuallyExcludedBlock value
>     ] ensure: [ caught ifTrue: [self signal] ]
>
>     and the Mutex>>critical:'s are
>
>     critical: aBlock
>     "Evaluate aBlock protected by the receiver."
>     <criticalSection>
>     ^self primitiveEnterCriticalSection
>     ifTrue: [aBlock value]
>     ifFalse: [aBlock ensure: [self primitiveExitCriticalSection]]
>
>     critical: aBlock ifLocked: lockedBlock
>     "Answer the evaluation of aBlock protected by the receiver.  If it
>     is already in a critical
>     section on behalf of some other process answer the evaluation of
>     lockedBlock."
>     <criticalSection>
>     ^self primitiveTestAndSetOwnershipOfCriticalSection
>     ifNil: [lockedBlock value]
>     ifNotNil:
>     [:alreadyOwner|
>     alreadyOwner
>     ifTrue: [aBlock value]
>     ifFalse: [aBlock ensure: [self primitiveExitCriticalSection]]]
>
>     primitiveEnterCriticalSection answers false if the Mutex was
>     unowned, and true if it was already owned by the active process. 
>     It blocks otherwise. primitiveTestAndSetOwnershipOfCriticalSection
>     answers false if the Mutex was unowned, true if it was already
>     owned by the active process, and nil if owned by some other process.
>
>     So we want Process>>#terminate to correctly release the semaphores
>     and mutexes no matter where in these methods they are.  We don't
>     have to worry if the process is within the block argument to a
>     critical: itself, only if it is actually within the critical:
>     method or a block within it. If it is already within the block
>     argument to critical: then Process>>#terminate's unwind handling
>     will unwind things correctly.  Taking Juan's version of
>     Semaphore>>#critical: above, the key issue is whether the process
>     being terminated is blocked on the wait, not blocked but still
>     stuck at the wait, or at the start of the block argument to ensure:.
>
>     I have extracted the processing into
>     Process>>releaseCriticalSection:, so now Process>>terminate reads
>
>     terminate
>     "Stop the process that the receiver represents forever.
>     Unwind to execute pending ensure:/ifCurtailed: blocks before
>     terminating.
>     If the process is in the middle of a critical: critical section,
>     release it properly."
>
>     | ctxt unwindBlock oldList |
>     self isActiveProcess ifTrue: [
>     ctxt := thisContext.
>     [ctxt := ctxt findNextUnwindContextUpTo: nil.
>     ctxt isNil
>     ] whileFalse: [
>     (ctxt tempAt: 2) ifNil:[
>     ctxt tempAt: 2 put: nil.
>     unwindBlock := ctxt tempAt: 1.
>     thisContext terminateTo: ctxt.
>     unwindBlock value].
>     ].
>     thisContext terminateTo: nil.
>     self suspend.
>     ] ifFalse:[
>     "Always suspend the process first so it doesn't accidentally get
>     woken up.
>     N.B. If oldList is a LinkedList then the process is runnable. If
>     it is a Semaphore/Mutex et al
>     then the process is blocked, and if it is nil then the process is
>     already suspended."
>     oldList := self suspend.
>     suspendedContext ifNotNil:
>     ["Release any method marked with the <criticalSection> pragma.
>      The argument is whether the process is runnable."
>     self releaseCriticalSection: (oldList isNil or: [oldList class ==
>     LinkedList]).
>
>     "If terminating a process halfways through an unwind, try to
>     complete that unwind block first."
>     (suspendedContext findNextUnwindContextUpTo: nil) ifNotNil:
>     [:outer|
>     (suspendedContext findContextSuchThat:[:c| c closure == (outer
>     tempAt: 1)]) ifNotNil:
>     [:inner| "This is an unwind block currently under evaluation"
>     suspendedContext runUntilErrorOrReturnFrom: inner]].
>
>     ctxt := self popTo: suspendedContext bottomContext.
>     ctxt == suspendedContext bottomContext ifFalse:
>     [self debug: ctxt title: 'Unwind error during termination'].
>     "Set the context to its endPC for the benefit of isTerminated."
>     ctxt pc: ctxt endPC]]
>
>     In implementing releaseCriticalSection: we need to know which
>     selector a context has just sent.  selectorJustSentOrSelf is
>     implemented in Squeak as
>
>     InstructionStream>>selectorJustSentOrSelf
>     "If this instruction follows a send, answer the send's selector,
>     otherwise answer self."
>
>     | method |
>     method := self method.
>     ^method encoderClass selectorToSendOrItselfFor: self in: method
>     at: self previousPc
>
>     c.f.
>
>     InstructionStream>>selectorToSendOrSelf
>     "If this instruction is a send, answer the selector, otherwise
>     answer self."
>
>     | method |
>     method := self method.
>     ^method encoderClass selectorToSendOrItselfFor: self in: method at: pc
>
>     Now we can implement Process>>#releaseCriticalSection:
>
>     releaseCriticalSection: runnable
>     "Figure out if we are terminating a process that is in the ensure:
>     block of a critical section.
>     In this case, if the block has made progress, pop the
>     suspendedContext so that we leave the
>     ensure: block inside the critical: without signaling the
>     semaphore/exiting the primitive section,
>     since presumably this has already happened.  But if it hasn't made
>     progress but is beyond the
>     wait (which we can tell my the oldList being one of the runnable
>     lists, i.e. a LinkedList, not a
>     Semaphore or Mutex, et al), then the ensure: block needs to be run."
>     | selectorJustSent |
>     (suspendedContext method pragmaAt: #criticalSection) ifNil: [^self].
>     selectorJustSent := suspendedContext selectorJustSentOrSelf.
>
>     "Receiver and/or argument blocks of ensure: in
>     Semaphore>>critical: or Mutex>>#critical:"
>     suspendedContext isClosureContext ifTrue:
>     [suspendedContext sender selector == #ensure: ifTrue:
>     [| notWaitingButMadeNoProgress |
>     "Avoid running the ensure: block twice, popping it if it has
>     already been run. If runnable
>     but at the wait, leave it in place. N.B. No need to check if the
>     block receiver of ensure: has
>     not started to run (via suspendedContext pc = suspendedContext
>     startpc) because ensure:
>     uses valueNoContextSwitch, and so there is no suspension point
>     before the wait."
>     notWaitingButMadeNoProgress :=
>     runnable
>     and: [selectorJustSent == #wait
>     and: [suspendedContext sender selectorJustSentOrSelf ==
>     #valueNoContextSwitch]].
>     notWaitingButMadeNoProgress ifFalse:
>     [suspendedContext := suspendedContext home]].
>     ^self].
>
>     "Either Semaphore>>critical: or Mutex>>#critical:.  Is the process
>     still blocked?  If so, nothing further to do."
>     runnable ifFalse: [^self].
>
>     "If still at the wait the ensure: block has not been activated, so
>     signal to restore."
>     selectorJustSent == #wait ifTrue:
>     [suspendedContext receiver signal].
>
>     "If still at the lock primitive and the lock primitive just
>     acquired ownership (indicated by it answering false)
>     then the ensure block has not been activated, so explicitly
>     primitiveExitCriticalSection to unlock."
>     (selectorJustSent == #primitiveEnterCriticalSection
>     or: [selectorJustSent ==
>     #primitiveTestAndSetOwnershipOfCriticalSection]) ifTrue:
>     [(suspendedContext stackPtr > 0
>      and: [suspendedContext top == false]) ifTrue:
>     [suspendedContext receiver primitiveExitCriticalSection]]
>
>
>     Let's go through it line by line.  First, runnable is an argument,
>     determined in Process>>#terminate.  One could invoke it with
>
>         self releaseCriticalSection: oldList class == LinkedList
>
>     but this means that an already suspended process is assumed to be
>     not runnable, which makes it tricky to debug the
>     Process>>#terminate method.  One has to assign to oldList while
>     stepping though the method.  I've chosen safety, assuming that the
>     process is still runnable if suspend answers nil, its simply being
>     debugged.
>
>     Then we're only interested in <criticalSection> marked methods se
>     we return if there's no such pragma.
>
>     Then we deal with blocks in these methods.  One issue here is to
>     avoid running the ensure: block twice if it is already being
>     evaluated.  The other is to run it if it is stalled and has yet to
>     be run.
>
>     So if
>
>     suspendedContext isClosureContext ifTrue:
>
>     we're in the ensure: receiver or argument blocks in any
>     <criticalSection> marked method, i.e. Semaphore>>critical: and
>     Mutex>>critical:[ifLocked:].  If wait was just sent then we're in
>     the ensure: receiver block of Semaphore>>critical: (V2 & V3 above)
>     and the issue is whether the process is blocked or is unblocked
>     and has made no progress. If blocked then nothing needs to be
>     done; the ensure: block is discarded and the stack cut back to the
>     critical: activation.  If progress has been made then nothing
>     needs to be done (in fact we can't be in this state; the ensure:
>     receiver will have started evaluating the critical: block
>     argument).  If unblocked, but no progress has been made, do /not/
>     discard the unwind block and it will be run in Process>>#terminate
>     when this method returns.  Hence...
>
>     [suspendedContext sender selector == #ensure: ifTrue:
>     [| notWaitingButMadeNoProgress |
>     "Avoid running the ensure: block twice, popping it if it has
>     already been run. If runnable
>     but at the wait, leave it in place. N.B. No need to check if the
>     block receiver of ensure: has
>     not started to run (via suspendedContext pc = suspendedContext
>     startpc) because ensure:
>     uses valueNoContextSwitch, and so there is no suspension point
>     before the wait."
>     notWaitingButMadeNoProgress :=
>     runnable
>     and: [selectorJustSent == #wait
>     and: [suspendedContext sender selectorJustSentOrSelf ==
>     #valueNoContextSwitch]].
>     notWaitingButMadeNoProgress ifFalse:
>     [suspendedContext := suspendedContext home]].
>     ^self].
>
>     Now we're left with the simpler version of Semaphore>>critical:
>     (V1 above) and the two Mutex methods
>     Mutex>>#critical:[ifLocked:].  Here the only state we have to
>     worry about is that the process is unblocked but has made no
>     progress.  If not runnable the process is still blocked and we can
>     simply return.
>
>     "Either Semaphore>>critical: or Mutex>>#critical:.  Is the process
>     still blocked?  If so, nothing further to do."
>     runnable ifFalse: [^self].
>
>     If #wait was just sent the process is in Semaphore>>#critical:
>     and, because ensure: has not been sent we signal explicitly to
>     restore the signal count:
>
>     "If still at the wait the ensure: block has not been activated, so
>     signal to restore."
>     selectorJustSent == #wait ifTrue:
>     [suspendedContext receiver signal].
>
>     If either of primitiveEnterCriticalSection or
>     primitiveTestAndSetOwnershipOfCriticalSection have just been sent
>     then either the Mutex is already owned, in which case the ensure
>     block is elsewhere in the colder part of the stack, or has just
>     been owned, and because ensure: has not been sent we unlock
>     explicitly to release the Mutex:
>
>     "If still at the lock primitive and the lock primitive just
>     acquired ownership (indicated by it answering false)
>     then the ensure block has not been activated, so explicitly
>     primitiveExitCriticalSection to unlock."
>     (selectorJustSent == #primitiveEnterCriticalSection
>     or: [selectorJustSent ==
>     #primitiveTestAndSetOwnershipOfCriticalSection]) ifTrue:
>     [(suspendedContext stackPtr > 0
>      and: [suspendedContext top == false]) ifTrue:
>     [suspendedContext receiver primitiveExitCriticalSection]]
>
>     So Pharoers can you read and say whether you think this is sane or
>     not?  If so, then we can kibbutz to write the Pharo version.
>
>     Squeakers can you review Kernel-eem.1183 & Kernel-eem.1184 in the
>     inbox?  Kernel-eem.1183 includes the fix as described above. 
>     Kernel-eem.1184 reverts Semaphore>>#critical: to V1 above.
>
>
>     P.S. Looking at V1 above it seems to me that there is an issue if
>     the process is preempted in ensure: before sending
>     valueNoContextSwitch:.  I'll try and write a test that advances a
>     process to that precise point.  If that test fails I think we have
>     to use V2 or V3, and V2 is clearly preferable.
>
>
> Lovely.  I added the test (testSemaCriticalWaitInEnsure 
> & testMutexCriticalBlockedInEnsure) and it is V1 that works and V2 
> that does not.  I think it best not to try and be too clever and fix 
> terminate and/or releaseCriticalSection: for this case.  We can simply 
> stick with V1 for now.
>
> _,,,^..^,,,_
> best, Eliot
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.squeakfoundation.org/pipermail/vm-dev/attachments/20180728/d1149ad5/attachment-0001.html>


More information about the Vm-dev mailing list