[Vm-dev] [squeak-dev] Solving termination of critical sections in the context of priority inversion was: SemaphoreTest fails in trunk, is a fix needed for the 5.2 release?
Juan Vuletich
JuanVuletich at zoho.com
Sat Jul 28 14:43:18 UTC 2018
Hi Eliot,
The V3 code in Cuis was last updated on 2010, but that was just the
comment. I went back in time, and this (except for the comment)
identical version of the code is 10/5/2007 by Andreas.
critical: mutuallyExcludedBlock
"Evaluate mutuallyExcludedBlock only if the receiver is not
currently in
the process of running the critical: message. If the receiver is,
evaluate
mutuallyExcludedBlock after the other critical: message is finished."
| blockValue caught |
caught := false.
[
caught := true.
self wait.
blockValue := mutuallyExcludedBlock value
] ensure: [caught ifTrue: [self signal]].
^blockValue
If you are interested, in Cuis, unzip /PreviousUpdates and in file list
set this filter: *-ar.cs.st;*-ar.?.cs.st;*-ar.??.cs.st
This will show all the change sets in Cuis history directly based on
Andreas' contributions. In this case it is 0104-SemaCritical-ar.cs.st
You know this stuff better than anyone, and I'm happy to follow your
lead here, Eliot.
Just a question. You are using primitives #primitiveExitCriticalSection
#primitiveEnterCriticalSection
#primitiveTestAndSetOwnershipOfCriticalSection (185, 186, 187), that
Cuis hasn't used so far. Are they supported by SqueakJS and the classic
interpreter?
Thanks,
--
Juan Vuletich
www.cuis-smalltalk.org
https://github.com/Cuis-Smalltalk/Cuis-Smalltalk-Dev
https://github.com/jvuletich
https://www.linkedin.com/in/juan-vuletich-75611b3
@JuanVuletich
On 7/27/2018 2:40 AM, Eliot Miranda wrote:
>
> Ha!
>
> On Thu, Jul 26, 2018 at 9:31 PM, Eliot Miranda
> <eliot.miranda at gmail.com <mailto:eliot.miranda at gmail.com>> wrote:
>
> Hi David, Hi Bert, Clément, Juan, Levente and Marcus, Hi Anyone
> else with strong experience in the VM with processes,
>
> On Mon, Jul 23, 2018 at 7:38 PM, David T. Lewis
> <lewis at mail.msen.com <mailto:lewis at mail.msen.com>> wrote:
>
> Semaphore seems like a rather basic thing that should work
> correctly in
> any Squeak image. The tests do not pass in trunk any more.
>
> Specifically, SemaphoreTest>>testSemaInCriticalWait fails in
> trunk, but passes
> in the earlier Squeak 4.6 / 5.0 images.
>
> Is this a real problem? Does it need to be fixed for the 5.2
> release?
>
>
> Yes. Yes. And it needs to be fixed in Pharo too. I know this
> message will strike you as TL;DR, but please, especially if you're
> Bert, Clément, Juan, Levente or Marcus, read this carefully. It's
> quite important. And below I'll present the Squeak code but will
> work with Clément and Marcus to implement semantically equivalent
> Pharo code asap.
>
> And apologies in advance for the repetitious nature of this
> message. It is better that I am precise than I am brief and
> anyone miss anything. This is an old problem and it will be nice
> if I've fixed it, but I could easily have missed something; this
> problem having been around for decades. OK...
>
>
> This is an old problem which boiled down to there being no way to
> determine by looking at a process's suspendedContext whether a
> process is either waiting on a Semaphore or Mutex or is no longer
> waiting, but has made no progress because it is at a lower
> priority than runnable processes and so has not got a chance to
> run yet.
>
> So in
> | s |
> s := Semaphore new.
> ...
> s wait
> ...
>
> if we look at the context containing the wait its pc will be the
> same whether the process is blocked, waiting on the semaphore, or
> whether the semaphore has been signalled but the process has not
> been able to proceed because it is of lower priority than runnable
> processes and so can make no progress. This caused problems for
> code such as this:
>
> Semaphore>>critical: mutuallyExcludedBlock
> self wait.
> ^mutuallyExcludedBlock ensure: [self signal]
>
> because the ensure: won't be entered if higher priority runnable
> processes are preventing it from running.
>
> And for code such as this:
>
> Semaphore>>critical: mutuallyExcludedBlock
> ^[self wait.
> mutuallyExcludedBlock value]
> ensure: [self signal]
>
> because if the process is terminated when the semaphore has not
> been signalled (i.e. the process is blocked in the wait),
> Process>>terminate will run the ensure: block anyway, resulting in
> the Semaphore getting an extra signal.
>
> This occupied Andreas and I at Qwaq, and we didn't solve it. We
> developed Mutex as a more efficient version of Monitor, but this
> is also subject so the same problem. We did change the definition
> of ensure: so that it is not a suspension point, by adding
> valueNoContextSwitch[:]
>
> BlockClosure>>ensure: aBlock
> "Evaluate a termination block after evaluating the receiver,
> regardless of
> whether the receiver's evaluation completes. N.B. This method is
> *not*
> implemented as a primitive. Primitive 198 always fails. The VM
> uses prim
> 198 in a context's method as the mark for an ensure:/ifCurtailed:
> activation."
>
> | complete returnValue |
> <primitive: 198>
> returnValue := self valueNoContextSwitch.
> complete ifNil:[
> complete := true.
> aBlock value.
> ].
> ^ returnValue
>
> This means that we don't have to deal with suspensions here
> (marked with !!!)
>
> I now understand how to distinguish between the two cases, between
> blocking and not blocked but no progress. Process>>suspend
> answers the list the Process was on when it was suspended. If the
> process is already suspended Process>>suspend answers nil. If the
> process is waiting on a Semaphore or a Mutex, Process>>suspend
> answers the Semaphore or Mutex. And if the process is runnable
> then Process>>suspend answers the process's run list (a LinkedList
> in ProcessorScheduler's quiescentProcessLists array corresponding
> to the process's priority).
>
> So Process>>#terminate can distinguish between #wait or
> #primitiveEnterCriticalSection or
> #primitiveTestAndSetOwnershipOfCriticalSection being blocked, or
> being unblocked but having made no progress due to too low a
> priority. We do so by testing the class of the result of
> suspending the process. If it is a LinkedList, the process has
> past the #wait or #primitiveEnterCriticalSection but has made no
> progress due to too low a priority.
>
> The version of Process>>#terminate I'm about to commit deals with
> several cases. Let me present the cases first. There are three
> versions of Semaphore>>#critical: to handle, and one version of
> Mutex>>critical: and Mutex>>#critical:ifLocked:.
>
> The two basic versions of Semaphore>>critical: are
>
> V1
> critical: mutuallyExcludedBlock
> "Evaluate mutuallyExcludedBlock only if the receiver is not
> currently in
> the process of running the critical: message. If the receiver is,
> evaluate
> mutuallyExcludedBlock after the other critical: message is finished."
> <criticalSection>
> self wait.
> ^mutuallyExcludedBlock ensure: [self signal]
>
> V2
> critical: mutuallyExcludedBlock
> "Evaluate mutuallyExcludedBlock only if the receiver is not
> currently in
> the process of running the critical: message. If the receiver is,
> evaluate
> mutuallyExcludedBlock after the other critical: message is finished."
> <criticalSection>
> ^[self wait.
> mutuallyExcludedBlock value]
> ensure: [self signal]
>
> and Juan's safer version is (after I added the criticalSection pragma)
>
> V3
> critical: mutuallyExcludedBlock
> "Evaluate mutuallyExcludedBlock only if the receiver is not
> currently in
> the process of running the critical: message. If the receiver is,
> evaluate
> mutuallyExcludedBlock after the other critical: message is finished."
> <criticalSection>
> | caught |
> "We need to catch eventual interruptions very carefully.
> The naive approach of just doing, e.g.,:
> self wait.
> aBlock ensure:[self signal].
> will fail if the active process gets terminated while in the wait.
> However, the equally naive:
> [self wait.
> aBlock value] ensure:[self signal].
> will fail too, since the active process may get interrupted while
> entering the ensured block and leave the semaphore signaled twice.
> To avoid both problems we make use of the fact that interrupts only
> occur on sends (or backward jumps) and use an assignment (bytecode)
> right before we go into the wait primitive (which is not a real
> send and
> therefore not interruptable either)."
>
> caught := false.
> ^[
> caught := true.
> self wait.
> mutuallyExcludedBlock value
> ] ensure: [ caught ifTrue: [self signal] ]
>
> and the Mutex>>critical:'s are
>
> critical: aBlock
> "Evaluate aBlock protected by the receiver."
> <criticalSection>
> ^self primitiveEnterCriticalSection
> ifTrue: [aBlock value]
> ifFalse: [aBlock ensure: [self primitiveExitCriticalSection]]
>
> critical: aBlock ifLocked: lockedBlock
> "Answer the evaluation of aBlock protected by the receiver. If it
> is already in a critical
> section on behalf of some other process answer the evaluation of
> lockedBlock."
> <criticalSection>
> ^self primitiveTestAndSetOwnershipOfCriticalSection
> ifNil: [lockedBlock value]
> ifNotNil:
> [:alreadyOwner|
> alreadyOwner
> ifTrue: [aBlock value]
> ifFalse: [aBlock ensure: [self primitiveExitCriticalSection]]]
>
> primitiveEnterCriticalSection answers false if the Mutex was
> unowned, and true if it was already owned by the active process.
> It blocks otherwise. primitiveTestAndSetOwnershipOfCriticalSection
> answers false if the Mutex was unowned, true if it was already
> owned by the active process, and nil if owned by some other process.
>
> So we want Process>>#terminate to correctly release the semaphores
> and mutexes no matter where in these methods they are. We don't
> have to worry if the process is within the block argument to a
> critical: itself, only if it is actually within the critical:
> method or a block within it. If it is already within the block
> argument to critical: then Process>>#terminate's unwind handling
> will unwind things correctly. Taking Juan's version of
> Semaphore>>#critical: above, the key issue is whether the process
> being terminated is blocked on the wait, not blocked but still
> stuck at the wait, or at the start of the block argument to ensure:.
>
> I have extracted the processing into
> Process>>releaseCriticalSection:, so now Process>>terminate reads
>
> terminate
> "Stop the process that the receiver represents forever.
> Unwind to execute pending ensure:/ifCurtailed: blocks before
> terminating.
> If the process is in the middle of a critical: critical section,
> release it properly."
>
> | ctxt unwindBlock oldList |
> self isActiveProcess ifTrue: [
> ctxt := thisContext.
> [ctxt := ctxt findNextUnwindContextUpTo: nil.
> ctxt isNil
> ] whileFalse: [
> (ctxt tempAt: 2) ifNil:[
> ctxt tempAt: 2 put: nil.
> unwindBlock := ctxt tempAt: 1.
> thisContext terminateTo: ctxt.
> unwindBlock value].
> ].
> thisContext terminateTo: nil.
> self suspend.
> ] ifFalse:[
> "Always suspend the process first so it doesn't accidentally get
> woken up.
> N.B. If oldList is a LinkedList then the process is runnable. If
> it is a Semaphore/Mutex et al
> then the process is blocked, and if it is nil then the process is
> already suspended."
> oldList := self suspend.
> suspendedContext ifNotNil:
> ["Release any method marked with the <criticalSection> pragma.
> The argument is whether the process is runnable."
> self releaseCriticalSection: (oldList isNil or: [oldList class ==
> LinkedList]).
>
> "If terminating a process halfways through an unwind, try to
> complete that unwind block first."
> (suspendedContext findNextUnwindContextUpTo: nil) ifNotNil:
> [:outer|
> (suspendedContext findContextSuchThat:[:c| c closure == (outer
> tempAt: 1)]) ifNotNil:
> [:inner| "This is an unwind block currently under evaluation"
> suspendedContext runUntilErrorOrReturnFrom: inner]].
>
> ctxt := self popTo: suspendedContext bottomContext.
> ctxt == suspendedContext bottomContext ifFalse:
> [self debug: ctxt title: 'Unwind error during termination'].
> "Set the context to its endPC for the benefit of isTerminated."
> ctxt pc: ctxt endPC]]
>
> In implementing releaseCriticalSection: we need to know which
> selector a context has just sent. selectorJustSentOrSelf is
> implemented in Squeak as
>
> InstructionStream>>selectorJustSentOrSelf
> "If this instruction follows a send, answer the send's selector,
> otherwise answer self."
>
> | method |
> method := self method.
> ^method encoderClass selectorToSendOrItselfFor: self in: method
> at: self previousPc
>
> c.f.
>
> InstructionStream>>selectorToSendOrSelf
> "If this instruction is a send, answer the selector, otherwise
> answer self."
>
> | method |
> method := self method.
> ^method encoderClass selectorToSendOrItselfFor: self in: method at: pc
>
> Now we can implement Process>>#releaseCriticalSection:
>
> releaseCriticalSection: runnable
> "Figure out if we are terminating a process that is in the ensure:
> block of a critical section.
> In this case, if the block has made progress, pop the
> suspendedContext so that we leave the
> ensure: block inside the critical: without signaling the
> semaphore/exiting the primitive section,
> since presumably this has already happened. But if it hasn't made
> progress but is beyond the
> wait (which we can tell my the oldList being one of the runnable
> lists, i.e. a LinkedList, not a
> Semaphore or Mutex, et al), then the ensure: block needs to be run."
> | selectorJustSent |
> (suspendedContext method pragmaAt: #criticalSection) ifNil: [^self].
> selectorJustSent := suspendedContext selectorJustSentOrSelf.
>
> "Receiver and/or argument blocks of ensure: in
> Semaphore>>critical: or Mutex>>#critical:"
> suspendedContext isClosureContext ifTrue:
> [suspendedContext sender selector == #ensure: ifTrue:
> [| notWaitingButMadeNoProgress |
> "Avoid running the ensure: block twice, popping it if it has
> already been run. If runnable
> but at the wait, leave it in place. N.B. No need to check if the
> block receiver of ensure: has
> not started to run (via suspendedContext pc = suspendedContext
> startpc) because ensure:
> uses valueNoContextSwitch, and so there is no suspension point
> before the wait."
> notWaitingButMadeNoProgress :=
> runnable
> and: [selectorJustSent == #wait
> and: [suspendedContext sender selectorJustSentOrSelf ==
> #valueNoContextSwitch]].
> notWaitingButMadeNoProgress ifFalse:
> [suspendedContext := suspendedContext home]].
> ^self].
>
> "Either Semaphore>>critical: or Mutex>>#critical:. Is the process
> still blocked? If so, nothing further to do."
> runnable ifFalse: [^self].
>
> "If still at the wait the ensure: block has not been activated, so
> signal to restore."
> selectorJustSent == #wait ifTrue:
> [suspendedContext receiver signal].
>
> "If still at the lock primitive and the lock primitive just
> acquired ownership (indicated by it answering false)
> then the ensure block has not been activated, so explicitly
> primitiveExitCriticalSection to unlock."
> (selectorJustSent == #primitiveEnterCriticalSection
> or: [selectorJustSent ==
> #primitiveTestAndSetOwnershipOfCriticalSection]) ifTrue:
> [(suspendedContext stackPtr > 0
> and: [suspendedContext top == false]) ifTrue:
> [suspendedContext receiver primitiveExitCriticalSection]]
>
>
> Let's go through it line by line. First, runnable is an argument,
> determined in Process>>#terminate. One could invoke it with
>
> self releaseCriticalSection: oldList class == LinkedList
>
> but this means that an already suspended process is assumed to be
> not runnable, which makes it tricky to debug the
> Process>>#terminate method. One has to assign to oldList while
> stepping though the method. I've chosen safety, assuming that the
> process is still runnable if suspend answers nil, its simply being
> debugged.
>
> Then we're only interested in <criticalSection> marked methods se
> we return if there's no such pragma.
>
> Then we deal with blocks in these methods. One issue here is to
> avoid running the ensure: block twice if it is already being
> evaluated. The other is to run it if it is stalled and has yet to
> be run.
>
> So if
>
> suspendedContext isClosureContext ifTrue:
>
> we're in the ensure: receiver or argument blocks in any
> <criticalSection> marked method, i.e. Semaphore>>critical: and
> Mutex>>critical:[ifLocked:]. If wait was just sent then we're in
> the ensure: receiver block of Semaphore>>critical: (V2 & V3 above)
> and the issue is whether the process is blocked or is unblocked
> and has made no progress. If blocked then nothing needs to be
> done; the ensure: block is discarded and the stack cut back to the
> critical: activation. If progress has been made then nothing
> needs to be done (in fact we can't be in this state; the ensure:
> receiver will have started evaluating the critical: block
> argument). If unblocked, but no progress has been made, do /not/
> discard the unwind block and it will be run in Process>>#terminate
> when this method returns. Hence...
>
> [suspendedContext sender selector == #ensure: ifTrue:
> [| notWaitingButMadeNoProgress |
> "Avoid running the ensure: block twice, popping it if it has
> already been run. If runnable
> but at the wait, leave it in place. N.B. No need to check if the
> block receiver of ensure: has
> not started to run (via suspendedContext pc = suspendedContext
> startpc) because ensure:
> uses valueNoContextSwitch, and so there is no suspension point
> before the wait."
> notWaitingButMadeNoProgress :=
> runnable
> and: [selectorJustSent == #wait
> and: [suspendedContext sender selectorJustSentOrSelf ==
> #valueNoContextSwitch]].
> notWaitingButMadeNoProgress ifFalse:
> [suspendedContext := suspendedContext home]].
> ^self].
>
> Now we're left with the simpler version of Semaphore>>critical:
> (V1 above) and the two Mutex methods
> Mutex>>#critical:[ifLocked:]. Here the only state we have to
> worry about is that the process is unblocked but has made no
> progress. If not runnable the process is still blocked and we can
> simply return.
>
> "Either Semaphore>>critical: or Mutex>>#critical:. Is the process
> still blocked? If so, nothing further to do."
> runnable ifFalse: [^self].
>
> If #wait was just sent the process is in Semaphore>>#critical:
> and, because ensure: has not been sent we signal explicitly to
> restore the signal count:
>
> "If still at the wait the ensure: block has not been activated, so
> signal to restore."
> selectorJustSent == #wait ifTrue:
> [suspendedContext receiver signal].
>
> If either of primitiveEnterCriticalSection or
> primitiveTestAndSetOwnershipOfCriticalSection have just been sent
> then either the Mutex is already owned, in which case the ensure
> block is elsewhere in the colder part of the stack, or has just
> been owned, and because ensure: has not been sent we unlock
> explicitly to release the Mutex:
>
> "If still at the lock primitive and the lock primitive just
> acquired ownership (indicated by it answering false)
> then the ensure block has not been activated, so explicitly
> primitiveExitCriticalSection to unlock."
> (selectorJustSent == #primitiveEnterCriticalSection
> or: [selectorJustSent ==
> #primitiveTestAndSetOwnershipOfCriticalSection]) ifTrue:
> [(suspendedContext stackPtr > 0
> and: [suspendedContext top == false]) ifTrue:
> [suspendedContext receiver primitiveExitCriticalSection]]
>
> So Pharoers can you read and say whether you think this is sane or
> not? If so, then we can kibbutz to write the Pharo version.
>
> Squeakers can you review Kernel-eem.1183 & Kernel-eem.1184 in the
> inbox? Kernel-eem.1183 includes the fix as described above.
> Kernel-eem.1184 reverts Semaphore>>#critical: to V1 above.
>
>
> P.S. Looking at V1 above it seems to me that there is an issue if
> the process is preempted in ensure: before sending
> valueNoContextSwitch:. I'll try and write a test that advances a
> process to that precise point. If that test fails I think we have
> to use V2 or V3, and V2 is clearly preferable.
>
>
> Lovely. I added the test (testSemaCriticalWaitInEnsure
> & testMutexCriticalBlockedInEnsure) and it is V1 that works and V2
> that does not. I think it best not to try and be too clever and fix
> terminate and/or releaseCriticalSection: for this case. We can simply
> stick with V1 for now.
>
> _,,,^..^,,,_
> best, Eliot
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.squeakfoundation.org/pipermail/vm-dev/attachments/20180728/d1149ad5/attachment-0001.html>
More information about the Vm-dev
mailing list