[BUG] {almost-fix} Semaphore>>critical: is broken

Mark van Gulik ghoul6 at home.net
Thu Oct 12 07:16:44 UTC 2000


> Craig Latta pointed out a much simpler way to do it:
> 
> !Semaphore methodsFor: 'mutual exclusion' stamp: 'pnm
> 10/11/2000 18:53'!
> critical: mutuallyExcludedBlock
>  "Evaluate mutuallyExcludedBlock only if the receiver
> is not currently in
>  the process of running the critical: message. If the
> receiver is, evaluate
>  mutuallyExcludedBlock after the other critical:
> message is finished."
>
>  self wait.
>  ^mutuallyExcludedBlock
>   ensure: [self signal]! !

Hey, you wouldn't want to introduce the same bug that VisualWorks has, would
you?  I just patched that one a couple of weeks ago.

I'm not sure about Squeak, but in VisualWorks when you terminate a process,
it is interrupted with a new context that raises a termination exception.
The outermost context catches this (allowing necessary unwind actions to
occur along the way).  Except that Semaphore>>critical: looks a lot like the
code you posted, and *doesn't work*.

If the process gets preempted after the #wait but before invoking #ensure:,
you'll be in a situation in which the semaphore was acquired, but there is
no corresponding guarantee that it will be released.  A pretty simple stress
test easily makes this mechanism fail (run a bunch of processes that all
invoke #critical: on the same semaphore repeatedly in a loop, and start
killing these processes until the semaphore gets stuck).

My fix in VisualWorks is not pretty, but it works - it uses a local variable
as a flag, and makes some detailed assumptions about which inter-bytecode
positions allow interrupts to happen.  Squeak is interpreted (in some VM's)
rather than always compiled to native, so this sneaky trick is unavailable.

The alternative is to redesign the wait primitive.  Let's say the wait
primitive is changed to take an array of size one as an argument.  The
primitive immediately sets the sole entry of the array to false.  If the
wait is successful, the array's single entry will be set to true.  The new
implementation of #critical: would look like:

critical: mutuallyExcludedBlock

    | a result |
    a := Array with: false.
    [
        self waitAndSet: a.
        result := mutuallyExcludedBlock value.
    ] ensure: [
        a first ifTrue: [
            self signal].
    ].
    ^result

Note that you can't simply make wait *return* true when/if it succeeds,
because this value might just barely make it onto the stack, and not even be
safely assignable to a variable.  If a statement like "a := self wait" is
executed, this is a send of #wait, followed by a bytecode to do the
assignment.  The process can be interrupted between these steps, leading to
a "stuck" semaphore.

An alternative design is to have the #terminate operation check the victim
process's current context to see if it's an invocation of #critical:.  A
quick analysis of the possible program counter locations determines whether
the semaphore needs to be released or not.  Obviously this type of solution
can lead to all sorts of nasty couplings between unrelated concepts, so it
should be avoided if possible.  I prefer the previous solution in which a
#waitAndSet: primitive is added.

Any takers?  (I don't really Squeak much myself, I'm more of a lurker).

-Mark





More information about the Squeak-dev mailing list