Replacing Semaphores with Mesa Monitors
Andrew P. Black
black at cse.ogi.edu
Wed May 12 22:36:27 UTC 2004
I would like to propose a change to the core synchronization
construct used in Squeak. I believe that we should switch from the
Semaphore to the Mesa-style monitor with condition variables.
Note that I'm not just proposing that we _add_ Mesa-style monitors to
the existing bevy of synchronization constructs: I could do that
without getting any consensus from the community, and indeed we have
implemented Mesa monitors in terms of Semaphores. I'm proposing that
we remove Semaphores entirely, and use Mesa-style monitors instead.
The basic reason for this proposal is that Semaphores are hard to
use, and likely to be a source of synchronization bugs. The idea of
a Semaphore with a time-out is not very well defined: what is
supposed to be true after one has been released from such a
Semaphore? Mesa-style monitors were developed by the folks at PARC
and later at Digital SRC who developed Mesa, Modula-2+ and finally
Modula-3, and have stood the test of time. They are more expressive
than semaphores, since they tie together the protected variables and
the conditions for which processes are waiting, and they open the way
for both compile-time and run-time checks.
What is a Mesa-style monitor? Most of us are probably familiar with
a Hoare-style monitor, in which a group of variables are protected
from concurrent access by a monitor lock. This is like a Java object
in which all of the public methods are declared to be synchronized.
I'm not at this point proposing any language changes, so it would be
up to the programmer to put the bodies of the synchronized methods
inside a monitor critical: [ block ], as is presently done with
semaphores.
Once inside a monitor, what does a process do when it finds that it
needs some resource that is not presently available? It waits on a
so-called condition variable, which is logically associated with the
boolean condition for which it is waiting, such as data being present
on a socket, a file being free, or a queue being non-empty or
non-full. The difference between Hoare and Mesa-style monitors is in
the way that condition variables are handled.
In both styles of Monitor, waiting on a condition variable entails
_atomically_ releasing the monitor lock and suspending the current
process. (This can be easily implemented by a new VM primitive, or
by using a counter and a semaphore) . The main difference between
the two styles lies in the semantics of a signal on a condition
variable. In the Hoare formulation, a signal can only be performed
by a process that itself has the monitor lock, and which can
therefore guarantee that the condition for which the waiting process
is waiting is now true. If there is no process waiting, the signal
does nothing, but if there is a waiter (or several), the semantics of
signal is to suspend the signaller and transfer control to the
waiter(or one of the waiters). In the Mesa formulation, signal is
a _hint_: the signaller says that something may have changed, and the
responsibility falls on the waiting process to check whether the
condition for which it is waiting is satisfied, and if not, to wait
again. This may seem like a step backwards, but in fact it
significantly simplifies the implementation, because it is always
correct to signal a condition - it's a bit like a "self changed"
message in the Observer pattern. This means that the signaller does
not have to hold the monitor lock, and that interrupt handlers and
hardware can therefore signal condition variables directly. It is
this property that makes it possible to use Mesa monitors as the
_only_ synchronization construct.
The tradeoff is that the condition must always be re-checked after a
wait. The correct usage pattern is
[ condition ] whileFalse: [ conditionVariable wait ]
and not (condition) ifFalse: [ conditionVariable wait ].
Chuan-Kai Lin (a student here at OGI) and I have developed an
implementation of Monitors in terms of semaphores (which is sometimes
a bit convoluted, to avoid race conditions). We have some simple
tests, but the implementation needs further testing and, more
importantly, careful scrutiny by many eyes, before I would be
confident that we have avoided lost signals and race conditions.
However, I would be much happier with an implementation that bypasses
Semaphores altogether, and has the kernel primitives operate directly
on condition variables. This would not only be more efficient (not
an insignificant consideration when one is talking about interrupt
handling), but also simpler and therefore less likely to contain
hidden bugs.
My first attempts to achieve this involved subclassing Semaphore to
create classes for monitor condition variables and monitor locks, but
carefully leaving the layout of the instance variables the same so
that the existing Seamphore primitives would still work.
Nevertheless, the primitives failed. Tim Rowledge tells me that this
is because there is an explicit test of the class of the object (=
Semaphore) in the implementation of <primitive: 86>.
So, in order to make progress on this, we need to either modify the
implementation of the Semaphore primitives to remove this check, or
to create some new (unchecked) primitives. I've never played with
kernel primitives before, and have no idea even how to start looking
at doing this.
OK, I'm almost finished. The final fly in the ointment is what to do
when a waiting thread shouldn't wait any more. Perhaps it is waiting
for data on a socket, and the socket was closed. Or perhaps it was
waiting for user input, and the user just hit the interrupt key. How
does one deal with this?
The "pure" answer is: in the usual way. You have to write your
conditions so that they say "data available OR socket closed", etc.
This is obviously correct, and often quite impractical.
The answer adopted by current Squeak code is to use timeouts on the
wait. The main problem with this (apart from implementing it
correctly and efficiently; see Julian Fitzell's
SemaphoreWaitAndWaitTimeoutFix-jf from 11 Sept 2003) is that the
semantics of the wait message are no longer clear. Even with the
Hoare monitor, the meaning of being awoken is no longer that the
condition for which you were waiting is satisfied: it might just be
that a timer has gone off. So, the awoken process has to check
what has happened explicitly - and presumably going back to wait
again is not the desired behavior if a timeout has occurred and the
condition is still false.
The Modula-3 libraries solve this problem by introducing alerts,
which are sent to processes. The receiving process sees an "alerted"
exception, or a flag set. If a process is waiting on a condition
that might be false for a long time (like user input), the wait
should be an "alertWait", which means that if the process is alerted,
the wait will be terminated. Once the process is running again,
inside the monitor, it will experience an alerted exception: it will
not execute the normal code path. I'm not a great fan of
exceptions, so in Squeak I would instead suggest a method like:
aCondition waitIfAlerted: [ aBlock ]
which will terminate normally if the condition is signalled, and will
evaluate aBlock if it is alerted.
So the question for discussion on the list is: is this cleanup worth
doing? Is there a volunteer to work on the kernel primitives?
To help the discussion, I've assembled some resources on the Swiki:
see http://minnow.cc.gatech.edu/squeak/Monitors. The prototype
implementation is there, as are links to various documents that
provide more background than there is room for here.
Andrew
More information about the Squeak-dev
mailing list
|