[squeak-dev] Re: Suspending process fix

Eliot Miranda eliot.miranda at gmail.com
Tue Apr 28 23:45:12 UTC 2009


Hi Michael,
   responding to clear up some apparent confusion.


On Tue, Apr 28, 2009 at 3:29 PM, Michael van der Gulik <mikevdg at gmail.com> wrote:

> On 4/28/09, Andreas Raab <andreas.raab at gmx.de> wrote:
> > Igor Stasenko wrote:
> >> Ask yourself, why a developer, who may want to suspend any process
> >> (regardless of his intents) to resume it later, should make any
> >> assertions like "what will be broken if i suspend it?".
> >
> > Thus my question about use cases. I haven't seen many uses of suspend
> > outside of the debugger. And I don't think that's by accident - suspend
> > is very tricky to deal with in a realistic setting that needs to deal
> > with asynchronous signals. Most of the time it is a last resort solution
> > (i.e., don't care too much about what happens afterwards) not something
> > that you would do casually and expect to be side-effect free.
>
> IMHO Process>>suspend should never be used in "normal" code. It should
> only be used from debuggers and system tools. But it should still be
> implemented to exhibit the correct behaviour.


Define "normal" :)


> The above is exactly the use case where this bug has bitten me. When I
> was trying to debug concurrent code, the debugger would simply ignore
> Semaphore>>wait and step right over it! The debugger quickly became
> useless when I had to manually keep track of which semaphores were
> signalled and which weren't.
>
> Of course, the debugger would also need improvement to make sure that
> it doesn't suspend the entire GUI every time its simulated process
> waits on a semaphore, but that's another issue.
>
> > The problem is that in a "real" environment signals are asynchronous.
> > Unless you have some way of stopping time and other external interrupts
> > at the same time you simply cannot guarantee that after the #suspend
> > there isn't an external signal which causes some other process waiting
> > on the semaphore to execute before the process that "ought" to be
> > released.
>
> By my understanding of how suspending a process should work, if a
> Process is suspended (by calling >>suspend) then no force on Earth
> other than calling >>resume on it should resume it again. Any events or
> signals on it should accumulate until it is resumed.


Signals accumulate on Semaphores, not on processes.


> > For example, just consider a mutex where for some reason ordering
> > matters like in Tweak (which does break if processes are not put back in
> > the same order in which they were taken off the list): You have a
> > process which holds the mutex, two more are waiting. You send #suspend
> > to the first (waiting) one, it is off-list. Now the current mutex owner
> > leaves that mutex. What should happen? Should the entire mutex stall
> > because the process that was supposed to go next was suspended? If it
> > proceeds it changes the ordering and that can cause all sorts of
> > problems (as I found out when testing some earlier versions of the
> > semaphore fixes that weren't quite correct ;-)
>
> What list? Are you referring to the linked list that a Semaphore
> maintains? I would consider the linked list of Processes that
> Semaphores maintain to be an implementation detail of Processes and
> Semaphores. Your code should be written to be completely oblivious to
> it.


This is more than an implementation detail.  Process is a subclass of Link
that adds a "myList" instance variable, and Semaphore is a subclass of
LinkedList.  The runnable process lists in Processor are LinkedList
instances.  So a process is either the sole activeProcess and not on any
list, or suspended and not on any list, or on one of the runnable process
lists, or waiting on a Semaphore.  The "myList" instance variable holds the
list the process is on, or nil if it is on none.  So a process is suspended
iff myList == nil and the process ~~ Processor activeProcess.  None of
Processor (an access of the value of (Smalltalk associationAt: #Processor)),
activeProcess (an inst var accessor), #== or #and: is a suspension point,
so one can write
    Process>>isSuspended
        ^myList == nil and: [self ~~ Processor activeProcess]
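
A rough sketch of the four states just described, for illustration only.
The selector schedulingState is hypothetical and not part of Squeak.  It
tests for the active process first, so the answer does not depend on what
the active process's myList happens to hold.  Note that a terminated
process would also answer #suspended here.

    Process>>schedulingState
        "Hypothetical helper, not in the image.  Answer a symbol naming
         which of the four states the receiver is in."
        self == Processor activeProcess ifTrue: [^#active].
        myList == nil ifTrue: [^#suspended].
        (myList isKindOf: Semaphore) ifTrue: [^#waiting].
        "Otherwise myList is one of Processor's runnable process lists."
        ^#runnable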

But wait!?!?  The Squeak definition is simply

    Process methods for accessing
    isSuspended
        ^myList isNil

which is broken.  Take for example the following two forks, the first of
which gets added to the runnable process lists before it runs, the second of
which runs immediately:

    | s r |
    s := Semaphore new.
    [| thisProcess |
     thisProcess := Processor activeProcess.
     r := { (Processor instVarNamed: 'quiescentProcessLists')
                indexOf: (thisProcess instVarNamed: 'myList').
            thisProcess priority.
            thisProcess isSuspended }.
     s signal] forkAt: Processor userBackgroundPriority.
    s wait.
    r

answers #(1 30 false), and

    | s r |
    s := Semaphore new.
    [| thisProcess |
     thisProcess := Processor activeProcess.
     r := { (Processor instVarNamed: 'quiescentProcessLists')
                indexOf: (thisProcess instVarNamed: 'myList').
            thisProcess priority.
            thisProcess isSuspended }.
     s signal] forkAt: Processor activePriority + 20.
    s wait.
    r

answers #(0 60 true)

So I'm confused too!  There's a bug in the VM and in the definition of
isSuspended.  isSuspended needs to read

    isSuspended
        ^myList == nil and: [self ~~ Processor activeProcess]

and the VM needs to set myList to nil in transferTo:.

If I were thinking of a truly multi-process implementation I'd set an active
process's myList to the process itself so that we could define


    isSuspended
        ^myList == nil

    isActive
        ^myList == self

but one step at a time.


> I believe the correct behaviour of Semaphore>>signal should be that
> the next process to be run would be either the process doing the
> signalling, or any other process waiting on that semaphore. Assuming a
> multi-core capable VM, two processes might end up concurrently
> continuing execution. There shouldn't be any guaranteed ordering in
> the resuming of processes; that's an implementation detail in the VM
> that could potentially change.


Um, this seems very confused.  The process doing the signalling should not
be affected at all.  Only processes waiting on the semaphore should be
affected, and by definition the signalling process can't be waiting on the
semaphore it signals because it has to be runnable to be able to signal.

Further, yes, there *must* be a defined resumption order.  It must be
strictly FIFO for many scheduling algorithms to work.  Even on a
multi-core CPU the scheduler can be well-defined such that processes waiting
on a semaphore become active in the order in which they started waiting on
it.  Even in a hypothetical multi-core VM with two concurrent processes
simultaneously signalling a semaphore with two processes waiting on it, the
system would have to ensure that the two signals ended up scheduling both
processes, rather than both signals somehow resuming only one; i.e.
the VM is going to have to serialize or mutually-exclude access to the
semaphore to prevent signals being lost.
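
A small workspace sketch of the FIFO claim, under stated assumptions: the
variable names are illustrative, and the waiters are forked one priority
level above the active process so that each blocks on the semaphore
before the next is created.

    | s procs order |
    s := Semaphore new.
    order := OrderedCollection new.
    "Each forked process runs at once, blocks in #wait, and records
     itself when it is eventually resumed."
    procs := (1 to: 3) collect: [:i |
        [s wait. order add: Processor activeProcess]
            forkAt: Processor activePriority + 1].
    "Each signal resumes one waiter, which preempts this process."
    3 timesRepeat: [s signal].
    order asArray = procs asArray

With strictly FIFO semaphores the last expression should answer true,
since the waiters are resumed in the order in which they started waiting.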

And yes, this is guaranteed, very consciously so.  The Smalltalk scheduler is
strictly a real-time scheduler, preemptive across priorities
and cooperatively scheduled round-robin within a priority, such that all
lower-priority runnable processes are preempted by any higher-priority
process as soon as that higher-priority process becomes runnable.
The only bug in this in Squeak (apart from the myList snafu above) is that
when a lower-priority process is preempted by a higher one it gets added to
the back of its scheduling queue, so within a priority processes can't rely
on being co-operatively scheduled.  This has been fixed in VisualWorks,
where a preempted process is added to the front of its runnable process
list.  (But VisualWorks goes on to break Semaphores by making them priority
queues where the first highest-priority process is the one that gets run,
which will break certain scheduling algorithms.)
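
A minimal workspace sketch of preemption across priorities, assuming it
is evaluated in a process (such as the UI process) that has room for one
priority level above and below it.  The variable name log is illustrative.

    | log |
    log := OrderedCollection new.
    "A higher-priority process preempts the active one as soon as it
     becomes runnable, so it logs first."
    [log add: #high] forkAt: Processor activePriority + 1.
    log add: #main.
    "A lower-priority process cannot run while this one is runnable."
    [log add: #low] forkAt: Processor activePriority - 1.
    log add: #mainAgain.
    log asArray

Under the scheduling rules described above one would expect this to
answer #(#high #main #mainAgain), with #low added only later, once the
evaluating process waits.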


> And yes, I believe that in your example, it is correct that the entire
> mutex should "stall", meaning that the process that entered the mutex
> has entered a "suspended" state and all processes still waiting on
> that mutex remain in their "waiting" state. If the mutex didn't
> "stall", the debugger wouldn't be particularly helpful.
>
> Gulik.
>
> --
> http://gulik.pbwiki.com/
>
>
Cheers
Eliot