[squeak-dev] Process #suspend / #resume semantics

mail at jaromir.net
Fri Dec 31 19:09:36 UTC 2021


Hi Eliot,

Do I understand correctly that the new suspend should never answer the condition variable's list (Semaphore/Mutex), but only nil (for active or previously suspended processes) or a run queue (for blocked or runnable-but-not-running processes)?

If primitiveSuspend backs up a blocked process before the last send, it's actually equivalent to backing the process up one instruction and then suspending it, i.e. suspending it as if it were still in the run queue - is this thought experiment correct, or am I totally confused?

Or are there suspension points that may stop the suspend in the middle, etc.? This is completely out of my league... sorry, just an idea.

If that's right, we should be able to simplify #releaseCriticalSection, because a suspended process would always be "runnable" and #terminate wouldn't need the oldList any longer.
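As a sanity check on that reading, here is a minimal Python sketch of the proposed semantics (a toy model, not Squeak code: Process, Semaphore, suspend and resume below are illustrative stand-ins for the real classes and primitives, and the single run_queue stands in for the scheduler's priority lists):

```python
run_queue = []  # toy stand-in for the scheduler's run queues

class Process:
    def __init__(self, program):
        self.program = program   # list of zero-argument "instructions"
        self.pc = 0              # index of the next instruction
        self.my_list = None      # run queue, condition-variable list, or None

class Semaphore:
    def __init__(self):
        self.excess_signals = 0
        self.waiting = []        # the condition variable's list

    def wait(self, process):
        if self.excess_signals > 0:
            self.excess_signals -= 1
        else:
            if process.my_list is run_queue:
                run_queue.remove(process)
            self.waiting.append(process)
            process.my_list = self.waiting

def suspend(process):
    """Always remove the process from whatever list it is on; if that
    list is a condition variable rather than the run queue, back the pc
    up one instruction so a later resume re-executes the wait.  Answers
    only None (nil) or the run queue, never the condition variable."""
    old_list = process.my_list
    process.my_list = None
    if old_list is run_queue:
        run_queue.remove(process)
        return run_queue
    elif old_list is not None:
        old_list.remove(process)
        process.pc -= 1          # back up to the send of the wait
    return None

def resume(process):
    # put the process back on the run queue and, in this toy model,
    # immediately execute its next instruction
    run_queue.append(process)
    process.my_list = run_queue
    instr = process.program[process.pc]
    process.pc += 1
    instr(process)

# --- demo ---
s = Semaphore()
p = Process([lambda proc: s.wait(proc)])
resume(p)                   # p executes the wait and blocks on s
assert suspend(p) is None   # blocked: answers nil, pc backed up
resume(p)                   # p re-executes the wait...
assert p in s.waiting       # ...and is waiting on s again, as if never suspended
```

In this model a resumed process can never escape past the wait, which is the property the second alternative is meant to guarantee.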

Thanks for your time,

Happy New Year!


~~~
^[^    Jaromir

Sent from Squeak Inbox Talk

On 2021-12-30T11:24:16-08:00, eliot.miranda at gmail.com wrote:

> Hi Jaromir, Hi Craig, Hi All,
> 
> On Tue, Dec 28, 2021 at 3:32 PM Eliot Miranda <eliot.miranda at gmail.com>
> wrote:
> 
> >
> >
> > On Tue, Dec 28, 2021 at 2:15 PM Eliot Miranda <eliot.miranda at gmail.com>
> > wrote:
> >
> >>
> >>
> >> On Tue, Dec 28, 2021 at 1:53 PM <mail at jaromir.net> wrote:
> >>
> >>> Hi Eliot, all,
> >>>
> >>> this example shows Mutex's critical section can be entered multiple
> >>> times:
> >>>
> >>
> >> I know.  suspend is broken.  Please read my previous message fully and
> >> carefully.  If I implement the second alternative then the example works
> >> correctly.  In the simulator I get:
> >>
> >> {a Semaphore(a Process(75019) in [] in [] in UndefinedObject>>DoIt) .
> >>  a Mutex(a Process(59775) in Mutex>>critical:) .
> >>  a Process(75019) in [] in [] in UndefinedObject>>DoIt . false .
> >>  a Process(59775) in Mutex>>critical: . false}
> >>
> >
> > However, this comes at the cost of finding that the new terminate is
> > broken.  If suspend does not remove a process from its list then a process
> > terminated while waiting on a Semaphore remains on that semaphore.  So
> > terminate must not only ensure that any critical sections are released, but
> > that the process is removed from its list, if that list is a condition
> > variable.  IMO this should happen very early on in terminate.  I'm running
> > the second alternative but I have to filter out unrunnable processes in
> > signal et al since suspend no longer removes from the list.
> >
> 
> Having thought about it for a couple of days I now think that the second
> alternative is the only rational approach.  This is that suspend always
> removes a process from whatever list it is on, but if the list is not its
> run queue, the process is backed up one bytecode to the send that invoked
> the wait.  Hence if the process resumes it immediately progresses into the
> wait again, leaving it exactly where it would have been had it not been
> suspended.
> 
> Craig I hear your concern, but being able to (especially accidentally)
> suspend a process and resume it and find it has progressed beyond whatever
> condition variable it was waiting on is entirely unacceptable.
> 
> My first alternative, that suspend does not remove a process from its list
> if a condition variable, breaks the existing code base.  For example a
> typical pattern at start up is to suspend the old version of a process
> waiting on a semaphore (e.g. the finalizationProcess) and start up a new
> one.  The first alternative leaves the old process waiting on the semaphore.
> 
> Digression: VisualWorks also implements the second choice, but it isn't
> obvious.  HPS, the VisualWorks VM, can only run jitted code; it has no
> interpreter.  So backing up the pc to before the send is really difficult
> to implement; it introduces arbitrarily many suspension points since the
> send of the wait/enterCriticalSection et al can be preceded by an arbitrary
> expression.  Instead, one additional suspension point is introduced,
> complementing the existing ones after a send, at a backward jump, and at
> method entry (after frame build, before executing the first bytecode).
> Here primitives can be in one of two states: uncommitted and committed.
> Uncommitted primitives are primitives in progress.  One can't actually
> see this state.  A committed primitive has a frame/context allocated for
> its execution.  A committed primitive may not have completed (e.g. an FFI
> call is in progress, waiting for a result) or may be yet to start.  So
> HPS can back up a committed primitive wait to the committed but
> uncompleted state. Hence
> resuming reenters the wait state.  This is ok, but complex.
> 
> In Cog we have an interpreter and can easily convert a machine code frame
> into an interpreted frame, so backing up the process to the send that
> invoked the wait/enterCriticalSection etc is fine.  I'll have a go at this
> asap.
> 
> 
> >
> >
> >>>     | s p1 p2 p3 m |
> >>>     s := Semaphore new.
> >>>     m := Mutex new.
> >>>     p1 := [m critical: [s wait]] newProcess.
> >>>     p1 resume.
> >>>     p2 := [m critical: [s wait]] newProcess.
> >>>     p2 resume.
> >>>     Processor yield.
> >>>     { p1. p1 suspend. p2. p2 suspend }.
> >>>     p1 resume. p2 resume.
> >>>     Processor yield.
> >>>     { s. m. p1. p1 isTerminated. p2. p2 isTerminated. m isOwned. m
> >>> instVarNamed: 'owner' }.
> >>>     p3 := [m critical: [s wait]] newProcess.
> >>>     p3 resume.
> >>>     Processor yield.
> >>>     { s. m. p1. p1 isTerminated. p2. p2 isBlocked. p3. p3 isBlocked. m
> >>> isOwned. m instVarNamed: 'owner' }.
> >>>
> >>> I've just added a third process to your last example; p3 really enters
> >>> the critical section and takes m's ownership, despite the fact that p2
> >>> is already waiting inside m's critical section - because p2 managed to
> >>> enter m without taking m's ownership.
> >>>
> >>> Now we could repeat the procedure and keep adding processes inside the
> >>> critical section indefinitely :) So I guess this really is a bug.
> >>>
> >>> Best,
> >>>
> >>> ~~~
> >>> ^[^    Jaromir
> >>>
> >>> Sent from Squeak Inbox Talk
> >>>
> >>> On 2021-12-28T20:07:25+01:00, mail at jaromir.net wrote:
> >>>
> >>> > Hi Eliot,
> >>> >
> >>> > Thanks! Please see my comments below, it seems to me there may be a
> >>> bug in the Mutex.
> >>> >
> >>> > ~~~
> >>> > ^[^    Jaromir
> >>> >
> >>> > Sent from Squeak Inbox Talk
> >>> >
> >>> > On 2021-12-27T14:55:22-08:00, eliot.miranda at gmail.com wrote:
> >>> >
> >>> > > Hi Jaromir,
> >>> > >
> >>> > > On Mon, Dec 27, 2021 at 2:52 AM <mail at jaromir.net> wrote:
> >>> > >
> >>> > > > Hi all,
> >>> > > >
> >>> > > > What is the desirable semantics of resuming a previously suspended
> >>> process?
> >>> > > >
> >>> > >
> >>> > > That a process continues exactly as it would have if it had not been
> >>> > > suspended in the first place.  In this regard our suspend is hopelessly
> >>> > > broken for processes that are waiting on condition variables. See below.
> >>> > >
> >>> > >
> >>> > > >
> >>> > > > #resume's comment says: "Allow the process that the receiver
> >>> represents to
> >>> > > > continue. Put the receiver in *line to become the activeProcess*."
> >>> > > >
> >>> > > > The side-effect of this is that a terminating process can get
> >>> resumed
> >>> > > > (unless suspendedContext is set to nil - see test
> >>> KernelTests-jar.417 /
> >>> > > > Inbox - which has the unfortunate side-effect of #isTerminated
> >>> > > > answering true during termination).
> >>> > > >
> >>> > >
> >>> > > But a process that is terminating should not be resumable.  This
> >>> should be
> >>> > > a non-issue.  If a process is terminating itself then it is the
> >>> active
> >>> > > process, it has nil as its suspendedContext, and Processor
> >>> > > activeProcess resume always produces an error.  Any process that is
> >>> not
> >>> > > terminating itself can be made to fail by having the machinery set
> >>> the
> >>> > > suspendedContext to nil.
> >>> > >
> >>> >
> >>> > Yes, agreed, but unfortunately that's precisely what is not happening
> >>> in the current and previous #terminate and what I'm proposing in
> >>> Kernel-jar.1437 - to set the suspendedContext to nil during termination,
> >>> even before calling #releaseCriticalSection.
> >>> >
> >>> > >
> >>> > > > A similar side-effect: a process originally waiting on a semaphore
> >>> and
> >>> > > > then suspended can be resumed into the runnable state and get
> >>> scheduled,
> >>> > > > effectively escaping the semaphore wait.
> >>> > > >
> >>> > >
> >>> > > Right.  This is the bug.  So for example
> >>> > >     | s p |
> >>> > >     s := Semaphore new.
> >>> > >     p := [s wait] newProcess.
> >>> > >     p resume.
> >>> > >     Processor yield.
> >>> > >     { p. p suspend }
> >>> > >
> >>> > > answers an Array of process p that is past the wait, and the
> >>> semaphore, s.
> >>> > > And
> >>> > >
> >>> > >     | s p |
> >>> > >     s := Semaphore new.
> >>> > >     p := [s wait] newProcess.
> >>> > >     p resume.
> >>> > >     Processor yield.
> >>> > >     p suspend; resume.
> >>> > >     Processor yield.
> >>> > >     p isTerminated
> >>> > >
> >>> > > answers true, whereas in both cases the process should remain
> >>> waiting on
> >>> > > the semaphore.
> >>> > >
> >>> > > >
> >>> > > > Is this an expected behavior or a bug?
> >>> > > >
> >>> > >
> >>> > > IMO it is a dreadful bug.
> >>> > >
> >>> > > > If a bug, should a suspended process somehow remember its previous
> >>> state
> >>> > > > and/or queue and return to the same one if resumed?
> >>> > > >
> >>> > >
> >>> > > IMO the primitive should back up the process to the
> >>> > > wait/primitiveEnterCriticalSection. This is trivial to implement in
> >>> the
> >>> > > image, but is potentially non-atomic.  It is perhaps tricky to
> >>> implement in
> >>> > > the VM, but will be atomic.
> >>> > >
> >>> > > Sorry if I'm missing something :)
> >>> > > >
> >>> > >
> >>> > > You're not missing anything :-)  Here's another example that answers
> >>> two
> >>> > > processes which should both block but if resumed both make progress.
> >>> > >
> >>> > >     | s p1 p2 m |
> >>> > >     s := Semaphore new.
> >>> > >     m := Mutex new.
> >>> > >     p1 := [m critical: [s wait]] newProcess.
> >>> > >     p1 resume.
> >>> > >     p2 := [m critical: [s wait]] newProcess.
> >>> > >     p2 resume.
> >>> > >     Processor yield.
> >>> > >     { p1. p1 suspend. p2. p2 suspend }
> >>> > >
> >>> > > p1 enters the mutex's critical section, becoming the mutex's owner.
> >>> p2 then
> >>> > > blocks attempting to enter m's critical section.  Let's resume these
> >>> two,
> >>> > > and examine the semaphore and mutex:
> >>> > >
> >>> > >     | s p1 p2 m |
> >>> > >     s := Semaphore new.
> >>> > >     m := Mutex new.
> >>> > >     p1 := [m critical: [s wait]] newProcess.
> >>> > >     p1 resume.
> >>> > >     p2 := [m critical: [s wait]] newProcess.
> >>> > >     p2 resume.
> >>> > >     Processor yield.
> >>> > >     { p1. p1 suspend. p2. p2 suspend }.
> >>> > >     p1 resume. p2 resume.
> >>> > >     Processor yield.
> >>> > >     { s. m. p1. p1 isTerminated. p2. p2 isTerminated }
> >>> > >
> >>> > > In this case the end result for p2 is accidentally correct. It ends
> >>> up
> >>> > > waiting on s within m's critical section. But p1 ends up
> >>> terminated.  IMO
> >>> > > the correct result is that p1 remains waiting on s, and is still the
> >>> owner
> >>> > > of m, and p2 remains blocked trying to take ownership of m.
> >>> > >
> >>> >
> >>> > Perfect example! My naive expectation was that when a process inside a
> >>> > critical section gets suspended, the Mutex gets unlocked - but that's
> >>> > apparently wrong :)
> >>> >
> >>> > But still, there's something wrong with the example: when p1 resumes,
> >>> > it releases m's ownership and terminates; then p2 takes over, proceeds
> >>> > inside the critical section, and gets blocked at the semaphore. I'd
> >>> > expect p2 to become the owner of the Mutex m, BUT it doesn't! There's
> >>> > no owner while p2 is sitting at the semaphore. Try:
> >>> >
> >>> >     | s p1 p2 m |
> >>> >     s := Semaphore new.
> >>> >     m := Mutex new.
> >>> >     p1 := [m critical: [s wait]] newProcess.
> >>> >     p1 resume.
> >>> >     p2 := [m critical: [s wait]] newProcess.
> >>> >     p2 resume.
> >>> >     Processor yield.
> >>> >     { p1. p1 suspend. p2. p2 suspend }.
> >>> >     p1 resume. p2 resume.
> >>> >     Processor yield.
> >>> >     { s. m. p1. p1 isTerminated. p2. p2 isTerminated. m isOwned. m
> >>> instVarNamed: 'owner' }
> >>> >
> >>> > It seems to me that when p2 gets suspended, it is stopped somewhere
> >>> > inside #primitiveEnterCriticalSection before the owner is set. When it
> >>> > gets resumed, it is placed into the runnable queue with the pc pointing
> >>> > right behind the primitive, so when it runs it just continues inside
> >>> > #critical and gets blocked at the semaphore, all without having taken
> >>> > ownership.
> >>> >
> >>> > Is this interpretation right? It would mean Mutex's critical section
> >>> can be entered twice via this mechanism...
> >>> >
> >>> > Cuis does set the ownership to p2 in this example.
> >>> >
> >>> > Thanks again,
> >>> >
> >>> > Jaromir
> >>> > >
> >>> > > >
> >>> > > > Best,
> >>> > > > ~~~
> >>> > > > ^[^    Jaromir
> >>> > > >
> >>> > > > Sent from Squeak Inbox Talk
> >>> > > >
> >>> > >
> >>> > > _,,,^..^,,,_
> >>> > > best, Eliot
> >>> > > -------------- next part --------------
> >>> > > An HTML attachment was scrubbed...
> >>> > > URL: <
> >>> http://lists.squeakfoundation.org/pipermail/squeak-dev/attachments/20211227/8719df13/attachment.html
> >>> >
> >>> > >
> >>> > >
> >>> >
> >>> >
> >>>
> >>
> >>
> >> --
> >> _,,,^..^,,,_
> >> best, Eliot
> >>
> >
> >
> > --
> > _,,,^..^,,,_
> > best, Eliot
> >
> 
> 
> -- 
> _,,,^..^,,,_
> best, Eliot
> 
> 

