[squeak-dev] Process #suspend / #resume semantics

mail at jaromir.net mail at jaromir.net
Wed Jan 5 12:54:18 UTC 2022


Hi Eliot, all,

I'm enclosing a fixed version of DelayWaitTimeout >> signalWaitingProcess which was (ab)using the old 'process suspend; resume' bug/feature; hope it's right.

Also enclosed is a fix for testAtomicSuspend and a new test documenting expectations for the old resume versus the new revised resume semantics.

What remains is Process >> signalException: which you've fixed.

Thanks!

best,
~~~
^[^    Jaromir

Sent from Squeak Inbox Talk

On 2022-01-01T12:39:17+01:00, mail at jaromir.net wrote:

> Hi Eliot,
> 
> 
> On 2021-12-31T12:24:11-08:00, eliot.miranda at gmail.com wrote:
> 
> > Hi Jaromir,
> > 
> > On Fri, Dec 31, 2021 at 11:11 AM <mail at jaromir.net> wrote:
> > 
> > > Hi Eliot,
> > >
> > > do I understand correctly that the new suspend should never answer the
> > > condition variable's list (Semaphore/Mutex) but only nil (for active or
> > > previously suspended processes) or a run queue (for blocked or runnable but
> > > not running processes)?
> > >
> > 
> > That's a good question.  So far I have primitiveSuspend always answering
> > the list.  But it is very easy to change.  Do you have a strong preference
> > either way?  Leaving it the way it was (always returning the list) even if
> > it still backs up the process to the point of send (of the message that
> > blocks) seems to me to have the least impact on existing uses.
> > 
> 
> On the contrary, maybe; theoretically I'd expect some code to possibly fail: if suspend backs up the process, it's no longer "in the wait", but the programmer may have counted on it being there, based on the answered list. E.g. #releaseCriticalSection assigns 'suspendedContext home' to suspendedContext if the process is "in the wait" (it may still work, I don't know yet, but it would surely need to be double-checked).
> 
> I'd say answering nil or a run list seems more consistent to me because it reflects the actual state the process would normally be before the send (i.e. active or runnable). Answering the semaphore/mutex list sounds a bit inconsistent and confusing to me, with somewhat convoluted logic why.
> 
> > 
> > > If primitiveSuspend backs up a blocked process before the last send it's
> > > actually equivalent to backing up one instruction and then suspending, i.e.
> > > suspending as if while still in the run queue - is this thought experiment
> > > correct or am I totally confused?
> > >
> > 
> > No, that's exactly right.
> > 
> > 
> > > Or are there suspension points that may stop the suspend in the middle
> > > etc. - this is completely out of my league... sorry, just an idea.
> > >
> > 
> > Well, that's how VisualWorks/HPS does it with its idea of
> > committed primitives (see my previous message).  But I find what I'm doing
> > above easier to understand, explain, and, at least in the context of Cog,
> > easier to implement.
> > 
> > 
> > > If that's right we should be able to simplify the #releaseCriticalSection
> > > method because the suspended process would always be "runnable" and
> > > #terminate wouldn't need the oldList any longer.
> > >
> > 
> > That may indeed be the case.  I haven't tested this far.
> > 
> > But one important question for you and for the list is whether we want the
> > new behaviour to be optional, in exactly the same way as preemptionYields
> > is optional.  i.e. alongside preemptionYields would be something like
> > suspendPassesCriticalSection, and both of these would be false by default
> > in trunk, i.e. preemption does not yield, and suspend does not cause a
> > process to pass through a critical section.  I'm happy to just make it the
> > way it works and not plumb in the vmParameterAt: machinery to control it.
> > However, it may be useful for testing, for documenting the revised
> > behaviour, and, possibly, to allow people to run old images as they always
> > worked (even if this was broken).  So should I add the control or not?
> > 
> > BTW, I have the new behaviour working correctly in the Stack and JIT VMs,
> > so it's ready to go.  I just need an answer to the above before I commit
> > and release.  LMK...
> > 
> 
> My speculative opinion would be to answer nil when suspending a process in the wait, to maintain consistency with the actual state of the process; if existing code tests the list answered by #suspend it shouldn't necessarily fail (unless it does some super crazy stuff) because the revised suspend's answer would be consistent with the current state of the process being suspended.
> 
> I haven't found any code in the base image doing that... except #signalException:, which needs to be double-checked too.
> 
> In case there might be some code not agreeing with this revised behavior a switch might be useful but I can't judge the likelihood or significance.
> 
> One last question: in #terminate, if I want to set suspendedContext to nil I will *have to* suspend the process first (to let primitiveSuspend manipulate the suspendedContext), then save the suspendedContext and only then nil it, right? At the moment I can nil it first and then suspend the process but with the revised suspend I guess this won't work - is this right?
> 
> Thanks again,
> 
> best,
> ~~~
> ^[^    Jaromir
> 
> Sent from Squeak Inbox Talk
> 
> > >
> > > Thanks for your time,
> > >
> > > Happy New Year!
> > >
> > 
> > Happy New Year!!
> > 
> > >
> > >
> > > ~~~
> > > ^[^    Jaromir
> > >
> > > Sent from Squeak Inbox Talk
> > >
> > > On 2021-12-30T11:24:16-08:00, eliot.miranda at gmail.com wrote:
> > >
> > > > Hi Jaromir, Hi Craig, Hi All,
> > > >
> > > > On Tue, Dec 28, 2021 at 3:32 PM Eliot Miranda <eliot.miranda at gmail.com>
> > > > wrote:
> > > >
> > > > >
> > > > >
> > > > > On Tue, Dec 28, 2021 at 2:15 PM Eliot Miranda <eliot.miranda at gmail.com>
> > > > > wrote:
> > > > >
> > > > >>
> > > > >>
> > > > >> On Tue, Dec 28, 2021 at 1:53 PM <mail at jaromir.net> wrote:
> > > > >>
> > > > >>> Hi Eliot, all,
> > > > >>>
> > > > >>> this example shows Mutex's critical section can be entered multiple
> > > > >>> times:
> > > > >>>
> > > > >>
> > > > > >> I know.  suspend is broken.  Please read my previous message fully and
> > > > > >> carefully.  If I implement the second alternative then the example works
> > > > > >> correctly.  In the simulator I get:
> > > > >>
> > > > >> {a Semaphore(a Process(75019) in [] in [] in UndefinedObject>>DoIt) .
> > > > >>  a Mutex(a Process(59775) in Mutex>>critical:) .
> > > > >>  a Process(75019) in [] in [] in UndefinedObject>>DoIt . false .
> > > > >>  a Process(59775) in Mutex>>critical: . false}
> > > > >>
> > > > >
> > > > > > However, this comes at the cost of finding that the new terminate is
> > > > > > broken.  If suspend does not remove a process from its list then a process
> > > > > > terminated while waiting on a Semaphore remains on that semaphore.  So
> > > > > > terminate must not only ensure that any critical sections are released, but
> > > > > > that the process is removed from its list, if that list is a condition
> > > > > > variable.  IMO this should happen very early on in terminate.  I'm running
> > > > > > the second alternative but I have to filter out unrunnable processes in
> > > > > > signal et al since suspend no longer removes from the list.
> > > > >
> > > >
> > > > > Having thought about it for a couple of days I now think that the second
> > > > > alternative is the only rational approach.  This is that suspend always
> > > > > removes a process from whatever list it is on, but if the list is not its
> > > > > run queue, the process is backed up one bytecode to the send that invoked
> > > > > the wait.  Hence if the process resumes it immediately progresses into the
> > > > > wait again, leaving it exactly where it was if it hadn't been suspended.
> > > > >
> > > > > Craig I hear your concern, but being able to (especially accidentally)
> > > > > suspend a process and resume it and find it has progressed beyond whatever
> > > > > condition variable it was waiting on is entirely unacceptable.
> > > > >
> > > > > My first alternative, that suspend does not remove a process from its list
> > > > > if a condition variable, breaks the existing code base.  For example a
> > > > > typical pattern at start up is to suspend the old version of a process
> > > > > waiting on a semaphore (e.g. the finalizationProcess) and start up a new
> > > > > one.  The first alternative leaves the old process waiting on the semaphore.
> > > >
> > > > > Digression: VisualWorks also implements the second choice, but it isn't
> > > > > obvious.  HPS, the VisualWorks VM, can only run jitted code; it has no
> > > > > interpreter.  So backing up the pc to before the send is really difficult
> > > > > to implement; it introduces arbitrarily many suspension points since the
> > > > > send of the wait/enterCriticalSection et al can be preceded by an arbitrary
> > > > > expression.  Instead, one additional suspension point is introduced,
> > > > > complementing after a send, at a backward jump, and at method entry (after
> > > > > frame build, before executing the first bytecode).  Here primitives can be
> > > > > in one of two states, uncommitted, and committed.  Uncommitted primitives
> > > > > are primitives in progress.  One can't actually see this state.  A
> > > > > committed primitive has a frame/context allocated for its execution.  A
> > > > > committed primitive may have completed (e.g. an FFI call is in progress,
> > > > > waiting for a result) or is yet to start.  So HPS can back up a committed
> > > > > primitive wait to the committed but uncompleted state. Hence
> > > > > resuming reenters the wait state.  This is ok, but complex.
> > > >
> > > > > In Cog we have an interpreter and can easily convert a machine code frame
> > > > > into an interpreted frame, so backing up the process to the send that
> > > > > invoked the wait/enterCriticalSection etc is fine.  I'll have a go at this
> > > > > asap.
> > > >
> > > >
> > > > >
> > > > >
> > > > >>>     | s p1 p2 p3 m |
> > > > >>>     s := Semaphore new.
> > > > >>>     m := Mutex new.
> > > > >>>     p1 := [m critical: [s wait]] newProcess.
> > > > >>>     p1 resume.
> > > > >>>     p2 := [m critical: [s wait]] newProcess.
> > > > >>>     p2 resume.
> > > > >>>     Processor yield.
> > > > >>>     { p1. p1 suspend. p2. p2 suspend }.
> > > > >>>     p1 resume. p2 resume.
> > > > >>>     Processor yield.
> > > > > >>>     { s. m. p1. p1 isTerminated. p2. p2 isTerminated. m isOwned. m instVarNamed: 'owner' }.
> > > > > >>>     p3 := [m critical: [s wait]] newProcess.
> > > > > >>>     p3 resume.
> > > > > >>>     Processor yield.
> > > > > >>>     { s. m. p1. p1 isTerminated. p2. p2 isBlocked. p3. p3 isBlocked. m isOwned. m instVarNamed: 'owner' }.
> > > > >>>
> > > > > >>> I've just added a third process to your last example; p3 really enters
> > > > > >>> the critical section and takes m's ownership despite the fact p2 is already
> > > > > >>> waiting inside m's critical section - because p2 managed to enter m
> > > > > >>> without taking m's ownership.
> > > > > >>>
> > > > > >>> Now we could repeat the procedure and keep adding processes inside the
> > > > > >>> critical section indefinitely :) So I guess this really is a bug.
> > > > >>>
> > > > >>> Best,
> > > > >>>
> > > > >>> ~~~
> > > > >>> ^[^    Jaromir
> > > > >>>
> > > > >>> Sent from Squeak Inbox Talk
> > > > >>>
> > > > >>> On 2021-12-28T20:07:25+01:00, mail at jaromir.net wrote:
> > > > >>>
> > > > >>> > Hi Eliot,
> > > > >>> >
> > > > > >>> > Thanks! Please see my comments below, it seems to me there may be a
> > > > > >>> > bug in the Mutex.
> > > > >>> >
> > > > >>> > ~~~
> > > > >>> > ^[^    Jaromir
> > > > >>> >
> > > > >>> > Sent from Squeak Inbox Talk
> > > > >>> >
> > > > >>> > On 2021-12-27T14:55:22-08:00, eliot.miranda at gmail.com wrote:
> > > > >>> >
> > > > >>> > > Hi Jaromir,
> > > > >>> > >
> > > > >>> > > On Mon, Dec 27, 2021 at 2:52 AM <mail at jaromir.net> wrote:
> > > > >>> > >
> > > > >>> > > > Hi all,
> > > > >>> > > >
> > > > > >>> > > > What is the desirable semantics of resuming a previously
> > > > > >>> > > > suspended process?
> > > > >>> > > >
> > > > >>> > >
> > > > > >>> > > That a process continue exactly as it had if it had not been
> > > > > >>> > > suspended in the first place.  In this regard our suspend is
> > > > > >>> > > hopelessly broken for processes that are waiting on condition
> > > > > >>> > > variables. See below.
> > > > >>> > >
> > > > >>> > >
> > > > >>> > > >
> > > > > >>> > > > #resume's comment says: "Allow the process that the receiver
> > > > > >>> > > > represents to continue. Put the receiver in *line to become the
> > > > > >>> > > > activeProcess*."
> > > > > >>> > > >
> > > > > >>> > > > The side-effect of this is that a terminating process can get
> > > > > >>> > > > resumed (unless suspendedContext is set to nil - see test
> > > > > >>> > > > KernelTests-jar.417 / Inbox - which has the unfortunate
> > > > > >>> > > > side-effect of #isTerminated answering true during termination).
> > > > >>> > > >
> > > > >>> > >
> > > > > >>> > > But a process that is terminating should not be resumable.  This
> > > > > >>> > > should be a non-issue.  If a process is terminating itself then it
> > > > > >>> > > is the active process, it has nil as its suspendedContext, and
> > > > > >>> > > Processor activeProcess resume always produces an error. Any
> > > > > >>> > > process that is not terminating itself can be made to fail by
> > > > > >>> > > having the machinery set the suspendedContext to nil.
> > > > >>> > >
> > > > >>> >
> > > > > >>> > Yes agreed, but unfortunately that's precisely what is not happening
> > > > > >>> > in the current and previous #terminate and what I'm proposing in
> > > > > >>> > Kernel-jar.1437 - to set the suspendedContext to nil during
> > > > > >>> > termination, even before calling #releaseCriticalSection.
> > > > >>> >
> > > > >>> > >
> > > > > >>> > > > A similar side-effect: a process originally waiting on a
> > > > > >>> > > > semaphore and then suspended can be resumed into the runnable
> > > > > >>> > > > state and get scheduled, effectively escaping the semaphore wait.
> > > > >>> > > >
> > > > >>> > >
> > > > > >>> > > Right.  This is the bug.  So for example
> > > > > >>> > >     | s p |
> > > > > >>> > >     s := Semaphore new.
> > > > > >>> > >     p := [s wait] newProcess.
> > > > > >>> > >     p resume.
> > > > > >>> > >     Processor yield.
> > > > > >>> > >     { p. p suspend }
> > > > >>> > >
> > > > > >>> > > answers an Array of process p that is past the wait, and the
> > > > > >>> > > semaphore, s.  And
> > > > > >>> > >
> > > > > >>> > >     | s p |
> > > > > >>> > >     s := Semaphore new.
> > > > > >>> > >     p := [s wait] newProcess.
> > > > > >>> > >     p resume.
> > > > > >>> > >     Processor yield.
> > > > > >>> > >     p suspend; resume.
> > > > > >>> > >     Processor yield.
> > > > > >>> > >     p isTerminated
> > > > > >>> > >
> > > > > >>> > > answers true, whereas in both cases the process should remain
> > > > > >>> > > waiting on the semaphore.
> > > > >>> > >
> > > > >>> > > >
> > > > >>> > > > Is this an expected behavior or a bug?
> > > > >>> > > >
> > > > >>> > >
> > > > >>> > > IMO it is a dreadful bug.
> > > > >>> > >
> > > > > >>> > > > If a bug, should a suspended process somehow remember its
> > > > > >>> > > > previous state and/or queue and return to the same one if resumed?
> > > > >>> > > >
> > > > >>> > >
> > > > > >>> > > IMO the primitive should back up the process to the
> > > > > >>> > > wait/primitiveEnterCriticalSection. This is trivial to implement
> > > > > >>> > > in the image, but is potentially non-atomic.  It is perhaps tricky
> > > > > >>> > > to implement in the VM, but will be atomic.
> > > > >>> > >
> > > > > >>> > > > Sorry if I'm missing something :)
> > > > >>> > > >
> > > > >>> > >
> > > > > >>> > > You're not missing anything :-)  Here's another example that
> > > > > >>> > > answers two processes which should both block but if resumed both
> > > > > >>> > > make progress.
> > > > > >>> > >
> > > > > >>> > >     | s p1 p2 m |
> > > > > >>> > >     s := Semaphore new.
> > > > > >>> > >     m := Mutex new.
> > > > > >>> > >     p1 := [m critical: [s wait]] newProcess.
> > > > > >>> > >     p1 resume.
> > > > > >>> > >     p2 := [m critical: [s wait]] newProcess.
> > > > > >>> > >     p2 resume.
> > > > > >>> > >     Processor yield.
> > > > > >>> > >     { p1. p1 suspend. p2. p2 suspend }
> > > > >>> > >
> > > > > >>> > > p1 enters the mutex's critical section, becoming the mutex's owner.
> > > > > >>> > > p2 then blocks attempting to enter m's critical section.  Let's
> > > > > >>> > > resume these two, and examine the semaphore and mutex:
> > > > > >>> > >
> > > > > >>> > >     | s p1 p2 m |
> > > > > >>> > >     s := Semaphore new.
> > > > > >>> > >     m := Mutex new.
> > > > > >>> > >     p1 := [m critical: [s wait]] newProcess.
> > > > > >>> > >     p1 resume.
> > > > > >>> > >     p2 := [m critical: [s wait]] newProcess.
> > > > > >>> > >     p2 resume.
> > > > > >>> > >     Processor yield.
> > > > > >>> > >     { p1. p1 suspend. p2. p2 suspend }.
> > > > > >>> > >     p1 resume. p2 resume.
> > > > > >>> > >     Processor yield.
> > > > > >>> > >     { s. m. p1. p1 isTerminated. p2. p2 isTerminated }
> > > > >>> > >
> > > > > >>> > > In this case the end result for p2 is accidentally correct. It ends
> > > > > >>> > > up waiting on s within m's critical section. But p1 ends up
> > > > > >>> > > terminated.  IMO the correct result is that p1 remains waiting on
> > > > > >>> > > s, and is still the owner of m, and p2 remains blocked trying to
> > > > > >>> > > take ownership of m.
> > > > >>> > >
> > > > >>> >
> > > > > >>> > Perfect example! My naive expectation was when a process inside a
> > > > > >>> > critical section gets suspended the Mutex gets unlocked but that's
> > > > > >>> > apparently wrong :)
> > > > > >>> >
> > > > > >>> > But still, there's something wrong with the example: If p1 resumes it
> > > > > >>> > releases m's ownership and terminates, then p2 takes over and proceeds
> > > > > >>> > inside the critical section and gets blocked at the semaphore. I'd expect
> > > > > >>> > p2 would become the owner of the Mutex m BUT it's not! There's no owner
> > > > > >>> > while p2 is sitting at the semaphore. Try:
> > > > >>> >
> > > > >>> >     | s p1 p2 m |
> > > > >>> >     s := Semaphore new.
> > > > >>> >     m := Mutex new.
> > > > >>> >     p1 := [m critical: [s wait]] newProcess.
> > > > >>> >     p1 resume.
> > > > >>> >     p2 := [m critical: [s wait]] newProcess.
> > > > >>> >     p2 resume.
> > > > >>> >     Processor yield.
> > > > >>> >     { p1. p1 suspend. p2. p2 suspend }.
> > > > >>> >     p1 resume. p2 resume.
> > > > >>> >     Processor yield.
> > > > > >>> >     { s. m. p1. p1 isTerminated. p2. p2 isTerminated. m isOwned. m instVarNamed: 'owner' }
> > > > >>> >
> > > > > >>> > It seems to me that when p2 gets suspended it is stopped somewhere
> > > > > >>> > inside #primitiveEnterCriticalSection before the owner is set and when it
> > > > > >>> > gets resumed it is placed into the runnable queue with the pc pointing
> > > > > >>> > right behind the primitive and so when it runs it just continues inside
> > > > > >>> > #critical and gets blocked at the semaphore, all without having the
> > > > > >>> > ownership.
> > > > > >>> >
> > > > > >>> > Is this interpretation right? It would mean Mutex's critical section
> > > > > >>> > can be entered twice via this mechanism...
> > > > >>> >
> > > > >>> > Cuis does set the ownership to p2 in this example.
> > > > >>> >
> > > > >>> > Thanks again,
> > > > >>> >
> > > > >>> > Jaromir
> > > > >>> > >
> > > > >>> > > >
> > > > >>> > > > Best,
> > > > >>> > > > ~~~
> > > > >>> > > > ^[^    Jaromir
> > > > >>> > > >
> > > > >>> > > > Sent from Squeak Inbox Talk
> > > > >>> > > >
> > > > >>> > >
> > > > >>> > > _,,,^..^,,,_
> > > > >>> > > best, Eliot
> > > > >>> > >
> > > > >>> > >
> > > > >>> >
> > > > >>> >
> > > > >>>
> > > > >>
> > > > >>
> > > > >> --
> > > > >> _,,,^..^,,,_
> > > > >> best, Eliot
> > > > >>
> > > > >
> > > > >
> > > > > --
> > > > > _,,,^..^,,,_
> > > > > best, Eliot
> > > > >
> > > >
> > > >
> > > > --
> > > > _,,,^..^,,,_
> > > > best, Eliot
> > > >
> > > >
> > >
> > 
> > 
> > -- 
> > _,,,^..^,,,_
> > best, Eliot
> > 
> >
> 
> 
["DelayWaitTimeout-signalWaitingProcess.st"]
["ProcessTest-testAtomicSuspend.st"]
["ProcessTest-testRevisedSuspendExpectations.st"]
-------------- next part --------------
A non-text attachment was scrubbed...
Name: DelayWaitTimeout-signalWaitingProcess.st
Type: application/octet-stream
Size: 625 bytes
Desc: not available
URL: <http://lists.squeakfoundation.org/pipermail/squeak-dev/attachments/20220105/c9ec53bc/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ProcessTest-testAtomicSuspend.st
Type: application/octet-stream
Size: 435 bytes
Desc: not available
URL: <http://lists.squeakfoundation.org/pipermail/squeak-dev/attachments/20220105/c9ec53bc/attachment-0001.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ProcessTest-testRevisedSuspendExpectations.st
Type: application/octet-stream
Size: 747 bytes
Desc: not available
URL: <http://lists.squeakfoundation.org/pipermail/squeak-dev/attachments/20220105/c9ec53bc/attachment-0002.obj>

