[squeak-dev] Process #suspend / #resume semantics
mail at jaromir.net
Fri Dec 31 19:09:36 UTC 2021
Hi Eliot,
do I understand well that the new suspend should never answer the condition variable's list (Semaphore/Mutex) but only nil (for the active or a previously suspended process) or a run queue (for a blocked, or runnable but not running, process)?
If primitiveSuspend backs up a blocked process to before the last send, it's actually equivalent to backing up one instruction and then suspending, i.e. suspending as if the process were still in the run queue - is this thought experiment correct or am I totally confused?
Or are there suspension points that may stop the suspend in the middle etc. - this is completely out of my league... sorry, just an idea.
If that's right we should be able to simplify the #releaseCriticalSection method because the suspended process would always be "runnable" and #terminate wouldn't need the oldList any longer.
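If that reading is right, the return-value contract could be modelled outside Squeak roughly as follows (a minimal Python sketch; the class and constant names are illustrative assumptions, not the actual Kernel implementation):

```python
# Toy model of the proposed Process>>suspend return values (illustrative
# names only -- this is not the Squeak Kernel implementation).

RUN_QUEUE = "run queue"          # stands in for a scheduler linked list
CONDITION_VARIABLE = "condvar"   # stands in for a Semaphore/Mutex wait list

class ToyProcess:
    def __init__(self, my_list=None):
        self.my_list = my_list   # None, RUN_QUEUE, or CONDITION_VARIABLE

    def suspend(self):
        # Never answer the condition variable's list: a blocked process is
        # backed up to the send of the wait and reported as runnable.
        old_list, self.my_list = self.my_list, None
        if old_list is None:
            return None          # active or previously suspended process
        return RUN_QUEUE         # runnable, or blocked-and-backed-up

assert ToyProcess(None).suspend() is None
assert ToyProcess(RUN_QUEUE).suspend() == RUN_QUEUE
assert ToyProcess(CONDITION_VARIABLE).suspend() == RUN_QUEUE
```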
Thanks for your time,
Happy New Year!
~~~
^[^ Jaromir
Sent from Squeak Inbox Talk
On 2021-12-30T11:24:16-08:00, eliot.miranda at gmail.com wrote:
> Hi Jaromir, Hi Craig, Hi All,
>
> On Tue, Dec 28, 2021 at 3:32 PM Eliot Miranda <eliot.miranda at gmail.com>
> wrote:
>
> >
> >
> > On Tue, Dec 28, 2021 at 2:15 PM Eliot Miranda <eliot.miranda at gmail.com>
> > wrote:
> >
> >>
> >>
> >> On Tue, Dec 28, 2021 at 1:53 PM <mail at jaromir.net> wrote:
> >>
> >>> Hi Eliot, all,
> >>>
> >>> this example shows Mutex's critical section can be entered multiple
> >>> times:
> >>>
> >>
> >> I know. suspend is broken. Please read my previous message fully and
> >> carefully. If I implement the second alternative then the example works
> >> correctly. In the simulator I get:
> >>
> >> {a Semaphore(a Process(75019) in [] in [] in UndefinedObject>>DoIt) .
> >> a Mutex(a Process(59775) in Mutex>>critical:) .
> >> a Process(75019) in [] in [] in UndefinedObject>>DoIt . false .
> >> a Process(59775) in Mutex>>critical: . false}
> >>
> >
> > However, this comes at the cost of finding that the new terminate is
> > broken. If suspend does not remove a process from its list then a process
> > terminated while waiting on a Semaphore remains on that semaphore. So
> > terminate must not only ensure that any critical sections are released, but
> > that the process is removed from its list, if that list is a condition
> > variable. IMO this should happen very early on in terminate. I'm running
> > the second alternative but I have to filter out unrunnable processes in
> > signal et al since suspend no longer removes from the list.
> >
>
> Having thought about it for a couple of days I now think that the second
> alternative is the only rational approach. This is that suspend always
> removes a process from whatever list it is on, but if the list is not its
> run queue, the process is backed up one bytecode to the send that invoked
> the wait. Hence if the process resumes it immediately progresses into the
> wait again, leaving it exactly where it would have been if it hadn't been suspended.
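This back-up-one-bytecode behaviour can be sanity-checked with a toy state machine (Python; the states and names are my own illustrative assumptions, not VM code): suspending a blocked process rewinds its pc to the send of the wait, so resuming re-executes the wait and blocks again instead of escaping the semaphore.

```python
# Toy model (not Squeak/VM code): suspend backs a blocked process up to
# just before the wait, so resume re-enters the wait.

class Sim:
    def __init__(self):
        self.sem_signals = 0        # excess signals on the toy semaphore
        self.pc = 'before_wait'     # 'before_wait' -> 'after_wait'
        self.state = 'running'

    def step(self):
        if self.state == 'running' and self.pc == 'before_wait':
            if self.sem_signals > 0:
                self.sem_signals -= 1
                self.pc = 'after_wait'
            else:
                self.state = 'blocked'    # waiting on the semaphore

    def suspend(self):
        if self.state == 'blocked':
            self.pc = 'before_wait'       # back up to the send of #wait
        self.state = 'suspended'

    def resume(self):
        self.state = 'running'
        self.step()

p = Sim()
p.step()                     # runs into the wait and blocks
p.suspend(); p.resume()      # backed up, so it re-enters the wait...
assert p.state == 'blocked'  # ...and blocks again instead of escaping it

q = Sim()
q.step(); q.suspend()
q.sem_signals = 1            # a signal arrives while q is suspended
q.resume()
assert q.pc == 'after_wait'  # resume re-executes the wait, which succeeds
```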
>
> Craig I hear your concern, but being able to (especially accidentally)
> suspend a process and resume it and find it has progressed beyond whatever
> condition variable it was waiting on is entirely unacceptable.
>
> My first alternative, that suspend does not remove a process from its list
> if a condition variable, breaks the existing code base. For example a
> typical pattern at start up is to suspend the old version of a process
> waiting on a semaphore (e.g. the finalizationProcess) and start up a new
> one. The first alternative leaves the old process waiting on the semaphore.
>
> Digression: VisualWorks also implements the second choice, but it isn't
> obvious. HPS, the VisualWorks VM, can only run jitted code; it has no
> interpreter. So backing up the pc to before the send is really difficult
> to implement; it introduces arbitrarily many suspension points since the
> send of the wait/enterCriticalSection et al can be preceded by an arbitrary
> expression. Instead, one additional suspension point is introduced,
> complementing after a send, at a backward jump, and at method entry (after
> frame build, before executing the first bytecode). Here primitives can be
> in one of two states, uncommitted and committed. Uncommitted primitives
> are primitives in progress. One can't actually see this state. A
> committed primitive has a frame/context allocated for its execution. A
> committed primitive may have completed (e.g. an FFI call is in progress,
> waiting for a result) or is yet to start. So HPS can back up a committed
> primitive wait to the committed but uncompleted state. Hence
> resuming reenters the wait state. This is ok, but complex.
>
> In Cog we have an interpreter and can easily convert a machine code frame
> into an interpreted frame, so backing up the process to the send that
> invoked the wait/enterCriticalSection etc is fine. I'll have a go at this
> asap.
>
>
> >
> >
> >>> | s p1 p2 p3 m |
> >>> s := Semaphore new.
> >>> m := Mutex new.
> >>> p1 := [m critical: [s wait]] newProcess.
> >>> p1 resume.
> >>> p2 := [m critical: [s wait]] newProcess.
> >>> p2 resume.
> >>> Processor yield.
> >>> { p1. p1 suspend. p2. p2 suspend }.
> >>> p1 resume. p2 resume.
> >>> Processor yield.
> >>> { s. m. p1. p1 isTerminated. p2. p2 isTerminated. m isOwned. m
> >>> instVarNamed: 'owner' }.
> >>> p3 := [m critical: [s wait]] newProcess.
> >>> p3 resume.
> >>> Processor yield.
> >>> { s. m. p1. p1 isTerminated. p2. p2 isBlocked. p3. p3 isBlocked. m
> >>> isOwned. m instVarNamed: 'owner' }.
> >>>
> >>> I've just added a third process to your last example; p3 really enters
> >>> the critical section and takes m's ownership despite the fact p2 is already
> >>> waiting inside m's critical section - because p2 managed to enter m
> >>> without taking m's ownership.
> >>>
> >>> Now we could repeat the procedure and keep adding processes inside the
> >>> critical section indefinitely :) So I guess this really is a bug.
> >>>
> >>> Best,
> >>>
> >>> ~~~
> >>> ^[^ Jaromir
> >>>
> >>> Sent from Squeak Inbox Talk
> >>>
> >>> On 2021-12-28T20:07:25+01:00, mail at jaromir.net wrote:
> >>>
> >>> > Hi Eliot,
> >>> >
> >>> > Thanks! Please see my comments below, it seems to me there may be a
> >>> bug in the Mutex.
> >>> >
> >>> > ~~~
> >>> > ^[^ Jaromir
> >>> >
> >>> > Sent from Squeak Inbox Talk
> >>> >
> >>> > On 2021-12-27T14:55:22-08:00, eliot.miranda at gmail.com wrote:
> >>> >
> >>> > > Hi Jaromir,
> >>> > >
> >>> > > On Mon, Dec 27, 2021 at 2:52 AM <mail at jaromir.net> wrote:
> >>> > >
> >>> > > > Hi all,
> >>> > > >
> >>> > > > What is the desirable semantics of resuming a previously suspended
> >>> process?
> >>> > > >
> >>> > >
> >>> > > That a process continues exactly as it would have if it had not been
> >>> suspended in
> >>> > > the first place. In this regard our suspend is hopelessly broken for
> >>> > > processes that are waiting on condition variables. See below.
> >>> > >
> >>> > >
> >>> > > >
> >>> > > > #resume's comment says: "Allow the process that the receiver
> >>> represents to
> >>> > > > continue. Put the receiver in line to become the activeProcess."
> >>> > > >
> >>> > > > The side-effect of this is that a terminating process can get
> >>> resumed
> >>> > > > (unless suspendedContext is set to nil - see test
> >>> KernelTests-jar.417 /
> >>> > > > Inbox - which has the unfortunate side-effect of #isTerminated
> >>> answering true
> >>> > > > during termination).
> >>> > > >
> >>> > >
> >>> > > But a process that is terminating should not be resumable. This
> >>> should be
> >>> > > a non-issue. If a process is terminating itself then it is the
> >>> active
> >>> > > process, it has nil as its suspendedContext, and Processor
> >>> > > activeProcess resume always produces an error. Any process that is
> >>> not
> >>> > > terminating itself can be made to fail by having the machinery set
> >>> the
> >>> > > suspendedContext to nil.
> >>> > >
> >>> >
> >>> > Yes agreed, but unfortunately that's precisely what is not happening
> >>> in the current and previous #terminate, and what I propose in
> >>> Kernel-jar.1437 is to set the suspendedContext to nil during termination,
> >>> even before calling #releaseCriticalSection.
> >>> >
> >>> > >
> >>> > > > A similar side-effect: a process originally waiting on a semaphore
> >>> and
> >>> > > > then suspended can be resumed into the runnable state and get
> >>> scheduled,
> >>> > > > effectively escaping the semaphore wait.
> >>> > > >
> >>> > >
> >>> > > Right, this is the bug. So for example
> >>> > > | s p |
> >>> > > s := Semaphore new.
> >>> > > p := [s wait] newProcess.
> >>> > > p resume.
> >>> > > Processor yield.
> >>> > > { p. p suspend }
> >>> > >
> >>> > > answers an Array of process p that is past the wait, and the
> >>> semaphore, s.
> >>> > > And
> >>> > >
> >>> > > | s p |
> >>> > > s := Semaphore new.
> >>> > > p := [s wait] newProcess.
> >>> > > p resume.
> >>> > > Processor yield.
> >>> > > p suspend; resume.
> >>> > > Processor yield.
> >>> > > p isTerminated
> >>> > >
> >>> > > answers true, whereas in both cases the process should remain
> >>> waiting on
> >>> > > the semaphore.
> >>> > >
> >>> > > >
> >>> > > > Is this an expected behavior or a bug?
> >>> > > >
> >>> > >
> >>> > > IMO it is a dreadful bug.
> >>> > >
> >>> > > > If a bug, should a suspended process somehow remember its previous
> >>> state
> >>> > > > and/or queue and return to the same one if resumed?
> >>> > > >
> >>> > >
> >>> > > IMO the primitive should back up the process to the
> >>> > > wait/primitiveEnterCriticalSection. This is trivial to implement in
> >>> the
> >>> > > image, but is potentially non-atomic. It is perhaps tricky to
> >>> implement in
> >>> > > the VM, but will be atomic.
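The image-side variant of that back-up could look roughly like the following sketch (Python toy model; `suspended_context`, `on_condvar` and friends are illustrative assumptions, not Kernel selectors). The non-atomicity shows up as the gap between leaving the wait list and rewinding the pc, during which a signal could arrive:

```python
# Illustrative model of backing a blocked process up to the send that
# invoked the wait (names are assumptions, not actual Kernel code).

class ToyContext:
    def __init__(self, pc):
        self.pc = pc

class ToyProcess:
    def __init__(self, ctx, on_condvar):
        self.suspended_context = ctx
        self.on_condvar = on_condvar

    def suspend_backing_up(self, send_pc):
        if self.on_condvar:
            self.on_condvar = False              # step 1: leave the wait list
            # <-- a signal arriving here would be missed: the non-atomic gap
            self.suspended_context.pc = send_pc  # step 2: rewind to the send
        return None

p = ToyProcess(ToyContext(pc=7), on_condvar=True)
p.suspend_backing_up(send_pc=6)
assert p.suspended_context.pc == 6   # resume will re-execute the wait
```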
> >>> > >
> >>> > > Sorry if I'm missing something :)
> >>> > > >
> >>> > >
> >>> > > You're not missing anything :-) Here's another example that answers
> >>> two
> >>> > > processes which should both block but if resumed both make progress.
> >>> > >
> >>> > > | s p1 p2 m |
> >>> > > s := Semaphore new.
> >>> > > m := Mutex new.
> >>> > > p1 := [m critical: [s wait]] newProcess.
> >>> > > p1 resume.
> >>> > > p2 := [m critical: [s wait]] newProcess.
> >>> > > p2 resume.
> >>> > > Processor yield.
> >>> > > { p1. p1 suspend. p2. p2 suspend }
> >>> > >
> >>> > > p1 enters the mutex's critical section, becoming the mutex's owner.
> >>> p2 then
> >>> > > blocks attempting to enter m's critical section. Let's resume these
> >>> two,
> >>> > > and examine the semaphore and mutex:
> >>> > >
> >>> > > | s p1 p2 m |
> >>> > > s := Semaphore new.
> >>> > > m := Mutex new.
> >>> > > p1 := [m critical: [s wait]] newProcess.
> >>> > > p1 resume.
> >>> > > p2 := [m critical: [s wait]] newProcess.
> >>> > > p2 resume.
> >>> > > Processor yield.
> >>> > > { p1. p1 suspend. p2. p2 suspend }.
> >>> > > p1 resume. p2 resume.
> >>> > > Processor yield.
> >>> > > { s. m. p1. p1 isTerminated. p2. p2 isTerminated }
> >>> > >
> >>> > > In this case the end result for p2 is accidentally correct. It ends
> >>> up
> >>> > > waiting on s within m's critical section. But p1 ends up
> >>> terminated. IMO
> >>> > > the correct result is that p1 remains waiting on s, and is still the
> >>> owner
> >>> > > of m, and p2 remains blocked trying to take ownership of m.
> >>> > >
> >>> >
> >>> > Perfect example! My naive expectation was that when a process inside a
> >>> critical section gets suspended the Mutex gets unlocked, but that's
> >>> apparently wrong :)
> >>> >
> >>> > But still, there's something wrong with the example: If p1 resumes it
> >>> releases m's ownership and terminates, then p2 takes over and proceeds
> >>> inside the critical section and gets blocked at the semaphore. I'd expect
> >>> p2 to become the owner of the Mutex m BUT it doesn't! There's no owner
> >>> while p2 is sitting at the semaphore. Try:
> >>> >
> >>> > | s p1 p2 m |
> >>> > s := Semaphore new.
> >>> > m := Mutex new.
> >>> > p1 := [m critical: [s wait]] newProcess.
> >>> > p1 resume.
> >>> > p2 := [m critical: [s wait]] newProcess.
> >>> > p2 resume.
> >>> > Processor yield.
> >>> > { p1. p1 suspend. p2. p2 suspend }.
> >>> > p1 resume. p2 resume.
> >>> > Processor yield.
> >>> > { s. m. p1. p1 isTerminated. p2. p2 isTerminated. m isOwned. m
> >>> instVarNamed: 'owner' }
> >>> >
> >>> > It seems to me that when p2 gets suspended it is stopped somewhere
> >>> inside #primitiveEnterCriticalSection before the owner is set and when it
> >>> gets resumed it is placed into the runnable queue with the pc pointing
> >>> right behind the primitive and so when it runs it just continues inside
> >>> #critical and gets blocked at the semaphore, all without having the
> >>> ownership.
> >>> >
> >>> > Is this interpretation right? It would mean Mutex's critical section
> >>> can be entered twice via this mechanism...
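That interpretation can be modelled with a deliberately simplified mutex (Python; `ToyMutex` and its methods are illustrative assumptions, not the real primitiveEnterCriticalSection): a process resumed past the entry primitive proceeds into the critical section without ownership being recorded, so a third process can take the mutex and enter it concurrently.

```python
# Toy model (illustrative names only) of the suspected
# primitiveEnterCriticalSection race described above.

class ToyMutex:
    def __init__(self):
        self.owner = None
        self.queue = []

    def enter(self, proc):
        if self.owner is None:
            self.owner = proc      # normal entry: take ownership
            return True
        self.queue.append(proc)    # block; the owner should hand over later
        return False

    def resume_past_primitive(self, proc):
        # Models the bug: resume puts proc back in the run queue with its pc
        # *after* the primitive, so it proceeds without becoming the owner.
        self.queue.remove(proc)

m = ToyMutex()
assert m.enter('p1')           # p1 owns m
assert not m.enter('p2')       # p2 blocks on m
m.owner = None                 # p1 terminates, releasing ownership
m.resume_past_primitive('p2')  # p2 "escapes" into the critical section
assert m.owner is None         # no owner while p2 runs inside critical:
assert m.enter('p3')           # so p3 enters too -- section held twice
```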
> >>> >
> >>> > Cuis does set the ownership to p2 in this example.
> >>> >
> >>> > Thanks again,
> >>> >
> >>> > Jaromir
> >>> > >
> >>> > > >
> >>> > > > Best,
> >>> > > > ~~~
> >>> > > > ^[^ Jaromir
> >>> > > >
> >>> > > > Sent from Squeak Inbox Talk
> >>> > > >
> >>> > >
> >>> > > _,,,^..^,,,_
> >>> > > best, Eliot
> >>> > >
> >>> > >
> >>> >
> >>> >
> >>>
> >>
> >>
> >> --
> >> _,,,^..^,,,_
> >> best, Eliot
> >>
> >
> >
> > --
> > _,,,^..^,,,_
> > best, Eliot
> >
>
>
> --
> _,,,^..^,,,_
> best, Eliot
>
>