[squeak-dev] Process #suspend / #resume semantics

Eliot Miranda eliot.miranda at gmail.com
Tue Dec 28 23:32:13 UTC 2021


On Tue, Dec 28, 2021 at 2:15 PM Eliot Miranda <eliot.miranda at gmail.com>
wrote:

>
>
> On Tue, Dec 28, 2021 at 1:53 PM <mail at jaromir.net> wrote:
>
>> Hi Eliot, all,
>>
>> this example shows Mutex's critical section can be entered multiple times:
>>
>
> I know.  suspend is broken.  Please read my previous message fully and
> carefully.  If I implement the second alternative then the example works
> correctly.  In the simulator I get:
>
> {a Semaphore(a Process(75019) in [] in [] in UndefinedObject>>DoIt) .
>  a Mutex(a Process(59775) in Mutex>>critical:) .
>  a Process(75019) in [] in [] in UndefinedObject>>DoIt . false .
>  a Process(59775) in Mutex>>critical: . false}
>

However, this comes at the cost of finding that the new terminate is
broken.  If suspend does not remove a process from its list then a process
terminated while waiting on a Semaphore remains on that semaphore.  So
terminate must not only ensure that any critical sections are released, but
that the process is removed from its list, if that list is a condition
variable.  IMO this should happen very early on in terminate.  I'm running
the second alternative but I have to filter out unrunnable processes in
signal et al since suspend no longer removes from the list.
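
A minimal sketch of the check this implies (not run against the modified
VM; it assumes Semaphore inherits isEmpty from LinkedList, as it does in
current Squeak images, and that p reaches the wait during the yield):

```smalltalk
| s p |
s := Semaphore new.
p := [s wait] newProcess.
p resume.
Processor yield.        "let p run until it blocks in s wait"
p terminate.
{ p isTerminated. s isEmpty }
```

If terminate removes the process from its list I'd expect both elements
to be true; if p is left behind on the semaphore the second would be false.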



>>     | s p1 p2 p3 m |
>>     s := Semaphore new.
>>     m := Mutex new.
>>     p1 := [m critical: [s wait]] newProcess.
>>     p1 resume.
>>     p2 := [m critical: [s wait]] newProcess.
>>     p2 resume.
>>     Processor yield.
>>     { p1. p1 suspend. p2. p2 suspend }.
>>     p1 resume. p2 resume.
>>     Processor yield.
>>     { s. m. p1. p1 isTerminated. p2. p2 isTerminated. m isOwned. m
>> instVarNamed: 'owner' }.
>>     p3 := [m critical: [s wait]] newProcess.
>>     p3 resume.
>>     Processor yield.
>>     { s. m. p1. p1 isTerminated. p2. p2 isBlocked. p3. p3 isBlocked. m
>> isOwned. m instVarNamed: 'owner' }.
>>
>> I've just added a third process to your last example; p3 really enters
>> the critical section and takes m's ownership despite the fact p2 is already
>> waiting inside m's  critical section - because p2 managed to enter m
>> withour taking m's ownership.
>>
>> Now we could repeat the procedure and keep adding processes inside the
>> critical section indefinitely :) So I guess this really is a bug.
>>
>> Best,
>>
>> ~~~
>> ^[^    Jaromir
>>
>> Sent from Squeak Inbox Talk
>>
>> On 2021-12-28T20:07:25+01:00, mail at jaromir.net wrote:
>>
>> > Hi Eliot,
>> >
>> > Thanks! Please see my comments below, it seems to me there may be a bug
>> in the Mutex.
>> >
>> > ~~~
>> > ^[^    Jaromir
>> >
>> > Sent from Squeak Inbox Talk
>> >
>> > On 2021-12-27T14:55:22-08:00, eliot.miranda at gmail.com wrote:
>> >
>> > > Hi Jaromir,
>> > >
>> > > On Mon, Dec 27, 2021 at 2:52 AM <mail at jaromir.net> wrote:
>> > >
>> > > > Hi all,
>> > > >
>> > > > What is the desirable semantics of resuming a previously suspended
>> process?
>> > > >
>> > >
>> > > That a process continue exactly as it would have if it had not been
>> suspended in
>> > > the first place.  In this regard our suspend is hopelessly broken for
>> > > processes that are waiting on condition variables. See below.
>> > >
>> > >
>> > > >
>> > > > #resume's comment says: "Allow the process that the receiver
>> represents to
>> > > > continue. Put the receiver in *line to become the activeProcess*."
>> > > >
>> > > > The side-effect of this is that a terminating process can get
>> resumed
>> > > > (unless suspendedContext is set to nil - see test
>> KernelTests-jar.417 /
>> > > > Inbox - which has the unfortunate side-effect of #isTerminated
>> answering true
>> > > > during termination).
>> > > >
>> > >
>> > > But a process that is terminating should not be resumable.  This
>> should be
>> > > a non-issue.  If a process is terminating itself then it is the active
>> > > process, it has nil as its suspendedContext, and Processor
>> > > activeProcess resume always produces an error. Any process that is
>> not
>> > > terminating itself can be made to fail by having the machinery set the
>> > > suspendedContext to nil.
>> > >
>> >
>> > Yes agreed, but unfortunately that's precisely what is not happening in
>> the current and previous #terminate and what I'm proposing in
>> Kernel-jar.1437 - to set the suspendedContext to nil during termination,
>> even before calling #releaseCriticalSection.
>> >
>> > >
>> > > > A similar side-effect: a process originally waiting on a semaphore
>> and
>> > > > then suspended can be resumed into the runnable state and get
>> scheduled,
>> > > > effectively escaping the semaphore wait.
>> > > >
>> > >
>> > > Right.  This is the bug.  So for example
>> > >     | s p |
>> > >     s := Semaphore new.
>> > >     p := [s wait] newProcess.
>> > >     p resume.
>> > >     Processor yield.
>> > >     { p. p suspend }
>> > >
>> > > answers an Array of process p that is past the wait, and the
>> semaphore, s.
>> > > And
>> > >
>> > >     | s p |
>> > >     s := Semaphore new.
>> > >     p := [s wait] newProcess.
>> > >     p resume.
>> > >     Processor yield.
>> > >     p suspend; resume.
>> > >     Processor yield.
>> > >     p isTerminated
>> > >
>> > > answers true, whereas in both cases the process should remain waiting
>> on
>> > > the semaphore.
>> > >
>> > > >
>> > > > Is this an expected behavior or a bug?
>> > > >
>> > >
>> > > IMO it is a dreadful bug.
>> > >
>> > > > If a bug, should a suspended process somehow remember its previous
>> state
>> > > > and/or queue and return to the same one if resumed?
>> > > >
>> > >
>> > > IMO the primitive should back up the process to the
>> > > wait/primitiveEnterCriticalSection. This is trivial to implement in
>> the
>> > > image, but is potentially non-atomic.  It is perhaps tricky to
>> implement in
>> > > the VM, but will be atomic.
>> > >
>> > > Sorry if I'm missing something :)
>> > > >
>> > >
>> > > You're not missing anything :-)  Here's another example that answers
>> two
>> > > processes which should both block but if resumed both make progress.
>> > >
>> > >     | s p1 p2 m |
>> > >     s := Semaphore new.
>> > >     m := Mutex new.
>> > >     p1 := [m critical: [s wait]] newProcess.
>> > >     p1 resume.
>> > >     p2 := [m critical: [s wait]] newProcess.
>> > >     p2 resume.
>> > >     Processor yield.
>> > >     { p1. p1 suspend. p2. p2 suspend }
>> > >
>> > > p1 enters the mutex's critical section, becoming the mutex's owner.
>> p2 then
>> > > blocks attempting to enter m's critical section.  Let's resume these
>> two,
>> > > and examine the semaphore and mutex:
>> > >
>> > >     | s p1 p2 m |
>> > >     s := Semaphore new.
>> > >     m := Mutex new.
>> > >     p1 := [m critical: [s wait]] newProcess.
>> > >     p1 resume.
>> > >     p2 := [m critical: [s wait]] newProcess.
>> > >     p2 resume.
>> > >     Processor yield.
>> > >     { p1. p1 suspend. p2. p2 suspend }.
>> > >     p1 resume. p2 resume.
>> > >     Processor yield.
>> > >     { s. m. p1. p1 isTerminated. p2. p2 isTerminated }
>> > >
>> > > In this case the end result for p2 is accidentally correct. It ends up
>> > > waiting on s within m's critical section. But p1 ends up terminated.
>> IMO
>> > > the correct result is that p1 remains waiting on s, and is still the
>> owner
>> > > of m, and p2 remains blocked trying to take ownership of m.
>> > >
>> >
>> > Perfect example! My naive expectation was when a process inside a
>> critical section gets suspended the Mutex gets unlocked but that's
>> apparently wrong :)
>> >
>> > But still, there's something wrong with the example: If p1 resumes it
>> releases m's ownership and terminates, then p2 takes over and proceeds
>> inside the critical section and gets blocked at the semaphore. I'd expect
>> p2 would become the owner of the Mutex m BUT it's not! There's no owner
>> while p2 is sitting at the semaphore. Try:
>> >
>> >     | s p1 p2 m |
>> >     s := Semaphore new.
>> >     m := Mutex new.
>> >     p1 := [m critical: [s wait]] newProcess.
>> >     p1 resume.
>> >     p2 := [m critical: [s wait]] newProcess.
>> >     p2 resume.
>> >     Processor yield.
>> >     { p1. p1 suspend. p2. p2 suspend }.
>> >     p1 resume. p2 resume.
>> >     Processor yield.
>> >     { s. m. p1. p1 isTerminated. p2. p2 isTerminated. m isOwned. m
>> instVarNamed: 'owner' }
>> >
>> > It seems to me that when p2 gets suspended it is stopped somewhere
>> inside #primitiveEnterCriticalSection before the owner is set and when it
>> gets resumed it is placed into the runnable queue with the pc pointing
>> right behind the primitive and so when it runs it just continues inside
>> #critical and gets blocked at the semaphore, all without having the
>> ownership.
>> >
>> > Is this interpretation right? It would mean Mutex's critical section
>> can be entered twice via this mechanism...
>> >
>> > Cuis does set the ownership to p2 in this example.
>> >
>> > Thanks again,
>> >
>> > Jaromir
>> > >
>> > > >
>> > > > Best,
>> > > > ~~~
>> > > > ^[^    Jaromir
>> > > >
>> > > > Sent from Squeak Inbox Talk
>> > > >
>> > >
>> > > _,,,^..^,,,_
>> > > best, Eliot
>> > >
>> > >
>> >
>> >
>>
>
>
> --
> _,,,^..^,,,_
> best, Eliot
>


-- 
_,,,^..^,,,_
best, Eliot