[squeak-dev] Re: Suspending process fix

Igor Stasenko siguctua at gmail.com
Tue Apr 28 04:53:01 UTC 2009


2009/4/28 Andreas Raab <andreas.raab at gmx.de>:
> Hi Igor -
>
> Having spent more time than I dare admitting in debugging multi-process
> interactions on heavily utilized servers I can tell you that your fix isn't
> going to work very well ;-) If only because it relies on #suspend being
> non-atomic which was one of the largest sources of problems when used with
> semaphores. Atomic suspension is absolutely critical since the VM may or may
> not decide to switch processes when you are in the midst of modifying that
> list and when this happens you're in for a very bad time (processes will
> vanish in thin air and garbage collected). Making Process>>suspend be atomic
> in all circumstances was one of the major advances for stability on our
> servers and I'm kinda surprised these changes are not in the latest VMs (I
> will verify this later).
>
+100. I didn't mentioned that these changes should find their way to
VM, just because i sure that everyone who is interested in this topic,
understands it by default :)

> One thing I'm curious about is what is the use case are you looking at? I
> have never come across this particular problem in practice since explicit
> suspend and resume operations are exceptionally rare outside of the
> debugger.
>

Well, i discovered this issue when wanted to make a current process be
the only process which can be running during a certain period of time
- without any chance for being interrupted by another, higher priority
process.
And how naive i am, thinking that doing simply suspend/resume will do
what i want to. :)

IMO, the suspend/resume should guarantee that suspended process will
not perform any actions under any circumstances and be able to
continue normally after issuing corresponding #resume.
As my issue illustrates, this is not true for processes which is
waiting on semaphore.

Ask yourself, why a developer, who may want to suspend any process
(regardless of his intents) to resume it later, should make any
assertions like "what will be broken if i suspend it?".

> That said, I am not certain the behavior is "broken" to gegin with, in
> particular consider the alternatives. Depending on what problem you're
> trying to solve, you might want to work around the issue by utilizing the
> return value from #suspend. For example, doing something like:
>
>  list := process suspend.
>  list resumeProcess: process.
>
> and then implement
>
> Semaphore>>resumeProcess: aProcess
>  "The process was waiting on this semaphore. Put it back on."
>

Of course, there is alternatives.
" sarcasm starts here "
For instance an integer multiplication for positive a*b can be implemented as:

result := 0.
 1 to: a do: [:i | result := result + b ].
^ result

and then we could tell developers how to implement a 'rare' cases when
they need to multiply non-positive integers, or floating point values.
and so on. Depending on their problem, we could always propose a
solution :)
" end of sarcasm "


> Of course, this also illustrates one of the problems with doing this in
> general which is: Are you going to put the process at the beginning of the
> list? In the middle? At the end? Since any decision probably depends on
> application specific context, one could argue that the best way to deal with
> it then is along the lines of, say:
>
> list := process suspend.
> list
>  ifNil:[process resume "process was active or suspended already"]
>  ifNotNil:[list addFirst: process] "put process back on list"
>
> But of course the latter then raises the issue that if this is an already
> signaled semaphore this shouldn't be doing that either. Icky stuff.
>
> Which is why I really do prefer the current behavior which gives you a
> consistent behavior across the board. #suspend takes a process of a
> semaphore, #resume simply makes the process runnable again. It may not be
> what you expected it to be but it is consistent and sometimes this is all
> you can ask for. If you need specifically different behavior you can
> implement that in a way described above but it isn't clear to me whether any
> of these would be improvements because when it comes to the edge cases they
> are just as problematic as the current solution.
>
Right. My proposal doesn't deals correctly with cases when there are
multiple processes waiting on a single semaphore.
It is correct only for a single process.

If we suppose, that we are running in ideal environment, where
processes are running in parallel, then nothing prevents us to
implement waiting as following:

Semaphore>>wait
| flag |
  flag := Array with: false with: nil.  "a second slot will be used
for the 'next' pointer, for chaining the list of consumers who waiting
on signal"

  self scheduleFlagToSignal: flag.

  [ flag at: 1 ] whileFalse: []

it is clear, that to exit from loop, there should be external process
who alters the flag from false to true.
Also, it is clear, that suspending the loop in the middle can't affect
other processes which possibly waiting for same semaphore signal, as
well as it guarantees that semaphore signal will be consumed in
correct order.

This example illustrates the fundamental difference between waiting
and suspending/preemption, while in Squeak, unfortunately there is an
equal sign between them (semaphores use a VM's preemption mechanism to
implement waiting).
If we take this example as a base, then of course we can implement a
correct scenario which deals with preemption and scheduling.


> Cheers,
>  - Andreas
>
>
> Igor Stasenko wrote:
>>
>> Hello, i trying to provide a fix to the issue
>> http://bugs.squeak.org/view.php?id=6822
>> what is needed to make following code work w/o error:
>>
>> |sema proc |
>> sema := Semaphore new.
>> proc := [ sema wait. self error: 'should not be there' ] fork.
>> Processor yield.
>> proc suspend.
>> proc resume.
>>
>> The problem here, is that once you issue a #suspend on a process it
>> puts nil to its myList ivar, ignoring the fact that it could be
>> waiting on semaphore.
>> This leads to disconnecting the process from semaphore, and any
>> subsequent 'sema signal' will be simply lost:
>>
>> suspend
>>        | oldList |
>>        <primitive: 88>   "primitive will fail, if receiver is not active
>> process, which is just our case"
>>        myList ifNil:[^nil].
>>        oldList := myList.
>>        myList := nil.
>>        oldList remove: self ifAbsent:[].
>>        ^oldList
>>
>> the possible fix would be:
>> - add an instance variable to Process, say 'semaphore'.
>>
>> - fix the #suspend to following:
>>
>> suspend
>>        | oldList |
>>        <primitive: 88>   "primitive will fail, if receiver is not active
>> process, which is just our case"
>>        myList ifNil:[^nil].
>>        oldList := myList.
>>        myList := nil.
>>        oldList remove: self ifAbsent:[].
>> oldList isKindOf: Semaphore ifTrue: [ semaphore := oldList ].
>>        ^oldList
>>
>>
>> - then, in the #resume
>>
>> resume
>>        suspendedContext ifNil: [^ self primitiveFailed].
>>        semaphore ifNotNil: [
>>            semaphore resumeWaiting: self.
>>            semaphore := nil.
>>            ^ self.
>>        ]
>>        ^ self primitiveResume
>>
>> and
>>
>> Semaphore>>resumeWaiting: aProcess
>>        excessSignals>0
>>                ifTrue: [excessSignals := excessSignals-1.
>>                    aProcess primitiveResume. "resume immediately" ]
>>                ifFalse: [self addLastLink: aProcess]
>>
>> ----------
>> An attached file is the changeset which makes following test to work
>> w/o failure:
>>
>> |sema proc bool |
>> bool := true.
>> sema := Semaphore new.
>> proc := [ sema wait. bool ifTrue: [self error: 'should not be there'].
>> Transcript show: 'semaphore signaled' ] fork.
>> Processor yield.
>> proc newSuspend.
>> proc newResume.
>> self assert: (sema last == proc).
>> bool := false.
>> sema signal.
>>
>>
>> P.S. i didn't checked how this fix will affect other things around
>> processes, like mentioned by Mike:
>>
>> Note that Process>>signalException: (as of in Squeak 3.8, it may be
>> fixed in later versions?) relies on this broken behaviour.
>>
>>
>>
>>
>> ------------------------------------------------------------------------
>>
>>
>
>
>



-- 
Best regards,
Igor Stasenko AKA sig.



More information about the Squeak-dev mailing list