[squeak-dev] Re: Suspending process fix
Andreas Raab
andreas.raab at gmx.de
Tue Apr 28 03:25:09 UTC 2009
Hi Igor -
Having spent more time than I dare admitting in debugging multi-process
interactions on heavily utilized servers I can tell you that your fix
isn't going to work very well ;-) If only because it relies on #suspend
being non-atomic which was one of the largest sources of problems when
used with semaphores. Atomic suspension is absolutely critical since the
VM may or may not decide to switch processes when you are in the midst
of modifying that list and when this happens you're in for a very bad
time (processes will vanish in thin air and garbage collected). Making
Process>>suspend be atomic in all circumstances was one of the major
advances for stability on our servers and I'm kinda surprised these
changes are not in the latest VMs (I will verify this later).
One thing I'm curious about is what is the use case are you looking at?
I have never come across this particular problem in practice since
explicit suspend and resume operations are exceptionally rare outside of
the debugger.
That said, I am not certain the behavior is "broken" to gegin with, in
particular consider the alternatives. Depending on what problem you're
trying to solve, you might want to work around the issue by utilizing
the return value from #suspend. For example, doing something like:
list := process suspend.
list resumeProcess: process.
and then implement
Semaphore>>resumeProcess: aProcess
"The process was waiting on this semaphore. Put it back on."
Of course, this also illustrates one of the problems with doing this in
general which is: Are you going to put the process at the beginning of
the list? In the middle? At the end? Since any decision probably depends
on application specific context, one could argue that the best way to
deal with it then is along the lines of, say:
list := process suspend.
list
ifNil:[process resume "process was active or suspended already"]
ifNotNil:[list addFirst: process] "put process back on list"
But of course the latter then raises the issue that if this is an
already signaled semaphore this shouldn't be doing that either. Icky stuff.
Which is why I really do prefer the current behavior which gives you a
consistent behavior across the board. #suspend takes a process of a
semaphore, #resume simply makes the process runnable again. It may not
be what you expected it to be but it is consistent and sometimes this is
all you can ask for. If you need specifically different behavior you can
implement that in a way described above but it isn't clear to me whether
any of these would be improvements because when it comes to the edge
cases they are just as problematic as the current solution.
Cheers,
- Andreas
Igor Stasenko wrote:
> Hello, i trying to provide a fix to the issue
> http://bugs.squeak.org/view.php?id=6822
> what is needed to make following code work w/o error:
>
> |sema proc |
> sema := Semaphore new.
> proc := [ sema wait. self error: 'should not be there' ] fork.
> Processor yield.
> proc suspend.
> proc resume.
>
> The problem here, is that once you issue a #suspend on a process it
> puts nil to its myList ivar, ignoring the fact that it could be
> waiting on semaphore.
> This leads to disconnecting the process from semaphore, and any
> subsequent 'sema signal' will be simply lost:
>
> suspend
> | oldList |
> <primitive: 88> "primitive will fail, if receiver is not active
> process, which is just our case"
> myList ifNil:[^nil].
> oldList := myList.
> myList := nil.
> oldList remove: self ifAbsent:[].
> ^oldList
>
> the possible fix would be:
> - add an instance variable to Process, say 'semaphore'.
>
> - fix the #suspend to following:
>
> suspend
> | oldList |
> <primitive: 88> "primitive will fail, if receiver is not active
> process, which is just our case"
> myList ifNil:[^nil].
> oldList := myList.
> myList := nil.
> oldList remove: self ifAbsent:[].
> oldList isKindOf: Semaphore ifTrue: [ semaphore := oldList ].
> ^oldList
>
>
> - then, in the #resume
>
> resume
> suspendedContext ifNil: [^ self primitiveFailed].
> semaphore ifNotNil: [
> semaphore resumeWaiting: self.
> semaphore := nil.
> ^ self.
> ]
> ^ self primitiveResume
>
> and
>
> Semaphore>>resumeWaiting: aProcess
> excessSignals>0
> ifTrue: [excessSignals := excessSignals-1.
> aProcess primitiveResume. "resume immediately" ]
> ifFalse: [self addLastLink: aProcess]
>
> ----------
> An attached file is the changeset which makes following test to work
> w/o failure:
>
> |sema proc bool |
> bool := true.
> sema := Semaphore new.
> proc := [ sema wait. bool ifTrue: [self error: 'should not be there'].
> Transcript show: 'semaphore signaled' ] fork.
> Processor yield.
> proc newSuspend.
> proc newResume.
> self assert: (sema last == proc).
> bool := false.
> sema signal.
>
>
> P.S. i didn't checked how this fix will affect other things around
> processes, like mentioned by Mike:
>
> Note that Process>>signalException: (as of in Squeak 3.8, it may be
> fixed in later versions?) relies on this broken behaviour.
>
>
>
>
> ------------------------------------------------------------------------
>
>
More information about the Squeak-dev
mailing list
|