[squeak-dev] Re: Suspending process fix

Andreas Raab andreas.raab at gmx.de
Tue Apr 28 03:25:09 UTC 2009


Hi Igor -

Having spent more time than I dare admitting in debugging multi-process 
interactions on heavily utilized servers I can tell you that your fix 
isn't going to work very well ;-) If only because it relies on #suspend 
being non-atomic which was one of the largest sources of problems when 
used with semaphores. Atomic suspension is absolutely critical since the 
VM may or may not decide to switch processes when you are in the midst 
of modifying that list and when this happens you're in for a very bad 
time (processes will vanish in thin air and garbage collected). Making 
Process>>suspend be atomic in all circumstances was one of the major 
advances for stability on our servers and I'm kinda surprised these 
changes are not in the latest VMs (I will verify this later).

One thing I'm curious about is what is the use case are you looking at? 
I have never come across this particular problem in practice since 
explicit suspend and resume operations are exceptionally rare outside of 
the debugger.

That said, I am not certain the behavior is "broken" to gegin with, in 
particular consider the alternatives. Depending on what problem you're 
trying to solve, you might want to work around the issue by utilizing 
the return value from #suspend. For example, doing something like:

   list := process suspend.
   list resumeProcess: process.

and then implement

Semaphore>>resumeProcess: aProcess
   "The process was waiting on this semaphore. Put it back on."

Of course, this also illustrates one of the problems with doing this in 
general which is: Are you going to put the process at the beginning of 
the list? In the middle? At the end? Since any decision probably depends 
on application specific context, one could argue that the best way to 
deal with it then is along the lines of, say:

list := process suspend.
list
   ifNil:[process resume "process was active or suspended already"]
   ifNotNil:[list addFirst: process] "put process back on list"

But of course the latter then raises the issue that if this is an 
already signaled semaphore this shouldn't be doing that either. Icky stuff.

Which is why I really do prefer the current behavior which gives you a 
consistent behavior across the board. #suspend takes a process of a 
semaphore, #resume simply makes the process runnable again. It may not 
be what you expected it to be but it is consistent and sometimes this is 
all you can ask for. If you need specifically different behavior you can 
implement that in a way described above but it isn't clear to me whether 
any of these would be improvements because when it comes to the edge 
cases they are just as problematic as the current solution.

Cheers,
   - Andreas


Igor Stasenko wrote:
> Hello, i trying to provide a fix to the issue
> http://bugs.squeak.org/view.php?id=6822
> what is needed to make following code work w/o error:
> 
> |sema proc |
> sema := Semaphore new.
> proc := [ sema wait. self error: 'should not be there' ] fork.
> Processor yield.
> proc suspend.
> proc resume.
> 
> The problem here, is that once you issue a #suspend on a process it
> puts nil to its myList ivar, ignoring the fact that it could be
> waiting on semaphore.
> This leads to disconnecting the process from semaphore, and any
> subsequent 'sema signal' will be simply lost:
> 
> suspend
> 	| oldList |
> 	<primitive: 88>   "primitive will fail, if receiver is not active
> process, which is just our case"
> 	myList ifNil:[^nil].
> 	oldList := myList.
> 	myList := nil.
> 	oldList remove: self ifAbsent:[].
> 	^oldList
> 
> the possible fix would be:
> - add an instance variable to Process, say 'semaphore'.
> 
> - fix the #suspend to following:
> 
> suspend
> 	| oldList |
> 	<primitive: 88>   "primitive will fail, if receiver is not active
> process, which is just our case"
> 	myList ifNil:[^nil].
> 	oldList := myList.
> 	myList := nil.
> 	oldList remove: self ifAbsent:[].
> oldList isKindOf: Semaphore ifTrue: [ semaphore := oldList ].
> 	^oldList
> 
> 
> - then, in the #resume
> 
> resume
> 	suspendedContext ifNil: [^ self primitiveFailed].
>         semaphore ifNotNil: [
>             semaphore resumeWaiting: self.
>             semaphore := nil.
>             ^ self.
>         ]
> 	^ self primitiveResume
> 
> and
> 
> Semaphore>>resumeWaiting: aProcess
> 	excessSignals>0
> 		ifTrue: [excessSignals := excessSignals-1.
>                     aProcess primitiveResume. "resume immediately" ]
> 		ifFalse: [self addLastLink: aProcess]
> 
> ----------
> An attached file is the changeset which makes following test to work
> w/o failure:
> 
> |sema proc bool |
> bool := true.
> sema := Semaphore new.
> proc := [ sema wait. bool ifTrue: [self error: 'should not be there'].
> Transcript show: 'semaphore signaled' ] fork.
> Processor yield.
> proc newSuspend.
> proc newResume.
> self assert: (sema last == proc).
> bool := false.
> sema signal.
> 
> 
> P.S. i didn't checked how this fix will affect other things around
> processes, like mentioned by Mike:
> 
> Note that Process>>signalException: (as of in Squeak 3.8, it may be
> fixed in later versions?) relies on this broken behaviour.
> 
> 
> 
> 
> ------------------------------------------------------------------------
> 
> 




More information about the Squeak-dev mailing list