[Vm-dev] [BUG] Mysterious Delay lockups

Brad Fowlow brad.fowlow at qwaq.com
Wed Apr 18 06:24:55 UTC 2007


Ouch, that hurts.

Maybe just...

		| caught blockValue |

		caught := false.
		[
			[ caught := true. self wait ] ifCurtailed: [ caught := false ].
			blockValue := aBlock value.
		] ensure: [ caught ifTrue: [ self signal ]].
		^blockValue

might be closer?  (Fixtemp'd or whatever to meet the need, perhaps.)

(I'm assuming the only thing that can 'curtail' a wait here is a  
terminate.)

-b


> Hi Folks -
>
> Some of you (mostly those who run heavy servers) may have noticed  
> that at times Squeak locks up in mysterious and unforeseen ways.  
> One of those lockups involves Delay's AccessProtect in an  
> unsignaled state and consequently the entire image locking up since  
> Delay access is required in many, many places.
>
> Today, David presented me an image that was locked up in such a  
> state but by sheer luck he managed to save it right before it  
> happened which allowed me to investigate the situation. The result  
> can best be explained by the little test case shown here:
>
>   "Create mutex unsignaled so we can manually signal it"
>   mutex := Semaphore new.
>   "Create a process which will wait inside the mutex"
>   p := [mutex critical:[]] forkAt: Processor userBackgroundPriority.
>   "Wait until process has entered mutex"
>   [p suspendingList == mutex]
>       whileFalse:[(Delay forMilliseconds: 10) wait].
>   "Signal mutex"
>   mutex signal.
>   "Kill process"
>   p terminate.
>   "and check to see if the mutex is signaled"
>   mutex isSignaled ifFalse:[self error: 'Mutex not signaled'].
>
> Note that despite the somewhat complex setup the basic idea is that  
> a low priority process waiting in a critical section receives a  
> signal on the semaphore it is waiting on but gets terminated by a  
> higher priority process in between receiving the signal and  
> execution of the process itself.
>
> This situation (manually executed in the above to make it more  
> easily repeatable) can happen in many situations where processes  
> get terminated "from the outside" and it would cause particular  
> grief in the timing semaphore because it gets served by the highest  
> priority process which makes the unfortunate course of events much  
> more likely.
>
> All Squeak versions that I have access to expose this behavior.  
> Looking at Semaphore>>critical: which says
>
> Semaphore>>critical: aBlock
>   self wait.
>   [blockValue := aBlock value] ensure: [self signal].
>
> makes it seem as if moving the wait into the ensured block is the  
> correct answer, but that ain't necessarily so. When we move the  
> wait into the block we risk that the entering process is terminated  
> after entering the block but before entering the wait which would  
> leave the semaphore signaled twice, which is just as bad as not  
> signaled at all.
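>
> For illustration only, that variant might look roughly like the
> sketch below. It is hypothetical and not taken from any Squeak
> release; it reuses the abbreviated method form shown above.
>
> Semaphore>>critical: aBlock
>   "Hypothetical variant: the wait is moved inside the ensured block.
>    If the process is terminated after entering the block but before
>    'self wait' runs, the ensure: block still signals, leaving the
>    semaphore with one signal too many."
>   [self wait. blockValue := aBlock value] ensure: [self signal].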
>
> Methinks a solution would involve Process>>terminate but I'm  
> running out of steam after trying to understand the problem in all  
> its implications. Any ideas would be greatly welcome.
>
> Cheers,
>   - Andreas


