[Vm-dev] [BUG] Mysterious Delay lockups
Brad Fowlow
brad.fowlow at qwaq.com
Wed Apr 18 06:24:55 UTC 2007
Ouch, that hurts.
Maybe just...
| caught |
caught := false.
[
    [ caught := true. self wait ] ifCurtailed: [ caught := false ].
    blockValue := aBlock value.
] ensure: [ caught ifTrue: [ self signal ]]
might be closer? (Fixtemp'd or whatever to meet the need, perhaps.)
(I'm assuming the only thing that can 'curtail' a wait here is a
terminate.)
-b
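
For reference, the suggestion above could be written out as a whole method roughly like this. It is an untested sketch: criticalGuarded: is a hypothetical selector, not anything in Squeak, and the temps are declared at method level on the assumption that this is all the "fixtemp" step needs.

```smalltalk
Semaphore>>criticalGuarded: aBlock
	"Sketch only; criticalGuarded: is a hypothetical selector.
	 Signal on the way out only if the wait was not curtailed."
	| caught blockValue |
	caught := false.
	[
		"caught stays true unless the wait itself is curtailed"
		[caught := true. self wait] ifCurtailed: [caught := false].
		blockValue := aBlock value
	] ensure: [caught ifTrue: [self signal]].
	^blockValue
```

Whether ifCurtailed: can tell a process terminated while still waiting from one terminated after the signal was consumed is exactly the open question, so this may well need help from Process>>terminate.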
> Hi Folks -
>
> Some of you (mostly those who run heavy servers) may have noticed
> that at times Squeak locks up in mysterious and unforeseen ways.
> One of those lockups involves Delay's AccessProtect in an
> unsignaled state and consequently the entire image locking up since
> Delay access is required in many, many places.
>
> Today, David presented me an image that was locked up in such a
> state but by sheer luck he managed to save it right before it
> happened which allowed me to investigate the situation. The result
> can best be explained by the little test case shown here:
>
> "Create mutex unsignaled so we can manually signal it"
> mutex := Semaphore new.
> "Create a process which will wait inside the mutex"
> p := [mutex critical:[]] forkAt: Processor userBackgroundPriority.
> "Wait until process has entered mutex"
> [p suspendingList == mutex]
> whileFalse:[(Delay forMilliseconds: 10) wait].
> "Signal mutex"
> mutex signal.
> "Kill process"
> p terminate.
> "and check to see if the mutex is signaled"
> mutex isSignaled ifFalse:[self error: 'Mutex not signaled'].
>
> Note that despite the somewhat complex setup, the basic idea is that
> a low-priority process waiting in a critical section receives a
> signal on the semaphore it is waiting on but gets terminated by a
> higher-priority process in between receiving the signal and
> execution of the process itself.
>
> This situation (manually executed in the above to make it more
> easily repeatable) can happen in many situations where processes
> get terminated "from the outside" and it would cause particular
> grief in the timing semaphore because it gets served by the highest
> priority process which makes the unfortunate course of events much
> more likely.
>
> All Squeak versions that I have access to expose this behavior.
> Looking at Semaphore>>critical: which says
>
> Semaphore>>critical: aBlock
>     self wait.
>     [blockValue := aBlock value] ensure: [self signal].
>
> makes it seem as if moving the wait into the ensured block is the
> correct answer, but that ain't necessarily so. When we move the
> wait into the block we risk that the entering process is terminated
> after entering the block but before entering the wait which would
> leave the semaphore signaled twice, which is just as bad as not
> signaled at all.
>
> Methinks a solution would involve Process>>terminate but I'm
> running out of steam after trying to understand the problem in all
> its implications. Any ideas would be greatly welcome.
>
> Cheers,
> - Andreas
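
The "signaled twice" risk in the quoted message can be made concrete with a small sketch of the variant it warns against. criticalWaitInside: is a hypothetical selector used only for illustration; it is not the Squeak source.

```smalltalk
Semaphore>>criticalWaitInside: aBlock
	"Illustration only; criticalWaitInside: is a hypothetical selector.
	 The wait has been moved inside the ensured block."
	| blockValue |
	[
		"If the process is terminated here, before the wait has consumed
		 a signal, the ensure: block below still runs."
		self wait.
		blockValue := aBlock value
	] ensure: [self signal].	"signals even if the wait never completed"
	^blockValue
```

If termination lands before the wait, the semaphore ends up with one signal more than it should have, which is the state the message describes as just as bad as not being signaled at all.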