[BUG] Mysterious Delay lockups
J J
azreal1977 at hotmail.com
Sat May 5 09:13:44 UTC 2007
Did anything ever happen with this? I didn't see anything else posted
here, or anything when I Google with site: pointed at the Squeak archives.
>From: Andreas Raab <andreas.raab at gmx.de>
>Reply-To: The general-purpose Squeak developers
>list<squeak-dev at lists.squeakfoundation.org>
>To: The general-purpose Squeak developers
>list<squeak-dev at lists.squeakfoundation.org>, Squeak Virtual Machine
>Development Discussion<vm-dev at lists.squeakfoundation.org>
>Subject: [BUG] Mysterious Delay lockups
>Date: Tue, 17 Apr 2007 19:20:28 -0700
>
>Hi Folks -
>
>Some of you (mostly those who run heavy servers) may have noticed that at
>times Squeak locks up in mysterious and unforeseen ways. One of those
>lockups involves Delay's AccessProtect in an unsignaled state and
>consequently the entire image locking up since Delay access is required in
>many, many places.
>
>Today, David presented me an image that was locked up in such a state but
>by sheer luck he managed to save it right before it happened which allowed
>me to investigate the situation. The result can best be explained by the
>little test case shown here:
>
> "Create mutex unsignaled so we can manually signal it"
> mutex := Semaphore new.
> "Create a process which will wait inside the mutex"
> p := [mutex critical:[]] forkAt: Processor userBackgroundPriority.
> "Wait until process has entered mutex"
> [p suspendingList == mutex]
> whileFalse:[(Delay forMilliseconds: 10) wait].
> "Signal mutex"
> mutex signal.
> "Kill process"
> p terminate.
> "and check to see if the mutex is signaled"
> mutex isSignaled ifFalse:[self error: 'Mutex not signaled'].
>
>Note that despite the somewhat complex setup the basic idea is that a low
>priority process waiting in a critical section receives a signal on the
>semaphore it is waiting on but gets terminated by a higher priority process
>in between receiving the signal and execution of the process itself.
>
>This situation (manually executed in the above to make it more easily
>repeatable) can happen in many situations where processes get terminated
>"from the outside" and it would cause particular grief in the timing
>semaphore because it gets served by the highest priority process which
>makes the unfortunate course of events much more likely.
>
>All Squeak versions that I have access to expose this behavior. Looking at
>Semaphore>>critical: which says
>
>Semaphore>>critical: aBlock
> | blockValue |
> self wait.
> [blockValue := aBlock value] ensure: [self signal].
> ^blockValue
>
>makes it seem as if moving the wait into the ensured block is the correct
>answer, but that ain't necessarily so. When we move the wait into the block
>we risk that the entering process is terminated after entering the block
>but before entering the wait which would leave the semaphore signaled
>twice, which is just as bad as not signaled at all.
>
>Methinks a solution would involve Process>>terminate but I'm running out of
>steam after trying to understand the problem in all its implications. Any
>ideas would be greatly welcome.
>
>Cheers,
> - Andreas
>
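For reference, the variant weighed in the quoted message (moving the wait
into the ensured block) could be sketched roughly as below. The selector
criticalWaitInside: is a made-up name used only for illustration and is not
part of any Squeak image; the comments mark the two points where termination
matters, following the reasoning in the quoted text.

```smalltalk
"Hypothetical selector, for illustration only."
Semaphore>>criticalWaitInside: aBlock
	| blockValue |
	[
		"(A) If the process is terminated here, before the wait, the
		 ensure block below still runs and signals a semaphore that
		 this process never acquired."
		self wait.
		"(B) From here on the process holds the semaphore, so the
		 signal in the ensure block is the matching release."
		blockValue := aBlock value
	] ensure: [self signal].
	^blockValue
```

Termination at (A) is the case that leaves the semaphore signaled twice, so
this variant appears to trade one failure mode for another rather than fix
the problem.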