Delay and Server reliability

Adrian Lienhard adi at netstyle.ch
Sat Jul 28 13:08:43 UTC 2007


We have seen this exact problem as well, although not often. An  
interesting observation was that you could bring the image back  
alive through the Seaside screenshot application...

Nice to see these kinds of production issues fixed.
Andreas, if you have more patches lying around, let us know ;)

I think it would make sense to have a wiki page to keep track of  
important fixes and their associated Mantis reports, as well as  
instructions for debugging (like gdb, VM instrumentation).
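
As a starting point for such a page, the gdb steps quoted further down could be scripted roughly as follows. This is only a sketch under assumptions: gdb is on the PATH, you are allowed to attach to the VM process, and the VM exports printAllStacks as a callable function. The helper names gdb_stack_dump_command and dump_stacks are made up for illustration.

```python
import subprocess


def gdb_stack_dump_command(pid):
    """Build a gdb command line that attaches to the VM and calls printAllStacks.

    Hypothetical helper; only standard gdb options are used.
    """
    return [
        "gdb",
        "--batch",                            # run the -ex commands, then exit
        "-p", str(pid),                       # attach to the running VM process
        "-ex", "call (int)printAllStacks()",  # ask the VM to print every stack
    ]


def dump_stacks(pid):
    """Run gdb against the VM and return gdb's exit status.

    The stacks are written by the VM itself, so they end up wherever the
    VM's output was redirected when it was launched, not in gdb's output.
    """
    return subprocess.run(gdb_stack_dump_command(pid), check=False).returncode
```

Calling dump_stacks with the pid of the locked-up VM should append the process list to the VM's output file, assuming the VM was started with its output redirected as described in the quoted steps.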

Cheers,
Adrian

On Jul 28, 2007, at 03:27 , Andreas Raab wrote:

> Ah yes, of course. You're missing another batch of fixes that we  
> have long applied to our servers. In this case it's the handling of  
> Semaphore>>critical: (which is broken in all Squeak versions). Give  
> the attached changes a try and if it still doesn't work I'm sure  
> there are more fixes that we've applied in the meantime ;-)
>
> Cheers,
>   - Andreas
>
> David Shaffer wrote:
>> Andreas Raab wrote:
>>> I'm pretty sure it's complete. If you want some help do this:
>>> * Launch the VM with output redirected to a file
>>> * Wait until it locks up
>>> * Attach gdb to the running process, e.g.:
>>>   gdb --pid=<pid of vm>
>>> * Have it print all the call stacks, e.g.:
>>>   p (int)printAllStacks()
>>> * Look at the resulting output file.
>>>
>> Thanks for the gdb tip.  I can look at the processes via my  
>> Seaside server as well since it is still responding.  Anyway the  
>> debugging output is below.  The TdTimer processes are not making  
>> progress although, in one case, the sleep should only be for 60  
>> seconds.  The VNC server accepts connections but isn't responding  
>> to user input including alt-. (although the VNC cursor tracks and  
>> there is sometimes UI activity if, for example, the Transcript  
>> window is open).  If I enter (Delay forSeconds: 5) wait in a  
>> web-browser-based workspace it will hang forever although, as I  
>> mentioned, I am able to interact with the image in other ways  
>> through this workspace.
>> It seems like the list below isn't complete since I have a web  
>> server process blocked waiting for connections...but anyway the  
>> image exhibits this behavior with and without your Delay patch  
>> applied.
>> David
>> Process
>> 2064888972 >idleProcess
>> 2064858556 [] in >startUp
>> 2064858648 [] in BlockContext>newProcess
>> Process
>> 2064885004 >finalizationProcess
>> 2064884820 [] in >restartFinalizationProcess
>> 2064884912 [] in BlockContext>newProcess
>> Process
>> 2085972252 Semaphore>critical:
>> 2085972068 Delay>scheduleEvent
>> 2085971932 Delay>schedule
>> 2085971840 Delay>wait
>> 2085971748 WorldState>interCyclePause:
>> 2085971656 WorldState>doOneCycleFor:
>> 2085971564 PasteUpMorph>doOneCycle
>> 2054607980 [] in >spawnNewProcess
>> 2054608164 [] in BlockContext>newProcess
>> Process
>> 2085972988 Semaphore>critical:
>> 2085972804 Delay>scheduleEvent
>> 2085972712 Delay>schedule
>> 2085972620 Delay>wait
>> 2085972436 [] in EventSensor>eventTickler
>> 2085972344 BlockContext>on:do:
>> 2064857692 EventSensor>eventTickler
>> 2064857416 [] in EventSensor>installEventTickler
>> 2064857600 [] in BlockContext>newProcess
>> Process
>> 2085973768 Semaphore>critical:
>> 2085973584 Delay>scheduleEvent
>> 2085973448 Delay>schedule
>> 2085973356 Delay>wait
>> 2085973264 [] in ApplicationService>sleepFor:
>> 2085973172 >terminationOkDuring:
>> 2085973080 ApplicationService>sleepFor:
>> 2065158156 TdTimer>runWhile:
>> 2065157788 [] in ApplicationService>start
>> 2065157972 BlockContext>ensure:
>> 2064894444 [] in ApplicationService>start
>> 2065157696 BlockContext>on:do:
>> 2065157512 BlockContext>valueWithBindingsContext:
>> 2065157420 BlockContext>valueWithBindings:
>> 2064894536 [] in BlockContext>newProcessWithBindings:
>> 2064894628 [] in BlockContext>newProcess
>> Process
>> 2086064536 Semaphore>critical:
>> 2086064308 Delay>scheduleEvent
>> 2086064184 Delay>schedule
>> 2086064092 Delay>wait
>> 2086063816 [] in Semaphore>waitTimeoutMSecs:
>> 2086064000 [] in BlockContext>newProcess
>> Process
>> 2086089116 Semaphore>critical:
>> 2086088932 Delay>scheduleEvent
>> 2086088796 Delay>schedule
>> 2086088612 Delay>wait
>> 2086088704 [] in ApplicationService>sleepFor:
>> 2086088520 >terminationOkDuring:
>> 2086088428 ApplicationService>sleepFor:
>> 2064888880 TdTimer>runWhile:
>> 2064888512 [] in ApplicationService>start
>> 2064888696 BlockContext>ensure:
>> 2064886188 [] in ApplicationService>start
>> 2064888420 BlockContext>on:do:
>> 2064888236 BlockContext>valueWithBindingsContext:
>> 2064888144 BlockContext>valueWithBindings:
>> 2064886280 [] in BlockContext>newProcessWithBindings:
>> 2064886372 [] in BlockContext>newProcess
>> Process
>> 2092063728 Semaphore>critical:
>> 2092063544 Delay>scheduleEvent
>> 2092063408 Delay>schedule
>> 2092063200 Delay>wait
>> 2092063316 [] in ApplicationService>sleepFor:
>> 2092063092 >terminationOkDuring:
>> 2092063000 ApplicationService>sleepFor:
>> 2065160244 TdTimer>runWhile:
>> 2065157144 [] in ApplicationService>start
>> 2065157328 BlockContext>ensure:
>> 2064893660 [] in ApplicationService>start
>> 2065157052 BlockContext>on:do:
>> 2065156868 BlockContext>valueWithBindingsContext:
>> 2065156776 BlockContext>valueWithBindings:
>> 2064893752 [] in BlockContext>newProcessWithBindings:
>> 2064893844 [] in BlockContext>newProcess
>> Process
>> 2099204904 Semaphore>critical:
>> 2099204676 Delay>scheduleEvent
>> 2099204552 Delay>schedule
>> 2099204460 Delay>wait
>> 2099187636 [] in Semaphore>waitTimeoutMSecs:
>> 2099187820 [] in BlockContext>newProcess
>> Process
>> 2099550248 >handleTimerEvent
>> 2059283788 [] in >runTimerEventLoop
>> 2059283368 BlockContext>on:do:
>> 2059283140 >runTimerEventLoop
>> 2059283572 [] in >startTimerEventLoop
>> 2059283664 [] in BlockContext>newProcess
>> Process
>> 2064857200 InputSensor>userInterruptWatcher
>> 2064857016 [] in InputSensor>installInterruptWatcher
>> 2064857108 [] in BlockContext>newProcess
>> Process
>> 2064858136 SystemDictionary>lowSpaceWatcher
>> 2064858228 [] in SystemDictionary>installLowSpaceWatcher
>> 2064858320 [] in BlockContext>newProcess
>> Process
>> 2086063724 Semaphore>waitTimeoutMSecs:
>> 2086063632 Socket>waitForConnectionFor:ifTimedOut:
>> 2086063448 Socket>waitForConnectionFor:
>> 2086063172 [] in Socket>waitForAcceptFor:
>> 2086063356 BlockContext>on:do:
>> 2086063080 Socket>waitForAcceptFor:
>> 2086062896 [] in RFBServer>runLoop
>> 2086062804 BlockContext>on:do:
>> 2064890492 RFBServer>runLoop
>> 2064890612 [] in RFB
>
> <SemaphoreCritical-ar.1.cs>
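
To make a dump like the one quoted above easier to read, a small script can group the frames by process and count what each process has on top of its stack. This is a minimal sketch. It assumes the format shown in David's mail (a "Process" line followed by "address selector" lines, possibly with mail quote markers in front), and the function names parse_stacks and top_frames are made up for illustration.

```python
from collections import Counter


def parse_stacks(text):
    """Split a printAllStacks dump into one list of frames per process."""
    processes = []
    for raw in text.splitlines():
        # Drop leading mail quote markers ("> ", ">> ") and surrounding spaces.
        line = raw.lstrip("> ").strip()
        if line == "Process":
            processes.append([])
        elif processes and line:
            address, _, frame = line.partition(" ")
            # Frame lines start with a decimal context address.
            if address.isdigit() and frame:
                processes[-1].append(frame)
    return processes


def top_frames(text):
    """Count the topmost frame of every process in the dump."""
    return Counter(frames[0] for frames in parse_stacks(text) if frames)
```

Run over the dump above, this kind of summary makes the pattern obvious: seven of the processes listed have Semaphore>critical: on top, each directly below Delay>scheduleEvent, which fits the explanation that Semaphore>>critical: is at fault.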




More information about the Squeak-dev mailing list