[Vm-dev] Odd image lock-up on Unix

Andreas Raab andreas.raab at gmx.de
Mon Apr 2 22:57:13 UTC 2007


Bert Freudenberg wrote:
> There was an increasing number of reports of Linux VMs freezing or 
> crashing lately. We suspect some possibly unrelated problems. One is due 
> to oops-as-signed, a symptom is that GC fails to find the end of memory. 
> The other seems related to async i/o. In yet others (and this may be 
> yours) we see nanosleeps that are called with huge negative microsecond 
> counts, yet higher up in the stack the value was 1000 microseconds.

No, it wasn't the problem. I think we got *that* problem in our second 
lockup today:

(gdb) where
#0  0x007e0402 in ?? ()
#1  0x0019d7d0 in __nanosleep_nocancel () from /lib/libc.so.6
#2  0x0806fc03 in aioSleep (microSeconds=-1075955024) at 
/home/araab/QwaqVM/platforms/unix/vm/aio.c:273
#3  0x00b34001 in display_ioRelinquishProcessorForMicroseconds 
(microSeconds=1000)
     at /home/araab/QwaqVM/platforms/unix/vm-display-X11/sqUnixX11.c:2304
#4  0x08058d20 in ioRelinquishProcessorForMicroseconds (us=1000) at 
/home/araab/QwaqVM/platforms/unix/vm/sqUnixMain.c:471
#5  0x0805ed9b in primitiveRelinquishProcessor () at gnu-interp.c:17940
#6  0x0806d64e in interpret () at gnu-interp.c:7882
#7  0x0805a4e8 in main (argc=8, argv=0xbfde4f14, envp=0xbfde4f38) at 
/home/araab/QwaqVM/platforms/unix/vm/sqUnixMain.c:1396


Note the negative microSeconds in aioSleep() and also note how the 
process dump (attached at the end) doesn't have the critical:[] sections 
all over it.

So there are at least two problems here although it does make me wonder 
that we got this so suddenly after (previously) zero problems. The two 
things that have changed on our end were both inside the image: For one 
the delay changes, but the other one is that we started to aggressively 
timeout network connections (too aggressively actually) which causes a 
lot of processes to be terminated, sockets destroyed etc. Previously we 
never did that, so maybe there is a race condition or something that 
leaves the unix socket code in a sick state?

Cheers,
   - Andreas

Process dump:
Process
-1741776716 ProcessorScheduler class>idleProcess
-1742284720 [] in ProcessorScheduler class>startUp
-1742284628 [] in BlockContext>newProcess

Process
-1714822836 Semaphore>wait
-1742256716 WeakArray class>finalizationProcess
-1742256900 [] in WeakArray class>restartFinalizationProcess
-1742256808 [] in BlockContext>newProcess

Process
-1714565076 Semaphore>wait
-1754828908 Delay class>timerInterruptWatcher
-1754828568 [] in Delay class>startTimerInterruptWatcher
-1754828476 [] in BlockContext>newProcess

Process
-1749060672 Semaphore>wait
-1749060764 Python class>runCallbackProcess
-1749061040 [] in Object>fork:at:
-1749060856 [] in BlockContext>newProcess

Process
-1742286520 Semaphore>wait
-1742286652 InputSensor>userInterruptWatcher
-1742286856 [] in InputSensor>installInterruptWatcher
-1742286744 [] in BlockContext>newProcess

Process
-1714738232 Semaphore>wait
-1714739244 Delay>wait
-1714738968 [] in EventSensor>eventTickler
-1714739336 BlockContext>on:do:
-1742286060 EventSensor>eventTickler
-1742286336 [] in EventSensor>installEventTickler
-1742286152 [] in BlockContext>newProcess

Process
-1742284904 Semaphore>wait
-1742285148 SystemDictionary>lowSpaceWatcher
-1742285332 [] in SystemDictionary>installLowSpaceWatcher
-1742285056 [] in BlockContext>newProcess

Process
-1718494260 Semaphore>wait
-1742275748 [] in UnixOSProcessAccessor>grimReaperProcess
-1742275840 BlockContext>repeat
-1742283044 [] in UnixOSProcessAccessor>grimReaperProcess
-1742282952 [] in BlockContext>newProcess

Process
-1714766652 Semaphore>wait
-1714766744 SharedQueue>next
-1742228012 QLogger>runLoggerProcess
-1742228196 [] in Object>fork:at:
-1742228104 [] in BlockContext>newProcess

Process
-1718568664 Semaphore>wait
-1718568756 SharedQueue>next
-1742198140 TIslandController>runEventLoop
-1742197864 [] in Object>fork:at:
-1742197772 [] in BlockContext>newProcess

Process
-1718567340 Semaphore>wait
-1718567588 TPromise>wait
-1742189848 [] in TMediaControlEndpoint>runPinger
-1742190124 BlockContext>repeat
-1742190216 TMediaControlEndpoint>runPinger
-1742190032 [] in TMediaControlEndpoint>startPinger
-1742190308 BlockContext>on:do:
-1742190624 [] in TMediaControlEndpoint>startPinger
-1742190440 [] in BlockContext>newProcess

Process
-1714963228 Semaphore>wait
-1714963320 SharedQueue>next
-1718535404 TIslandController>runEventLoop
-1718535128 [] in Object>fork:at:
-1718535036 [] in BlockContext>newProcess

Process
-1714563460 Semaphore>wait
-1714564432 Delay>wait
-1718492696 [] in QVNCScreen>runWatchdog
-1718492832 BlockContext>repeat
-1718492964 QVNCScreen>runWatchdog
-1718493240 [] in QVNCScreen>startWatchdog
-1718493056 [] in BlockContext>newProcess

Process
-1715192564 Semaphore>wait
-1715192656 Socket>waitForDataIfClosed:
-1715192748 Socket>waitForData
-1718489960 [] in RFBClientTalker>clientRunLoop
-1718489408 BlockContext>whileTrue:
-1718489684 [] in RFBSocket>runSafely:
-1718490052 BlockContext>on:do:
-1718489868 [] in RFBSocket>runSafely:
-1718490144 BlockContext>ensure:
-1718490236 RFBSocket>runSafely:
-1718490328 RFBClientTalker>clientRunLoop
-1718491004 [] in RFBClientTalker>connectTo:display:portOffset:
-1718490912 [] in BlockContext>newProcess

Process
-1714961064 Semaphore>wait
-1714961312 TPromise>wait
-1718527168 [] in TMediaControlEndpoint>runPinger
-1718527076 BlockContext>repeat
-1718527260 TMediaControlEndpoint>runPinger
-1718527444 [] in TMediaControlEndpoint>startPinger
-1718527536 BlockContext>on:do:
-1718527852 [] in TMediaControlEndpoint>startPinger
-1718527668 [] in BlockContext>newProcess

Process
-1714596252 Semaphore>wait
-1714596344 SharedQueue>next
-1714810692 TIslandController>runEventLoop
-1714810416 [] in Object>fork:at:
-1714810324 [] in BlockContext>newProcess

Process
-1714634468 Semaphore>wait
-1714634560 SharedQueue>next
-1714804448 TMessageRelay>runWriterProcess
-1714804632 [] in Object>fork:at:
-1714804540 [] in BlockContext>newProcess

Process
-1714670904 Semaphore>wait
-1714671848 Semaphore>waitTimeoutMSecs:
-1714672616 Socket>waitForDataFor:ifClosed:ifTimedOut:
-1714672708 Socket>waitForDataFor:
-1714672216 [] in TMessageRelay>runReaderProcess
-1714672800 BlockContext>on:do:
-1714805036 TMessageRelay>runReaderProcess
-1714805220 [] in Object>fork:at:
-1714805128 [] in BlockContext>newProcess

Process
-1714594604 Semaphore>wait
-1714594696 SharedQueue>next
-1714799248 TIslandController>runEventLoop
-1714798972 [] in Object>fork:at:
-1714798880 [] in BlockContext>newProcess

Process
-1714622296 Semaphore>wait
-1714622388 SharedQueue>next
-1714793004 TMessageRelay>runWriterProcess
-1714793188 [] in Object>fork:at:
-1714793096 [] in BlockContext>newProcess

Process
-1714662828 Semaphore>wait
-1714663772 Semaphore>waitTimeoutMSecs:
-1714664540 Socket>waitForDataFor:ifClosed:ifTimedOut:
-1714664632 Socket>waitForDataFor:
-1714664140 [] in TMessageRelay>runReaderProcess
-1714664724 BlockContext>on:do:
-1714793592 TMessageRelay>runReaderProcess
-1714793776 [] in Object>fork:at:
-1714793684 [] in BlockContext>newProcess

Process
-1714592956 Semaphore>wait
-1714593048 SharedQueue>next
-1714787096 TIslandController>runEventLoop
-1714786820 [] in Object>fork:at:
-1714786728 [] in BlockContext>newProcess

Process
-1714610116 Semaphore>wait
-1714610208 SharedQueue>next
-1714780388 TMessageRelay>runWriterProcess
-1714780572 [] in Object>fork:at:
-1714780480 [] in BlockContext>newProcess

Process
-1714654728 Semaphore>wait
-1714655672 Semaphore>waitTimeoutMSecs:
-1714656440 Socket>waitForDataFor:ifClosed:ifTimedOut:
-1714656532 Socket>waitForDataFor:
-1714656040 [] in TMessageRelay>runReaderProcess
-1714656624 BlockContext>on:do:
-1714780976 TMessageRelay>runReaderProcess
-1714781160 [] in Object>fork:at:
-1714781068 [] in BlockContext>newProcess

Process
-1714591308 Semaphore>wait
-1714591400 SharedQueue>next
-1714772080 TIslandController>runEventLoop
-1714771804 [] in Object>fork:at:
-1714771712 [] in BlockContext>newProcess

Process
-1714597936 Semaphore>wait
-1714598028 SharedQueue>next
-1714765440 TMessageRelay>runWriterProcess
-1714765624 [] in Object>fork:at:
-1714765532 [] in BlockContext>newProcess

Process
-1714646620 Semaphore>wait
-1714647564 Semaphore>waitTimeoutMSecs:
-1714648332 Socket>waitForDataFor:ifClosed:ifTimedOut:
-1714648424 Socket>waitForDataFor:
-1714647932 [] in TMessageRelay>runReaderProcess
-1714648516 BlockContext>on:do:
-1714766028 TMessageRelay>runReaderProcess
-1714766212 [] in Object>fork:at:
-1714766120 [] in BlockContext>newProcess

Process
-1714756184 Semaphore>wait
-1714756320 Delay>wait
-1742085088 QAppServer>runUpdateProcess
-1742137448 [] in Object>fork:at:
-1742137264 [] in BlockContext>newProcess

Process
-1714596528 Semaphore>wait
-1714597428 Delay>wait
-1714802356 [] in TMediaControlEndpoint>runPinger
-1714802632 BlockContext>repeat
-1714802724 TMediaControlEndpoint>runPinger
-1714802540 [] in TMediaControlEndpoint>startPinger
-1714802816 BlockContext>on:do:
-1714803132 [] in TMediaControlEndpoint>startPinger
-1714802948 [] in BlockContext>newProcess

Process
-1714594880 Semaphore>wait
-1714595780 Delay>wait
-1714791004 [] in TMediaControlEndpoint>runPinger
-1714790912 BlockContext>repeat
-1714791096 TMediaControlEndpoint>runPinger
-1714791280 [] in TMediaControlEndpoint>startPinger
-1714791372 BlockContext>on:do:
-1714791688 [] in TMediaControlEndpoint>startPinger
-1714791504 [] in BlockContext>newProcess

Process
-1714593232 Semaphore>wait
-1714594132 Delay>wait
-1714777792 [] in TMediaControlEndpoint>runPinger
-1714777700 BlockContext>repeat
-1714777884 TMediaControl


More information about the Vm-dev mailing list