[Vm-dev] Image freeze because handleTimerEvent and Seaside process gone?!

Adrian Lienhard adi at netstyle.ch
Mon Dec 6 14:56:47 UTC 2010


Thanks for the response, David

I checked .../fd/* when the image was frozen and there were only 7 or so open file handles. Nothing suspicious there...

Cheers,
Adrian 
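Here is a minimal Python sketch of the check Dave describes in the quoted message. It counts the entries in /proc/<pid>/fd and reads the process's actual soft "Max open files" limit from /proc/<pid>/limits, so you don't have to assume 1024. It assumes Linux and permission to read the target's /proc entries (same user or root). `count_open_fds`, `max_open_files` and `watch` are hypothetical helper names and are not part of any Squeak or OSProcess tooling.

```python
import os
import time

def count_open_fds(pid):
    """Number of open file descriptors, i.e. entries in /proc/<pid>/fd (Linux only).

    When pid is the calling process, the count includes the descriptor
    that os.listdir itself holds while reading the directory.
    """
    return len(os.listdir(f"/proc/{pid}/fd"))

def max_open_files(pid):
    """Soft 'Max open files' limit from /proc/<pid>/limits; None means unlimited."""
    with open(f"/proc/{pid}/limits") as f:
        for line in f:
            if line.startswith("Max open files"):
                soft = line.split()[3]  # columns: Max open files <soft> <hard> files
                return None if soft == "unlimited" else int(soft)
    raise RuntimeError(f"no 'Max open files' entry for pid {pid}")

def watch(pid, interval=60.0, samples=10):
    """Print the descriptor count periodically so that steady growth becomes visible."""
    limit = max_open_files(pid)
    for _ in range(samples):
        print(f"pid {pid}: {count_open_fds(pid)} open fds (soft limit: {limit})")
        time.sleep(interval)
```

For example, `watch(vm_pid, interval=3600, samples=48)` would take 48 hourly samples. A count that keeps climbing toward the limit points to a leak. A small, stable count like the 7 handles reported above does not.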

On Dec 6, 2010, at 12:54, David T. Lewis wrote:

> 
> Have a look at /proc/<vmpid>/fd/* for a VM process that has been
> running for a while, and check for accumulation of open file handles
> over time. If sockets or files are not being closed, the open file
> handles can accumulate over time and eventually hit whatever
> per-process limit is in place (typically 1024 per process). I could
> imagine this leading to a condition where the process that is trying
> to open new session requests would fail.
> 
> If this turns out to be the case, check what you are doing with
> OSProcess, as it's quite easy to e.g. use a PipeableOSProcess and
> forget to close the output pipes when done.
> 
> Dave
> 
> 
> On Mon, Dec 06, 2010 at 11:55:55AM +0100, Adrian Lienhard wrote:
>> 
>> Hi all,
>> 
>> We've been experiencing an "interesting" problem: after the image has been running for days, it freezes and no longer responds to HTTP requests.
>> 
>> Here is some basic information about our setup:
>> 
>> Squeak VM 4.0.3-2202 compiled with gcc 4.3.2
>> PharoCore 1.1
>> OS Debian Lenny amd64 (CPUs are 4 Intel Xeon E5530 2.40GHz)
>> 
>> - We have never seen the problem with the Squeak VM 3.9-9 and Squeak 3.9 on the identical machine and with the same application source (modulo some adaptations to make it run on Pharo).
>> - We run the VM with -mmap 512m -vm-sound-null -vm-display-null, and the UI process is suspended (Project uiProcess suspend)
>> - VM does not hog the CPU and memory usage is normal
>> - The mean time between failures is several weeks, and we haven't managed to reproduce the problem
>> - The application mainly serves HTTP requests. When the image has not received any requests for some time, we send it a STOP signal; when a request comes in, it is sent a CONT signal.
>> - lsof shows
>> 	TCP *:9093 (LISTEN)
>> 	TCP server:9093->server:46930 (CLOSE_WAIT)
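CLOSE_WAIT means that the peer has closed its end of the connection but the local process has not yet closed its socket. The following is a hedged sketch for Linux and IPv4 only. It reads similar information to what lsof shows from /proc/net/tcp, where the "st" column holds the kernel's TCP state code (08 is CLOSE_WAIT, 0A is LISTEN). `sockets_on_port` is a hypothetical helper. Note that /proc/net/tcp lists every IPv4 socket in the network namespace, not just those owned by the VM process, and that IPv6 sockets appear in /proc/net/tcp6 instead.

```python
# Kernel TCP state codes as they appear in the "st" column of /proc/net/tcp.
TCP_STATES = {
    "01": "ESTABLISHED",
    "06": "TIME_WAIT",
    "08": "CLOSE_WAIT",
    "0A": "LISTEN",
}

def sockets_on_port(port, path="/proc/net/tcp"):
    """Return (remote_port, state) for every IPv4 socket whose local port is `port`.

    States not listed in TCP_STATES are returned as their raw hex code.
    """
    found = []
    with open(path) as f:
        next(f)  # skip the header line
        for line in f:
            fields = line.split()
            local, remote, state = fields[1], fields[2], fields[3]
            # Addresses are hex "IP:PORT", e.g. "0100007F:2385" is 127.0.0.1:9093.
            if int(local.split(":")[1], 16) == port:
                found.append((int(remote.split(":")[1], 16), TCP_STATES.get(state, state)))
    return found
```

Calling `sockets_on_port(9093)` periodically would show whether CLOSE_WAIT entries on the Seaside port accumulate before a freeze.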
>> 
>> Below are a GDB backtrace and the Smalltalk stacks from an image that was frozen (the VM had been running for almost 100 hours):
>> 
>> =============================================================
>> (gdb) bt
>> #0  0x08072020 in ?? ()
>> #1  <signal handler called>
>> #2  0xb766f5e0 in malloc () from /lib/libc.so.6
>> #3  <function called from gdb>
>> #4  0xb76c50c8 in select () from /lib/libc.so.6
>> #5  0x08071063 in aioPoll ()
>> #6  0xb778bb8d in ?? () from /usr/lib/squeak/4.0.3-2202//so.vm-display-null
>> #7  0x000003e8 in ?? ()
>> #8  0x997b5a34 in ?? ()
>> #9  0xbfe7cb28 in ?? ()
>> #10 0x08074575 in ioRelinquishProcessorForMicroseconds ()
>> Backtrace stopped: frame did not save the PC
>> 
>> (gdb) call printCallStack() 
>> -1719969228 >idleProcess
>> -1719969320 >startUp
>> -1740134028 BlockClosure>newProcess
>> $3 = -1755344892
>> 
>> (gdb) call (int) printAllStacks()
>> Process
>> -1719969228 >idleProcess
>> -1719969320 >startUp
>> -1740134028 BlockClosure>newProcess
>> 
>> Process
>> -1740113860 >finalizationProcess
>> -1740113952 >restartFinalizationProcess
>> -1740113532 BlockClosure>newProcess
>> 
>> Process
>> -1740134424 SmalltalkImage>lowSpaceWatcher
>> -1740134516 SmalltalkImage>installLowSpaceWatcher
>> -1740134300 BlockClosure>newProcess
>> 
>> Process
>> -1719451488 Delay>wait
>> -1719451580 BlockClosure>ifCurtailed:
>> -1719451704 Delay>wait
>> -1719451796 InputEventPollingFetcher>waitForInput
>> -1740126940 InputEventFetcher>eventLoop
>> -1740127032 InputEventFetcher>installEventLoop
>> -1740126816 BlockClosure>newProcess
>> 
>> Process
>> -1719557780 UnixOSProcessAccessor>grimReaperProcess
>> -1740113624 BlockClosure>repeat
>> -1740113716 UnixOSProcessAccessor>grimReaperProcess
>> -1740117340 BlockClosure>newProcess
>> 
>> [many blank lines in the output above have been omitted]
>> =============================================================
>> 
>> What is striking in the above process listing is that two processes are missing: the handleTimerEvent process and the Seaside process (that is, the TCP listener loop). How come these processes vanished?
>> 
>> This may be related to Pharo or to the Squeak VM.
>> 
>> Has anybody else seen this problem? Any idea how to debug/fix this issue is very much appreciated!
>> 
>> Cheers,
>> Adrian
>> 
>> 
>> CCed to pharo-dev since this may be related to Pharo; please respond on the squeak-vm list.
>> 
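The idle handling described in the setup notes above (SIGSTOP when the image has been idle, SIGCONT when a request arrives) can be sketched in Python as follows. The `suspend`, `resume` and `demo` names are illustrative only, and a throwaway child interpreter that just sleeps stands in for the VM. Because SIGSTOP cannot be caught or ignored, a stopped VM executes no code at all until SIGCONT arrives, although the kernel still completes incoming TCP handshakes into the listen backlog.

```python
import os
import signal
import subprocess
import sys

def suspend(pid):
    """Stop the process; SIGSTOP cannot be caught, blocked or ignored."""
    os.kill(pid, signal.SIGSTOP)

def resume(pid):
    """Let a stopped process continue running."""
    os.kill(pid, signal.SIGCONT)

def demo():
    """Stop and resume a sleeping child; report what waitpid observed."""
    # Hypothetical stand-in for the VM: a child interpreter that just sleeps.
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(30)"])
    try:
        suspend(child.pid)
        _, status = os.waitpid(child.pid, os.WUNTRACED)   # blocks until stopped
        stopped = os.WIFSTOPPED(status)
        resume(child.pid)
        _, status = os.waitpid(child.pid, os.WCONTINUED)  # blocks until continued
        continued = os.WIFCONTINUED(status)
        return stopped, continued
    finally:
        child.kill()
        child.wait()  # reap the child so no zombie is left behind
```

On Linux, `demo()` returns `(True, True)`: the child is first reported as stopped and then as continued.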


