[Vm-dev] Image freeze because handleTimerEvent and Seaside process gone?!
Adrian Lienhard
adi at netstyle.ch
Mon Dec 6 14:56:47 UTC 2010
Thanks for the response, David
I checked .../fd/* when the image was frozen and there were only 7 or so open file handles. Nothing suspicious there...
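For anyone who wants to log this over time rather than look once by hand, here is a minimal sketch of the check. It assumes Linux with /proc mounted and enough permission to read the VM's /proc entries; the helper names open_fds and soft_fd_limit are made up for illustration and are not part of the VM or OSProcess.

```python
import os


def open_fds(pid):
    """Return {fd_number: target} for the entries under /proc/<pid>/fd (Linux only)."""
    fd_dir = f"/proc/{pid}/fd"
    result = {}
    for name in os.listdir(fd_dir):
        try:
            result[int(name)] = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The descriptor was closed between listdir() and readlink().
            continue
    return result


def soft_fd_limit(pid):
    """Return the soft 'Max open files' limit from /proc/<pid>/limits.

    Returns None if the limit is 'unlimited' or the line is not present.
    """
    with open(f"/proc/{pid}/limits") as f:
        for line in f:
            if line.startswith("Max open files"):
                # Columns: "Max open files  <soft>  <hard>  files"
                value = line.split()[3]
                return None if value == "unlimited" else int(value)
    return None
```

Calling open_fds(vm_pid) periodically (for example from cron) and comparing len() of the result against soft_fd_limit(vm_pid) would show whether handles accumulate towards the per-process limit. The targets (socket:[...], pipe:[...], file paths) hint at what is leaking.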
Cheers,
Adrian
On Dec 6, 2010, at 12:54, David T. Lewis wrote:
>
> Have a look at /proc/<vmpid>/fd/* for a VM process that has been
> running for a while, and check for accumulation of open file handles
> over time. If sockets or files are not being closed, the open file
> handles can accumulate over time and eventually hit whatever
> per-process limit is in place (typically 1024 per process). I could
> imagine this leading to a condition where the process that is trying
> to open new session requests would fail.
>
> If this turns out to be the case, check what you are doing with
> OSProcess, as it's quite easy to e.g. use a PipeableOSProcess and
> forget to close the output pipes when done.
>
> Dave
>
>
> On Mon, Dec 06, 2010 at 11:55:55AM +0100, Adrian Lienhard wrote:
>>
>> Hi all,
>>
>> We've been experiencing an "interesting" problem: the image freezes and does not respond to HTTP requests anymore after it has been running for days.
>>
>> Here is some basic information about our setup:
>>
>> Squeak VM 4.0.3-2202 compiled with gcc 4.3.2
>> PharoCore 1.1
>> OS Debian Lenny amd64 (CPUs are 4 Intel Xeon E5530 2.40GHz)
>>
>> - We have never seen the problem with the Squeak VM 3.9-9 and Squeak 3.9 on the identical machine and with the same application source (modulo some adaptations to make it run on Pharo).
>> - We run the VM with -mmap 512m -vm-sound-null -vm-display-null, and the UI process is suspended (Project uiProcess suspend)
>> - VM does not hog the CPU and memory usage is normal
>> - The mean time between failures is several weeks and we haven't managed to reproduce the problem
>> - The application mainly serves HTTP requests. When the image does not receive requests for some time, we send it a STOP signal; when a request comes in, it is sent a CONT signal.
>> - lsof shows
>> TCP *:9093 (LISTEN)
>> TCP server:9093->server:46930 (CLOSE_WAIT)
>>
>> Below is a GDB backtrace and the Smalltalk stacks from an image that was frozen (the VM had been running for almost 100 hours):
>>
>> =============================================================
>> (gdb) bt
>> #0 0x08072020 in ?? ()
>> #1 <signal handler called>
>> #2 0xb766f5e0 in malloc () from /lib/libc.so.6
>> #3 <function called from gdb>
>> #4 0xb76c50c8 in select () from /lib/libc.so.6
>> #5 0x08071063 in aioPoll ()
>> #6 0xb778bb8d in ?? () from /usr/lib/squeak/4.0.3-2202//so.vm-display-null
>> #7 0x000003e8 in ?? ()
>> #8 0x997b5a34 in ?? ()
>> #9 0xbfe7cb28 in ?? ()
>> #10 0x08074575 in ioRelinquishProcessorForMicroseconds ()
>> Backtrace stopped: frame did not save the PC
>>
>> (gdb) call printCallStack()
>> -1719969228 >idleProcess
>> -1719969320 >startUp
>> -1740134028 BlockClosure>newProcess
>> $3 = -1755344892
>>
>> (gdb) call (int) printAllStacks()
>> Process
>> -1719969228 >idleProcess
>> -1719969320 >startUp
>> -1740134028 BlockClosure>newProcess
>>
>> Process
>> -1740113860 >finalizationProcess
>> -1740113952 >restartFinalizationProcess
>> -1740113532 BlockClosure>newProcess
>>
>> Process
>> -1740134424 SmalltalkImage>lowSpaceWatcher
>> -1740134516 SmalltalkImage>installLowSpaceWatcher
>> -1740134300 BlockClosure>newProcess
>>
>> Process
>> -1719451488 Delay>wait
>> -1719451580 BlockClosure>ifCurtailed:
>> -1719451704 Delay>wait
>> -1719451796 InputEventPollingFetcher>waitForInput
>> -1740126940 InputEventFetcher>eventLoop
>> -1740127032 InputEventFetcher>installEventLoop
>> -1740126816 BlockClosure>newProcess
>>
>> Process
>> -1719557780 UnixOSProcessAccessor>grimReaperProcess
>> -1740113624 BlockClosure>repeat
>> -1740113716 UnixOSProcessAccessor>grimReaperProcess
>> -1740117340 BlockClosure>newProcess
>>
>> [omitted many newlines between output above]
>> =============================================================
>>
>> What is striking from the above process listing is that two processes are missing: the handleTimerEvent process and the Seaside process (that is, the TCP listener loop). How come these processes vanished?
>>
>> This may be related to Pharo or to the Squeak VM.
>>
>> Has anybody else seen this problem? Any idea how to debug/fix this issue is very much appreciated!
>>
>> Cheers,
>> Adrian
>>
>>
>> CCed to pharo-dev since this may be related to Pharo; please respond on the squeak-vm list
>>