[Box-Admins] Re: squeaksource.com image update (was: Does source.squeak.org have the socket leak problem?)

David T. Lewis lewis at mail.msen.com
Sat Oct 19 12:35:46 UTC 2013


On Sat, Oct 19, 2013 at 08:02:22AM +0100, Frank Shearar wrote:
> On 18 October 2013 21:17, David T. Lewis <lewis at mail.msen.com> wrote:
> >> On Fri, Oct 18, 2013 at 12:59 PM, David T. Lewis <lewis at mail.msen.com> wrote:
> >>> At this point, the SqueakSource code in our squeaksource.com image
> >>> should be identical to that of our source.squeak.org image. If I fix
> >>> anything, I'll certainly commit the changes, but someone else fixed the
> >>> socket leak problem and all I did is get squeaksource.com updated to
> >>> take advantage of those fixes.
> >>
> >> What are those fixes?  I would like to ensure they're part of the
> >> new-trunk SS image at box4.squeak.org:8888.
> >>
> >
> > I do not know what the fixes were, and I cannot say if they were fixes to
> > SqueakSource, Seaside, or something in Squeak itself. I would certainly
> > expect that the new image you are preparing on box4 will already contain
> > the necessary fixes, but the only way to find out for sure is to keep an eye
> > on your new image and watch for socket leaks. That's just a matter of
> > watching /proc/<squeakpid>/fd/* and looking at how many sockets are open.
> > If the number grows over time, that's not good. If the total number of
> > open file descriptors approaches 1024, it is a Very Bad Thing.
> 
> Obviously you want to address the root cause - leaking descriptors -
> but a mitigation is to up the fd quota through
> /etc/security/limits.conf
>

One more update - the file descriptor leak is not gone, although it
is clearly much improved compared to the old image. Within the last
day or so, the open descriptor count went up from about 40 to about
340. So the problem still happens, but much less frequently.

I am not going to restart the image, as I want to keep monitoring it
and see how long it can go unattended. I am running a process in the
image that checks the fd count every three hours, and saves and quits
the image (so that the supervisory script restarts it) if the count
goes over 800. That should protect against image lockups if the count
goes too high while I am not paying attention.

For the record, the socket leak monitor process is:

    "Background process: every three hours, count the VM's open file
     descriptors and restart the image if the count exceeds 800."
    [[ | vmFileCount |
        vmFileCount := (FileDirectory on: '/proc/', OSProcess thisOSProcess pid asString, '/fd')
                entries size.
        OSProcess trace: DateAndTime now asString, ' squeakvm has ', vmFileCount asString,
                ' open file descriptors'.
        vmFileCount > 800 ifTrue: [
                OSProcess trace: 'Too many open file handles, save image and exit'.
                "Save the image, exit and wait for the supervisory script to restart"
                Smalltalk snapshot: true andQuit: true].
        (Delay forSeconds: 3 * 3600) wait] repeat] fork name: 'the Socket leak monitor'.
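
The in-image monitor counts all descriptors, not just sockets. To watch
the socket count from outside the image (the /proc/<squeakpid>/fd check
mentioned in the quoted text), something like the following Python sketch
might do. It assumes a Linux /proc filesystem, and count_fds is only a
hypothetical helper name, not part of SqueakSource or OSProcess:

```python
import os


def count_fds(pid):
    """Return (total, sockets) for the open descriptors of process `pid`.

    Assumes Linux: each entry in /proc/<pid>/fd is a symlink, and a
    socket's link target looks like 'socket:[12345]'.
    """
    fd_dir = f"/proc/{pid}/fd"
    total = 0
    sockets = 0
    for name in os.listdir(fd_dir):
        total += 1
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The descriptor was closed between listdir and readlink.
            continue
        if target.startswith("socket:"):
            sockets += 1
    return total, sockets


if __name__ == "__main__":
    import sys

    # Pass the squeakvm pid as the first argument; defaults to this process.
    pid = int(sys.argv[1]) if len(sys.argv) > 1 else os.getpid()
    total, sockets = count_fds(pid)
    print(f"pid {pid}: {total} open file descriptors, {sockets} sockets")
```

Run it periodically (from cron, say) as the user that owns the VM, or as
root, and compare the socket count over time; a steadily growing number
points at the leak.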

Dave
 

