[Box-Admins] Re: squeaksource.com image update (was: Does source.squeak.org have the socket leak problem?)

Wed Oct 23 22:09:47 UTC 2013

Today I discovered SeasidePlatformSupport class>>#deliverMailFrom:to:
is part of the error-handler for SqueakSource.

When IT has a problem, however, the same error handler calls
#deliverMailFrom:to: again as a means for handling the error which
occurred while trying to handle the original error.

So it's a runaway stack in a background process that is also inside
some Mutex's critical: block, which prevents other requests from being
processed.

deliverMailFrom:to: opens up a socket.  Hmmm.....

I've duplicated the problem on my localhost and applied a patch which
I'll commit to source.squeak.org/ss shortly.
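
To illustrate the failure mode, here is a minimal workspace sketch of a
re-entrancy guard around an error reporter. It is not the actual patch;
the names reporting, sendReport and handleError are made up for the
example, and sendReport is only a stand-in for the mail delivery step
that always fails. A real server would probably want a per-process flag
rather than a single shared variable.

```smalltalk
"Hypothetical sketch, not the SqueakSource patch."
| reporting sendReport handleError |
reporting := false.

"Stand-in for the mail delivery step; here it always fails."
sendReport := [:anError | Error signal: 'mail delivery failed'].

handleError := [:anError |
    reporting
        ifTrue: [
            "Already inside the reporter: log and stop instead of recursing."
            Transcript show: 'Nested failure, not mailing: ', anError description; cr]
        ifFalse: [
            reporting := true.
            [[sendReport value: anError]
                on: Error
                do: [:nested | handleError value: nested]]
                    ensure: [reporting := false]]].

"Trigger an error and route it through the guarded handler."
[1/0] on: ZeroDivide do: [:e | handleError value: e].
```

Without the reporting flag, the inner handler would call sendReport
again on every failure and the stack would keep growing.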

 - Chris

On Sat, Oct 19, 2013 at 7:35 AM, David T. Lewis <lewis at mail.msen.com> wrote:
> On Sat, Oct 19, 2013 at 08:02:22AM +0100, Frank Shearar wrote:
>> On 18 October 2013 21:17, David T. Lewis <lewis at mail.msen.com> wrote:
>> >> On Fri, Oct 18, 2013 at 12:59 PM, David T. Lewis <lewis at mail.msen.com>
>> >> wrote:
>> >>> At this point, the SqueakSource code in our squeaksource.com image
>> >>> should be identical to that of our source.squeak.org image. If I fix
>> >>> anything, I'll certainly commit the changes, but someone else fixed the
>> >>> socket leak problem and all I did is get squeaksource.com updated to
>> >>> take advantage of those fixes.
>> >>
>> >> What are those fixes?  I would like to ensure they're part of the
>> >> new-trunk SS image at box4.squeak.org:8888.
>> >>
>> >
>> > I do not know what the fixes were, and I cannot say if they were fixes to
>> > SqueakSource, Seaside, or something in Squeak itself. I would certainly
>> > expect that the new image you are preparing on box4 will already contain
>> > the necessary fixes, but the only way to find out for sure is to keep an eye
>> > on your new image and watch for socket leaks. That's just a matter of
>> > watching /proc/<squeakpid>/fd/* and looking at how many sockets are open.
>> > If the number grows over time, that's not good. If the total number of
>> > open file descriptors approaches 1024, it is a Very Bad Thing.
>>
>> Obviously you want to address the root cause - leaking descriptors -
>> but a mitigation is to up the fd quota through
>> /etc/security/limits.conf
>>
>
> One more update - the file descriptor leak is not gone, although it
> is clearly much improved compared to the old image. Within the last
> day or so, the open descriptor count went up from about 40 to about
> 340. So the problem still happens, but much less frequently.
>
> I am not going to restart the image, as I want to keep monitoring it
> and see how long it can go unattended. I am running a process in the
> image that will check fd count every few hours, and restart it if
> the count goes over 800. That should protect against image lockups
> if the count goes too high while I am not paying attention.
>
> For the record, the socket leak monitor process is:
>
>     [[vmFileCount := (FileDirectory on: '/proc/', OSProcess thisOSProcess pid asString, '/fd')
>             entries size.
>     OSProcess trace: DateAndTime now asString, ' squeakvm has ', vmFileCount asString,
>             ' open file descriptors'.
>     vmFileCount > 800 ifTrue: [
>             OSProcess trace: 'Too many open file handles, save image and exit'.
>             "Save the image, exit and wait for the supervisory script to restart"
>             Smalltalk snapshot: true andQuit: true].
>     (Delay forSeconds: 3 * 3600) wait] repeat] fork name: 'the Socket leak monitor'.
>
> Dave
>