[Box-Admins] Re: squeaksource.com image update (was: Does source.squeak.org have the socket leak problem?)
asqueaker at gmail.com
Wed Oct 23 22:09:47 UTC 2013
Today I discovered SeasidePlatformSupport class>>#deliverMailFrom:to:
is part of the error-handler for SqueakSource.
When IT has a problem, however, the same error handler calls
#deliverMailFrom:to: again as a means for handling the error which
occurred while trying to handle the original error.
So, it's a runaway stack in a background process that is also inside
some Mutex's critical: block, which causes other requests to not be [...]
deliverMailFrom:to: opens up a socket. Hmmm.....
I've reproduced the problem on my localhost and applied a patch which
I'll commit to source.squeak.org/ss shortly.
On Sat, Oct 19, 2013 at 7:35 AM, David T. Lewis <lewis at mail.msen.com> wrote:
> On Sat, Oct 19, 2013 at 08:02:22AM +0100, Frank Shearar wrote:
>> On 18 October 2013 21:17, David T. Lewis <lewis at mail.msen.com> wrote:
>> >> On Fri, Oct 18, 2013 at 12:59 PM, David T. Lewis <lewis at mail.msen.com>
>> >> wrote:
>> >>> At this point, the SqueakSource code in our squeaksource.com image
>> >>> should be identical to that of our source.squeak.org image. If I fix
>> >>> anything, I'll certainly commit the changes, but someone else fixed the
>> >>> socket leak problem and all I did is get squeaksource.com updated to
>> >>> take
>> >>> advantage of those fixes.
>> >> What are those fixes? I would like to ensure they're part of the
>> >> new-trunk SS image at box4.squeak.org:8888.
>> > I do not know what the fixes were, and I cannot say if they were fixes to
>> > SqueakSource, Seaside, or something in Squeak itself. I would certainly
>> > expect that the new image you are preparing on box4 will already contain
>> > the necessary fixes, but the only way to find out for sure is to keep an eye
>> > on your new image and watch for socket leaks. That's just a matter of
>> > watching /proc/<squeakpid>/fd/* and looking at how many sockets are open.
>> > If the number grows over time, that's not good. If the total number of
>> > open file descriptors approaches 1024, it is a Very Bad Thing.
>> Obviously you want to address the root cause - leaking descriptors -
>> but a mitigation is to up the fd quota through
> One more update - the file descriptor leak is not gone, although it
> is clearly much improved compared to the old image. Within the last
> day or so, the open descriptor count went up from about 40 to about
> 340. So the problem still happens, but much less frequently.
> I am not going to restart the image, as I want to keep monitoring it
> and see how long it can go unattended. I am running a process in the
> image that will check fd count every few hours, and restart it if
> the count goes over 800. That should protect against image lockups
> if the count goes too high while I am not paying attention.
> For the record, the socket leak monitor process is:
> [[vmFileCount := (FileDirectory on: '/proc/', OSProcess thisOSProcess pid asString, '/fd')
>       entries size.
>   OSProcess trace: DateAndTime now asString, ' squeakvm has ', vmFileCount asString,
>       ' open file descriptors'.
>   vmFileCount > 800 ifTrue: [
>       OSProcess trace: 'Too many open file handles, save image and exit'.
>       "Save the image, exit and wait for the supervisory script to restart"
>       Smalltalk snapshot: true andQuit: true].
>   (Delay forSeconds: 3 * 3600) wait] repeat] fork name: 'the Socket leak monitor'.
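
The monitor quoted above counts every entry under /proc/<pid>/fd, while the earlier advice in the thread was to look at how many of those descriptors are sockets. A rough way to separate the two from outside the image is sketched below in Python rather than Squeak. It assumes a Linux /proc filesystem, where the symlink target of a socket descriptor starts with "socket:"; count_fds is a hypothetical helper name, not part of any tool mentioned in this thread.

```python
import os


def count_fds(pid="self"):
    """Return (total, sockets) for the open descriptors of a Linux process."""
    fd_dir = f"/proc/{pid}/fd"
    total = 0
    sockets = 0
    for name in os.listdir(fd_dir):
        total += 1
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The descriptor was closed between listing and reading it.
            continue
        if target.startswith("socket:"):
            sockets += 1
    return total, sockets


# Only meaningful where /proc is available (assumption: Linux).
if os.path.isdir("/proc/self/fd"):
    total, sockets = count_fds()
    print(f"{total} open file descriptors, {sockets} of them sockets")
```

Passing the VM's process id instead of "self" would inspect the running image, provided the caller has permission to read that process's /proc entries.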