[Vm-dev] unix SqueakSource socket 'too many open files' problem

David T. Lewis lewis at mail.msen.com
Thu Feb 25 21:12:02 UTC 2021


Socket leaks were always a problem on the older squeaksource.com
images, although I don't think it has been happening since I updated
it to a Squeak 5.3 image, and I don't recall that it was ever a
problem on source.squeak.org.

I don't know what the root cause was, and it's certainly possible
that socket leak issues still exist but are not showing up with
the usage or load on our servers.

If you are using image-based persistence (as opposed to a Magma
back end), then the utility that I used for squeaksource.com
is a handy workaround (it is actually still running on our
squeaksource.com image, which is why I am not entirely
sure whether the socket leak problem is really gone).

Take a look in the http://www.squeaksource.com/SSImageInit repository
and have a look at the socket leak monitor. Again, to be clear,
you don't want to run this with Magma, but in any case it may
give you an idea for how to handle it.

I do have to note that the error messages you are getting may be
a different issue entirely. The socket leaks I was dealing with
manifested as open file descriptors (visible in /proc/<pid>/fd/*).
If you run out of file descriptors the image will hang, and if
I recall correctly it may have crashed the VM. Possibly that
would lead to the later symptoms you are seeing now.
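
To check whether descriptors are piling up, the entries under
/proc/<pid>/fd can be counted from outside the image. The following is
only a minimal sketch, assuming a Linux host with /proc mounted and
permission to read the target process's fd directory. The function name
count_fds is made up for illustration, and this is not the socket leak
monitor from SSImageInit.

```python
import os


def count_fds(pid="self"):
    """Return (total, sockets) for the open descriptors of a Linux process.

    Hypothetical helper: reads /proc/<pid>/fd and counts the entries whose
    symlink target looks like "socket:[inode]".
    """
    fd_dir = f"/proc/{pid}/fd"
    total = 0
    sockets = 0
    for name in os.listdir(fd_dir):
        total += 1
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The descriptor was closed between listing and reading it.
            continue
        if target.startswith("socket:"):
            sockets += 1
    return total, sockets


if __name__ == "__main__":
    # Inspect this Python process itself; pass the VM's pid to count_fds()
    # to inspect a running image instead.
    total, sockets = count_fds()
    print(f"open descriptors: {total}, of which sockets: {sockets}")
```

Calling count_fds with the VM's process id every few minutes and logging
the result should show whether the socket count climbs steadily toward
the per-process limit, which would point to a leak rather than a burst
of load.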

Dave


On Thu, Feb 25, 2021 at 11:42:23AM -0800, tim Rowledge wrote:
>  
> I've been running a squeaksource system on a 64bit linux server for a year or so and just started having odd problems.
> 
> Talking with Chris about it, and looking into the assorted logs, led to the theory that something (unknown and I can't see how possible) caused the image to save and quit at some point when the expected behaviour is to *not* ever save. That led to some complaints about Magma related details that don't matter here.
> 
> I had much fun (the usual linux user names/permissions stuff) but eventually got a new copy of the squeaksource image running. Yay!
> 
> Except after an indeterminate time (several hours, not a whole day) the webpage part of the system became inaccessible even though it had not actually crashed out. The log contained a bunch of lines like this - 
> 3729ab18e70dac 2021-02-24T23:37:53.41687-05:00 CurrentEnvironment: 
> @40000000603729ab18e71d4c 2acceptHandler: Too many open files
> @4000000060379f1127814ddc acceptHandler: aborting server 12 pss=0x14df900
> @4000000060379f112782dc4c socketStatus: freeing invalidated pss=0x14df900
> @400000006037b1ef0d99c214 acceptHandler: Too many open files
> @400000006037b1ef0d9a5e54 acceptHandler: aborting server 12 pss=0x7f2f8c8d4630
> @400000006037b1ef0d9c7194 socketStatus: freeing invalidated pss=0x7f2f8c8d4630
> 
> 
> ... which are errors from the VM socket handlers.
> 
> I've not seen this happening before that I recall. Any ideas? Stuff to check? Solutions?
> 
> tim
> --
> tim Rowledge; tim at rowledge.org; http://www.rowledge.org/tim
> Strange OpCodes: TDB: Transfer and Drop Bits
> 
> 

