[Vm-dev] Squeak socket problem ... help!

Göran Krampe goran at krampe.se
Fri Oct 10 00:09:08 UTC 2014


Hi guys!

On 10/10/2014 01:28 AM, David T. Lewis wrote:
>>> Ron and I (3DICC) have a problem with the Unix VM networking and I am
>>> reaching out before burning too many hours on something one of you
>>> C-Unix/Socket/VM guys can fix in an afternoon - and earn a buck for your
>>> trouble.
>>
>> Cool.  This is likely easy to fix.  Your image is running out of file
>> descriptors.  Track open and close calls, e.g. add logging around at
>> least StandardFileStream>>#primOpen:writable:,
>> AsyncFile>>#primOpen:forWrite:semaIndex:,
>> Socket>>#primAcceptFrom:receiveBufferSize:sendBufSize:semaIndex:readSemaIndex:writeSemaIndex:,
>> and their associated close calls, and see what's being opened without
>> being closed.  It should be easy to track down, but may be more difficult to fix.
>>
>> Good luck!
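The real instrumentation points are the Smalltalk methods named above. The bookkeeping behind "see what's being opened without being closed" can be sketched in C as a table indexed by descriptor number: each open records a tag, each close clears it, and whatever remains at the end is a leak candidate. This is only an illustration under stated assumptions. `track_open`, `track_close`, `report_unclosed` and the `MAX_TRACKED` bound are hypothetical names, not part of the Squeak VM or image.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical bound on descriptor numbers we track (illustrative assumption). */
#define MAX_TRACKED 4096

/* One slot per descriptor number: the call site that opened it, or NULL. */
static const char *open_tags[MAX_TRACKED];

/* Hypothetical hook: record that `fd` was opened by the call site `tag`. */
void track_open(int fd, const char *tag) {
    if (fd >= 0 && fd < MAX_TRACKED) {
        open_tags[fd] = tag;
        fprintf(stderr, "open  fd=%d (%s)\n", fd, tag);
    }
}

/* Hypothetical hook: record that `fd` was closed and clear its slot. */
void track_close(int fd) {
    if (fd >= 0 && fd < MAX_TRACKED) {
        fprintf(stderr, "close fd=%d (%s)\n", fd,
                open_tags[fd] != NULL ? open_tags[fd] : "untracked");
        open_tags[fd] = NULL;
    }
}

/* Print every descriptor that was opened but never closed; return the count. */
int report_unclosed(void) {
    int count = 0;
    for (int fd = 0; fd < MAX_TRACKED; fd++) {
        if (open_tags[fd] != NULL) {
            fprintf(stderr, "still open: fd=%d from %s\n", fd, open_tags[fd]);
            count++;
        }
    }
    return count;
}
```

Calling `report_unclosed()` periodically (or at shutdown) and grouping the output by tag should show which call site accumulates descriptors.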

Aha. Soo... am I understanding this correctly - we are probably leaking 
fds and when we go above 1024 this makes select() go bonkers and 
eventually leads to the "Bad file descriptor" error?
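For background on why 1024 matters: with `select()`, an `fd_set` is a fixed-size bitmap of `FD_SETSIZE` bits (1024 on glibc), and POSIX leaves the behavior undefined when `FD_SET` or `select()` is given a descriptor outside `[0, FD_SETSIZE)`. Whether the Unix VM's socket code hits exactly this path is an assumption here, based on the symptoms described above. A minimal defensive sketch, where `checked_fd_set` is a hypothetical helper rather than anything in the VM, refuses such descriptors instead of silently writing past the bitmap:

```c
#include <stdio.h>
#include <sys/select.h>

/* Hypothetical helper: add `fd` to `set` only if select() can represent it.
   Returns 0 on success, -1 if fd is outside [0, FD_SETSIZE). */
int checked_fd_set(int fd, fd_set *set) {
    if (fd < 0 || fd >= FD_SETSIZE) {
        fprintf(stderr, "fd %d cannot be used with select() (FD_SETSIZE=%d)\n",
                fd, (int)FD_SETSIZE);
        return -1;
    }
    FD_SET(fd, set);
    return 0;
}
```

Logging from a guard like this would at least turn an obscure "Bad file descriptor" failure into an explicit message naming the offending descriptor.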

> I agree with what Eliot is saying and would add a few thoughts:

> - Don't fix the wrong problem (DFtWP). Unless you have some reason to
> believe that this server application would realistically have a need to
> handle anything close to a thousand concurrent TCP sessions, don't fix
> it by raising the per-process file handle limit, and don't fix it by
> reimplementing the socket listening code.

We haven't worked out the exact numbers, but we could probably hit several
hundred concurrent sessions at least. 1024 seemed a bit "over the top" though :)

The system in question is meant to serve more than 1000 concurrent 
users, so we are in fact moving into this territory. We have been up to 
around 600 so far.

> - It is entirely possible that no one before you has ever tried to run
> a server application with the per-process file handle limit bumped up
> above the default 1024. So if that configuration does not play nicely
> with the select() mechanism, you may well be the first to have encountered
> this as an issue. But see above, don't fix it if it ain't broke.

Well, it most probably *is* broke - I mean - I haven't read anywhere 
that our Socket code is limited to 1024 concurrent sockets and that 
going above that limit causes the Socket code to stop working? :)

But I agree - I don't want to touch that code if we can simply avoid 
this bug by making sure we stay below 1024.

But it sounds broke to me, nevertheless. ;)
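One quick way to see whether a given VM process can even reach the problematic range is to compare its per-process descriptor limit with `FD_SETSIZE`: with a soft limit of N, descriptors run from 0 to N-1, so trouble with `select()` is only possible when N exceeds `FD_SETSIZE`. A small sketch using the standard `getrlimit(RLIMIT_NOFILE, ...)` call; the function name `check_nofile_limit` is hypothetical:

```c
#include <stdio.h>
#include <sys/resource.h>
#include <sys/select.h>

/* Hypothetical helper: report the soft and hard RLIMIT_NOFILE values and warn
   when the soft limit allows descriptors that select() cannot represent.
   Returns 0 on success, -1 if getrlimit() fails. */
int check_nofile_limit(rlim_t *soft, rlim_t *hard) {
    struct rlimit rl;
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0) {
        perror("getrlimit");
        return -1;
    }
    *soft = rl.rlim_cur;
    *hard = rl.rlim_max;
    /* Descriptors range over 0..soft-1, so only soft > FD_SETSIZE is a risk.
       RLIM_INFINITY is the largest rlim_t value, so it also triggers this. */
    if (rl.rlim_cur > (rlim_t)FD_SETSIZE) {
        fprintf(stderr, "warning: soft limit exceeds FD_SETSIZE (%d)\n",
                (int)FD_SETSIZE);
    }
    return 0;
}
```

With the common default soft limit of 1024, no warning is printed, which matches the observation that the problem only shows up once the limit has been raised.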

> - Most "out of file descriptor" problems involve resource leaks (as Eliot
> is suggesting), and in those cases you will see a gradual increase in file
> descriptors in /proc/<vmpid>/fd/ over time. Eventually you run out of
> descriptors and something horrible happens.

We will start looking at that and other tools too.
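As a sketch of the /proc approach mentioned above (Linux-specific), the entries in `/proc/<pid>/fd` can be counted with `opendir()`/`readdir()`. Sampling this at regular intervals while the number of connected users stays roughly constant should reveal the gradual climb that indicates a leak. `count_open_fds` is a hypothetical helper name:

```c
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper: count entries in /proc/<pid>/fd (Linux only).
   Pass pid <= 0 to inspect the calling process via /proc/self/fd; in that
   case the count includes the descriptor opendir() holds while scanning.
   Returns -1 if the directory cannot be opened. */
int count_open_fds(pid_t pid) {
    char path[64];
    if (pid > 0)
        snprintf(path, sizeof path, "/proc/%d/fd", (int)pid);
    else
        snprintf(path, sizeof path, "/proc/self/fd");

    DIR *dir = opendir(path);
    if (dir == NULL)
        return -1;

    int count = 0;
    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        /* Skip the "." and ".." directory entries. */
        if (strcmp(entry->d_name, ".") != 0 && strcmp(entry->d_name, "..") != 0)
            count++;
    }
    closedir(dir);
    return count;
}
```

Inspecting another process, such as the VM by its pid, generally requires running as the same user or as root.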


> - Sorry to repeat myself but this is by far the most important point: DFtWP.

Sure :). This is why I posted - to get your input. And I have a
suspicion that the SAML issue I mentioned may be where the leak is; we
will start looking there.

regards, Göran


More information about the Vm-dev mailing list