[Vm-dev] Squeak socket problem ... help!
David T. Lewis
lewis at mail.msen.com
Fri Oct 10 00:21:40 UTC 2014
On Fri, Oct 10, 2014 at 02:09:08AM +0200, Göran Krampe wrote:
>
> Hi guys!
>
> On 10/10/2014 01:28 AM, David T. Lewis wrote:
> >>>Ron and I (3DICC) have a problem with the Unix VM networking and I am
> >>>reaching out before burning too many hours on something one of you
> >>>C-Unix/Socket/VM guys can fix in an afternoon - and earn a buck for your
> >>>trouble.
> >>
> >>Cool. This is likely easy to fix. Your image is running out of file
> >>descriptors. Track open and close calls, e.g. add logging around at
> >>least StandardFileStream>>#primOpen:writable:,
> >>AsyncFile>>#primOpen:forWrite:semaIndex:,
> >>Socket>>#primAcceptFrom:receiveBufferSize:sendBufSize:semaIndex:readSemaIndex:writeSemaIndex:
> >>and their associated close calls and see what's being opened without being
> >>closed. It should be easy to track down, but may be more difficult to
> >>fix.
> >>
> >>Good luck!
>
> Aha. Soo... am I understanding this correctly - we are probably leaking
> fds and when we go above 1024 this makes select() go bonkers and
> eventually leads to the "Bad file descriptor" error?
>
> >I agree with what Eliot is saying and would add a few thoughts:
>
> >- Don't fix the wrong problem (DFtWP). Unless you have some reason to
> >believe that this server application would realistically have a need to
> >handle anything close to a thousand concurrent TCP sessions, don't fix
> >it by raising the per-process file handle limit, and don't fix it by
> >reimplementing the socket listening code.
>
> We haven't done the exact numbers, but we could probably hit several
> hundred concurrent sessions at least. 1024 seemed a bit "over the top" though :)
>
> The system in question is meant to serve more than 1000 concurrent
> users, so we are in fact moving into this territory. We have been up to
> around 600 so far.
>
> >- It is entirely possible that no one before you has ever tried to run
> >a server application with the per-process file handle limit bumped up
> >above the default 1024. So if that configuration does not play nicely
> >with the select() mechanism, you may well be the first to have encountered
> >this as an issue. But see above, don't fix it if it ain't broke.
>
> Well, it most probably *is* broke - I mean - I haven't read anywhere
> that our Socket code is limited to 1024 concurrent sockets and that
> going above that limit causes the Socket code to stop working? :)
>
> But I agree - I don't want to touch that code if we can simply avoid
> this bug by making sure we stay below 1024.
>
> But it sounds broke to me, nevertheless. ;)
Indeed it probably is.
>
> >- Most "out of file descriptor" problems involve resource leaks (as Eliot
> >is suggesting), and in those cases you will see a gradual increase in file
> >descriptors in /proc/<vmpid>/fd/ over time. Eventually you run out of
> >descriptors and something horrible happens.
>
> We will start looking at that and other tools too.
>
>
> >- Sorry to repeat myself but this is by far the most important point:
> >DFtWP.
>
> Sure :). This is why I posted - to get your input. And I have a
> suspicion that the SAML issue I mentioned may be the code leaking, we
> will start looking.
>
> regards, Göran
Cool. Please keep us posted on progress. In the unlikely event that free
advice turns out to be worth anything, there is plenty more where that
came from ;-)
Dave