[Vm-dev] Squeak socket problem ... help!

David T. Lewis lewis at mail.msen.com
Fri Oct 10 00:21:40 UTC 2014


On Fri, Oct 10, 2014 at 02:09:08AM +0200, Göran Krampe wrote:
> 
> Hi guys!
> 
> On 10/10/2014 01:28 AM, David T. Lewis wrote:
> >>>Ron and I (3DICC) have a problem with the Unix VM networking and I am
> >>>reaching out before burning too many hours on something one of you
> >>>C-Unix/Socket/VM guys can fix in an afternoon - and earn a buck for your
> >>>trouble.
> >>
> >>Cool.  This is likely easy to fix.  Your image is running out of file
> >>descriptors.  Track open and close calls, e.g. add logging around at
> >>least StandardFileStream>>#primOpen:writable:,
> >>AsyncFile>>#primOpen:forWrite:semaIndex:,
> >>Socket>>#primAcceptFrom:receiveBufferSize:sendBufSize:semaIndex:readSemaIndex:writeSemaIndex:
> >>and their associated close calls and see what's being opened without being
> >>closed.  It should be easy to track down, but may be more difficult to 
> >>fix.
> >>
> >>Good luck!
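The open/close tracking suggested above amounts to a small ledger keyed by descriptor number. The real hooks would go around the Smalltalk primitives named above; the C below is only a language-neutral sketch of the bookkeeping, and `ledger_opened`, `ledger_closed`, `ledger_report` and `LEDGER_MAX` are hypothetical names, not part of the VM or the image.

```c
#include <stdio.h>

/* Hypothetical descriptor ledger (sketch only, not VM code).
 * One slot per descriptor number; NULL means "not open". */
#define LEDGER_MAX 4096

static const char *ledger_tag[LEDGER_MAX];

/* Call right after a successful open/accept. */
void ledger_opened(int fd, const char *tag)
{
    if (fd < 0 || fd >= LEDGER_MAX)
        return;
    /* The kernel only reuses a number after close(), so a live slot here
     * means a close was not logged. */
    if (ledger_tag[fd] != NULL)
        fprintf(stderr, "fd %d reused without logged close (was %s)\n",
                fd, ledger_tag[fd]);
    ledger_tag[fd] = tag;
}

/* Call right before close(). */
void ledger_closed(int fd)
{
    if (fd >= 0 && fd < LEDGER_MAX)
        ledger_tag[fd] = NULL;
}

/* Print every descriptor opened but never closed; return how many. */
int ledger_report(FILE *out)
{
    int live = 0;
    for (int fd = 0; fd < LEDGER_MAX; fd++) {
        if (ledger_tag[fd] != NULL) {
            fprintf(out, "fd %d still open (%s)\n", fd, ledger_tag[fd]);
            live++;
        }
    }
    return live;
}
```

Dumping the report periodically on a running server and diffing consecutive dumps shows which kind of open (file, async file, accepted socket) accumulates.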
> 
> Aha. Soo... am I understanding this correctly - we are probably leaking 
> fds and when we go above 1024 this makes select() go bonkers and 
> eventually leads to the "Bad file descriptor" error?
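On the select() question: select() works on fixed-size fd_set bitmasks that can only hold descriptors 0 through FD_SETSIZE - 1 (1024 with glibc on Linux), and calling FD_SET() with a larger descriptor is undefined behaviour. Whether that is exactly what produces the "Bad file descriptor" errors here is an assumption until someone checks how the Unix VM builds its masks. A minimal sketch of a guard, plus a poll()-based readiness check that has no such ceiling; `fd_fits_select` and `fd_readable` are hypothetical helper names.

```c
#include <errno.h>
#include <poll.h>
#include <stdbool.h>
#include <sys/select.h>

/* An fd_set is a fixed bitmask: only descriptors 0..FD_SETSIZE-1 fit.
 * Check before handing a descriptor to FD_SET(). */
bool fd_fits_select(int fd)
{
    return fd >= 0 && fd < FD_SETSIZE;
}

/* poll() takes an array of descriptors instead of a bitmask, so large
 * descriptor numbers are fine. Returns 1 if a read would not block,
 * 0 on timeout, -1 on error (errno set). */
int fd_readable(int fd, int timeout_ms)
{
    struct pollfd p = { .fd = fd, .events = POLLIN, .revents = 0 };
    int n = poll(&p, 1, timeout_ms);
    if (n <= 0)
        return n;
    if (p.revents & POLLNVAL) {   /* descriptor is not open */
        errno = EBADF;
        return -1;
    }
    return (p.revents & (POLLIN | POLLHUP)) ? 1 : 0;
}
```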
> 
> >I agree with what Eliot is saying and would add a few thoughts:
> 
> >- Don't fix the wrong problem (DFtWP). Unless you have some reason to
> >believe that this server application would realistically have a need to
> >handle anything close to a thousand concurrent TCP sessions, don't fix
> >it by raising the per-process file handle limit, and don't fix it by
> >reimplementing the socket listening code.
> 
> We haven't done the exact numbers, but we could probably hit several 
> hundred concurrent users at least. 1024 seemed a bit "over the top" though :)
> 
> The system in question is meant to serve more than 1000 concurrent 
> users, so we are in fact moving into this territory. We have been up to 
> around 600 so far.
> 
> >- It is entirely possible that no one before you has ever tried to run
> >a server application with the per-process file handle limit bumped up
> >above the default 1024. So if that configuration does not play nicely
> >with the select() mechanism, you may well be the first to have encountered
> >this as an issue. But see above, don't fix it if it ain't broke.
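One way to tell whether a given server process is in the configuration described above is to compare its soft RLIMIT_NOFILE limit with FD_SETSIZE. With a soft limit of N the highest descriptor the kernel hands out is N - 1, so select() trouble becomes possible once N exceeds FD_SETSIZE. A sketch assuming Linux/POSIX headers; `nofile_exceeds_fd_setsize` is a made-up name.

```c
#include <stdio.h>
#include <sys/resource.h>
#include <sys/select.h>

/* Returns 1 if this process may be given descriptors that cannot be
 * stored in an fd_set, 0 if not, -1 if the limit cannot be read. */
int nofile_exceeds_fd_setsize(void)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0) {
        perror("getrlimit");
        return -1;
    }
    /* rlim_cur is the soft limit the kernel actually enforces. */
    return rl.rlim_cur == RLIM_INFINITY || rl.rlim_cur > (rlim_t)FD_SETSIZE;
}
```

From a shell, `ulimit -n` reports the same soft limit for processes started from that shell.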
> 
> Well, it most probably *is* broke - I mean - I haven't read anywhere 
> that our Socket code is limited to 1024 concurrent sockets and that 
> going above that limit causes the Socket code to stop working? :)
> 
> But I agree - I don't want to touch that code if we can simply avoid 
> this bug by making sure we stay below 1024.
> 
> But it sounds broke to me, nevertheless. ;)

Indeed it probably is.

> 
> >- Most "out of file descriptor" problems involve resource leaks (as Eliot
> >is suggesting), and in those cases you will see a gradual increase in file
> >descriptors in /proc/<vmpid>/fd/ over time. Eventually you run out of
> >descriptors and something horrible happens.
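The /proc check above is easy to automate. This Linux-only sketch counts the entries in a /proc/<pid>/fd directory; sampling the VM's directory periodically and watching for a count that keeps climbing is the leak signature described above. `count_open_fds` is a hypothetical helper, not an existing tool.

```c
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Count the entries in a /proc/<pid>/fd directory (Linux only); each
 * entry is one open descriptor. When fd_dir is "/proc/self/fd" the DIR
 * handle opened here is included in the count, so compare counts over
 * time rather than reading the absolute value. Returns -1 on error. */
int count_open_fds(const char *fd_dir)
{
    DIR *d = opendir(fd_dir);
    if (d == NULL) {
        perror(fd_dir);
        return -1;
    }
    int n = 0;
    struct dirent *e;
    while ((e = readdir(d)) != NULL) {
        if (strcmp(e->d_name, ".") != 0 && strcmp(e->d_name, "..") != 0)
            n++;
    }
    closedir(d);
    return n;
}
```

For a one-off check, `ls /proc/<vmpid>/fd | wc -l` gives the same number from a shell.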
> 
> We will start looking at that and other tools too.
> 
> 
> >- Sorry to repeat myself but this is by far the most important point: 
> >DFtWP.
> 
> Sure :). This is why I posted - to get your input. And I have a 
> suspicion that the SAML issue I mentioned may be the code leaking, we 
> will start looking.
> 
> regards, Göran

Cool. Please keep us posted on progress. In the unlikely event that free
advice turns out to be worth anything, there is plenty more where that
came from ;-)

Dave


