"select: Bad file number"

Lex Spoon lex at cc.gatech.edu
Tue Dec 16 18:10:36 UTC 2003


It still should not happen.  If you are out of files then the accept()
primitive should fail and the rest of the sockets should continue on
happily.  There is no way that the VM should be using an invalid file
number.  Perhaps the VM is not properly handling the case where accept()
fails?
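
As a rough illustration (this is not the actual VM source, just the pattern I would expect on a Unix system), an accept() that fails because the process is out of descriptors can be reported upward without disturbing any other socket:

/* Sketch only: handling accept() failure from descriptor exhaustion.
 * The listening socket stays valid and stays registered; only this
 * one accept attempt fails. */
#include <errno.h>
#include <stdio.h>
#include <sys/socket.h>

int try_accept(int listen_fd)
{
    int conn_fd = accept(listen_fd, NULL, NULL);
    if (conn_fd < 0) {
        if (errno == EMFILE || errno == ENFILE) {
            /* Out of file descriptors: fail this accept (e.g. fail the
             * accept primitive), but keep listen_fd open so existing
             * connections keep working and later accepts can succeed
             * once descriptors are freed. */
            perror("accept");
            return -1;
        }
        /* Other transient errors (EINTR, ECONNABORTED, EAGAIN) can be
         * retried on the next select() readiness report. */
        return -1;
    }
    return conn_fd;  /* a valid new connection */
}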

It sounds like some descriptor is in fact being explicitly close()d and yet
the VM keeps trying to do something with it.  I seem to remember that
the Unix VM used to destroy the server socket whenever an accept()
failed.  If that is still the case, then maybe that code is the culprit
and the server socket itself is the "bad file number".  The VM may be
closing the socket in this situation but not updating the data structure
accordingly.
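
To make that suspected failure mode concrete, here is a sketch (made-up names, not the actual VM code) of what would produce exactly this symptom: the descriptor gets close()d but is left in the fd_set handed to select(), so every later select() fails with EBADF, which Solaris reports as "Bad file number":

/* Sketch of the suspected bug, hypothetical names throughout. */
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/select.h>
#include <unistd.h>

static fd_set read_fds;   /* descriptors the VM is watching */
static int    max_fd;

void on_accept_failure(int server_fd)
{
    close(server_fd);     /* server socket destroyed ...              */
    /* BUG: missing FD_CLR(server_fd, &read_fds); the stale entry     */
    /* makes every subsequent select() call fail with EBADF.          */
}

int wait_for_activity(void)
{
    fd_set ready = read_fds;
    int n = select(max_fd + 1, &ready, NULL, NULL, NULL);
    if (n < 0 && errno == EBADF)
        fprintf(stderr, "select: %s\n", strerror(errno));  /* "Bad file number" */
    return n;
}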
 
By the way, there is another general issue floating around.  If you
reach the max and accept a connection on your last available file
descriptor,  then you will have 0 file descriptors available for
actually servicing the request.  It makes sense to me to have servers
bake in a limit on the number of simultaneous connections they will
serve, so that every connection has a reasonable amount of resources to
use.  Note that the sockets API is set up to support this; you can quietly fail
to accept() connections, and that is different from refusing them.
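
Sketching the idea (hypothetical names, not Seaside or VM code): once you hit the cap you simply stop calling accept(), and the extra connections sit in the kernel's listen backlog until a slot frees up, rather than being actively refused:

#include <sys/select.h>
#include <sys/socket.h>

#define MAX_CONNECTIONS 64   /* leave headroom below the fd limit */

void serve_once(int listen_fd, int *active_connections)
{
    /* At capacity: quietly skip accept(); pending connections simply
     * wait in the kernel's listen backlog. */
    if (*active_connections >= MAX_CONNECTIONS)
        return;

    fd_set read_fds;
    FD_ZERO(&read_fds);
    FD_SET(listen_fd, &read_fds);

    if (select(listen_fd + 1, &read_fds, NULL, NULL, NULL) > 0
        && FD_ISSET(listen_fd, &read_fds)) {
        int conn_fd = accept(listen_fd, NULL, NULL);
        if (conn_fd >= 0)
            (*active_connections)++;  /* decremented when the request finishes,
                                         so accepting resumes automatically */
    }
}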


-Lex



> >> I'm doing some load testing on Seaside at work and I wrote up some  
> >> code that creates a lot of connections to the server.  If I have 100  
> >> simultaneous Processes going, it seems to be ok (at least in terms of  
> >> the bug I'm about to describe).  When I tried it with 1000  
> >> simultaneous client Processes, however, after about a minute and a  
> >> half the server image starts pumping out "select: Bad file number" to  
> >> the console as quickly as its little legs will let it.
> >>
> >> Now I assume this is related to running out of sockets or something,  
> >> but does this trigger anything for anybody?  Does this represent some  
> >> problem in the VM or in the Seaside code or is it just a fact of 
> >> life?  Seems like we should be able to handle it better somehow...
> >>
> >> Both images are 3.6-g2 VMs, the client on linux and the server on  
> >> solaris.
> >>
> >> Any help appreciated,
> >>
> >> Julian
> >>
> >>


