Will the real sqUnixSocket.c step forward?

Ian Piumarta ian.piumarta at inria.fr
Sat May 4 02:33:46 UTC 2002


On Fri, 3 May 2002, John M McIntosh wrote:

> Ok found out that Ian shipped the code with #if 1 in the accept 
> handler to Unconnect the listener socket on an accept failure.

So Goran: in your copy, change
  #if 1
in acceptHandler() to
  #if 0
before you run any Comanche tests!  (With that change, I am almost 100%
certain that the new socket code is robust.  I've been hammering hard on
Swiki and Comanche servers, and have not been able to break them yet.)

> But this exposed an issue in the Comanche ConnectionHandler>>pvtnewListenLoop
> 
> It seems the code there on the accept failure just checks to see if 
> the listener Socket>>isValid which it is, but of course now in 
> Unconnected status, versus WaitingForConnection or some other valid 
> state, so then this while loop runs really fast and locks up the 
> image.
> 
> That's a bug.

There are two issues here.

#1 is the problem you're seeing in FreeBSD (and I'm _not_ seeing in
NetBSD), which is the ECONNABORTED on the server socket.  This should not
happen.  It's not even listed as a possible error from connect(), listen()
or accept().  (The only thing I can think of is:
  - the server listens
  - the client connects
  - the client disconnects abruptly before the connect completes
  - the server tries to accept
so I'll run a test or two to this effect.  If this is the problem then
it's caused by the browser's behaviour -- and my failure to reproduce this
is possibly because I'm using a browser [Opera] that's too "well-behaved".
If this is the problem then I'll fix it directly in the support code, so
that ECONNABORTED does not abort the listening socket.

#2 is that "legitimate" (rare) errors _can_ occur during accept() (process
or system descriptor table full are the most likely) and currently the
reporting of them is broken.

> Someone could review the Comanche code and consider what if... Or is 
> that fixed now? Yes I'm working with an old version...

My copy (2002-02-20) has this:

ConnectionHandler>>pvtNewListenLoop
  listener _ newListener.
  listener isValid ifFalse: ["do non-BSD style accept"].
  [true] whileTrue:
      [client _ listener waitForAccept.
       client notNil && client connected
          ifFalse:
              [listener isValid
                  ifFalse:
                      [waitABit.
                       listener destroy.
                       listener _ newListener]]].

so the empirical evidence is that the support code should invalidate the
listener in the case of an error on the listening socket during accept.

> And I'll let Ian decide what to do with the aborting of 
> the listen socket

Changing the

    pss->sockState= Unconnected;
to
    pss->sockState= Invalid;

in acceptHandler() should cure this (at least as far as Comanche is
concerned).

> since flipping it to unconnected (does it get 
> closed?

If we're going to stick with Invalid as the state for a broken listening
socket then it has to be closed by the accept code.  (The explicit destroy
in the Comanche code is meaningless -- the support code should never
attempt to close [or in deal in any other way with] an invalid socket.)

> or is it still actually listening) doesn't really work as 
> intended...

Either: we invalidate, close and release all resources associated with the
broken listener during accept,
Or: we chose another state (Unconnected, or whatever -- so long as it
isn't Invalid), disable further connections by calling listen(sock, 0) on
the listener (setting the backlog to zero), and let the explicit destroy
in the image finish the job of tidying up.

The first has the advantage that it's compatible with what exists, is
tolerant of missing destroys, and fails the acceptFrom primitive the next
time round the loop if left undetected.  The second has the advantage of
allowing the image a chance to find out exactly what happened before
finally destroying the socket.  I've updated the Unix socket code to
implement the first for now, but I'm happy to go with the second solution
if/when somebody decides that the error code is important.

With any luck we can close the whole socket saga once and for all
now.

Ian






More information about the Squeak-dev mailing list