Unix VM still coughing sockets...

Ian Piumarta ian.piumarta at inria.fr
Tue May 21 21:21:02 UTC 2002


Hi Goran,

> I still end up sometimes with Processes in #waitForDataUntil: running
> wild on me and chewing up all the CPU. I attached a silly GIF to show
> you that I am not hallucinating. (the reason for so many of them is that
> the CPU watcher suspends the naughty guy and asks me what to do and then
> another one pops up - I guess it is Comanche doing it)

I've gone back and looked closely again at the implementation and
senders of Socket>>waitForDataUntil: and there is some weirdness.

I've made some (very minor) changes in the detection of remote close
(attached) which might help.  But I'd like to check once and for all
that everyone using Sockets is in total agreement about their
behaviour concerning the detection/reporting of broken connections.
(I'm pretty sure these are the cause of the loops you're
experiencing.)

Here's what I now think should be happening:

  primSocketConnectionStatus:

    contains no intelligence at all; it just returns the last
    recorded pss->sockState

  primRecvDataAvailable:

    socket state    connection      data waiting    response
    ------------    ----------      ------------    --------
    Connected       any             yes             answer true

    Connected       open            no              enable signal,
                                                    answer false

    Connected       remote closed   n/a             enable signal [1],
                                                    state := Closed, [2]
                                                    answer false [3]

    !Connected      n/a             n/a             enable signal,
                                                    answer false
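
As a rough illustration only (this is not the attached code), the two primitives described so far could be sketched in C along these lines.  The names SketchSocket, SketchPeer, sketchRecvDataAvailable and sketchConnectionStatus are made up for the sketch, and the "signalEnabled" flag merely stands in for whatever the VM does to arm the read semaphore.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; the real definitions live in sqUnixSocket.c. */
enum SockState { Unconnected, Connected, Closed };
enum SketchPeer { PeerOpen, PeerClosed };

typedef struct {
    enum SockState sockState;  /* last recorded state, cf. pss->sockState */
    bool signalEnabled;        /* set once the read-semaphore signal is armed */
} SketchSocket;

/* No intelligence at all: just report the last recorded state. */
enum SockState sketchConnectionStatus(const SketchSocket *s)
{
    assert(s != NULL);
    return s->sockState;
}

/* Follows the table row by row; only "Connected with data waiting"
   answers true, every other row arms the signal and answers false. */
bool sketchRecvDataAvailable(SketchSocket *s, enum SketchPeer peer,
                             bool dataWaiting)
{
    assert(s != NULL);
    if (s->sockState == Connected) {
        if (dataWaiting)
            return true;             /* row 1: connection state irrelevant */
        if (peer == PeerClosed)
            s->sockState = Closed;   /* [2]: record the remote close now */
    }
    s->signalEnabled = true;         /* [1]: rows 2-4 all enable the signal */
    return false;                    /* [3] */
}
```

The point of the sketch is that the state change at [2] happens inside the same call that answers false, so a later status query already sees Closed.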

  dataHandler (called during wait on readSema):

    connection      response
    ----------      --------
    open            signal semaphore
    remote closed   state := Closed, signal semaphore [4]
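
The handler table can be sketched the same way.  Again this is an assumption-laden illustration, not the attached code: HandlerSocket and sketchDataHandler are invented names, and a plain counter stands in for the read semaphore.  The block repeats its own stand-in types so that it compiles on its own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the VM's socket state. */
enum HandlerState { HUnconnected, HConnected, HClosed };

typedef struct {
    enum HandlerState sockState;  /* cf. pss->sockState */
    int readSemaSignals;          /* counter standing in for readSema */
} HandlerSocket;

/* Invoked when the descriptor becomes readable while the image is
   waiting on the read semaphore. */
void sketchDataHandler(HandlerSocket *s, bool remoteClosed)
{
    assert(s != NULL);
    if (remoteClosed)
        s->sockState = HClosed;   /* [4]: record the close before waking */
    s->readSemaSignals++;         /* signal in both rows so the waiter wakes */
}
```

Both rows signal the semaphore; the only difference is whether the state is updated first.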

[1] is necessary because of Socket>>waitForDataUntil:.  By the time we
detect the closed connection, the only way to stop this method from
waiting on the read semaphore is to answer "true", regardless of any
change to the socket state.  However, since waitForDataUntil: is called
from many places inside a loop that looks like

	[sock isConnected | sock dataAvailable] whileTrue: [...]    [5]

it seems obvious that answering "true" is completely out of the
question.  So we have to live with the assumption that every time we
answer "false" from primRecvDataAvailable the image is going to wait on
the semaphore, even if the socket is no longer connected, which is why
the signal has to be enabled in that case too.

FWIW, [2] is what was missing in the last attempt since it answered
"true" at [3] (totally unaware that lots of people were doing [5]) and
then assumed that [4] would take care of the state change.  Oops.
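
To make the difference concrete, here is a small C model of loop [5] on a socket whose peer has closed with no data pending.  It is only one reading of the description above: LoopSocket, dataAvailableOld, dataAvailableNew and runLoop5 are invented names, the loop body is elided, and the iteration cap exists purely so that the "old" variant terminates.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct {
    bool connected;    /* what isConnected would answer */
    bool peerClosed;   /* the remote end has closed, no data pending */
} LoopSocket;

/* Previous attempt as described: answer true on remote close and
   leave the recorded state alone. */
static bool dataAvailableOld(LoopSocket *s)
{
    return s->peerClosed;
}

/* Current proposal: record the close ([2]) and answer false ([3]). */
static bool dataAvailableNew(LoopSocket *s)
{
    if (s->peerClosed)
        s->connected = false;
    return false;
}

/* Runs loop [5]; answers the number of iterations, capped at limit. */
int runLoop5(bool useNew, int limit)
{
    LoopSocket s = { true, true };
    int n = 0;
    assert(limit > 0);
    while (n < limit) {
        /* Smalltalk's | evaluates both operands, receiver first. */
        bool isConn = s.connected;
        bool avail = useNew ? dataAvailableNew(&s) : dataAvailableOld(&s);
        if (!(isConn | avail))
            break;
        n++;   /* loop body elided */
    }
    return n;
}
```

In this model runLoop5(false, 1000) runs into the cap, while runLoop5(true, 1000) leaves the loop after a single pass, because the state change makes isConnected false on the next test.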

If this isn't quite what heavy users of Socket are hoping for then let me
know and we can modify it.

I've run a bunch of tests and all seems well (but all seemed well in
the last attempt, including SUnits, so that apparently doesn't count
for very much ;-).  The real test is to put this into your VM, run
Comanche over it and see if you still get the loopiness.

If the above still doesn't work then I think we're going to have to
"get violent" and make primSocketConnectionStatus check explicitly for
remote close (which would immediately answer "false" to isConnected at
[5] and in the controlling expression of the waitForDataUntil: loop)
even though (i) it implies a (nontrivial!) syscall at every status
check, (ii) it creates a race in waitForDataUntil:, and (iii) the
detection can be done just as well elsewhere, assuming the image
behaves sensibly (which is maybe not such an intelligent assumption
after all).
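
For reference, one common way to test for a remote close with a single syscall is a non-blocking peek.  The following is a minimal sketch under stated assumptions, not what the attached file does: sketchRemoteClosed is an invented name, MSG_DONTWAIT is assumed to be available (it is on Linux and the BSDs), and any error other than "would block" or an interrupted call is treated as a broken connection.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Answers true if the peer has shut down its sending side and no
   unread data remains, or if the socket reports a hard error. */
bool sketchRemoteClosed(int fd)
{
    char c;
    ssize_t n;

    assert(fd >= 0);
    /* Peek at one byte without consuming it and without blocking. */
    n = recv(fd, &c, 1, MSG_PEEK | MSG_DONTWAIT);
    if (n == 0)
        return true;    /* end of file: orderly remote close */
    if (n > 0)
        return false;   /* data is still waiting to be read */
    /* n < 0: "would block" means open but idle. */
    if (errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR)
        return false;
    return true;        /* e.g. ECONNRESET: treat as broken */
}
```

This costs one recv() per status check, which is the overhead mentioned in (i).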

Regards,
Ian

-------------- next part --------------
A non-text attachment was scrubbed...
Name: sqUnixSocket.c.gz
Type: application/octet-stream
Size: 9193 bytes
Desc: sqUnixSocket.c.gz
Url : http://lists.squeakfoundation.org/pipermail/squeak-dev/attachments/20020521/5357d09a/sqUnixSocket.c.obj

