[Linux][VM][3.2-5] Socket problem 'dataHandler: selected socket would block (why?)'
sr at evolgo.de
Thu Oct 24 14:35:16 UTC 2002
Thank you very much for your hint!
Without it I would have had to write nasty workarounds for this bug (or bugs)...
Ian Piumarta wrote:
> Hi Stephan,
> On Wed, 23 Oct 2002, Stephan Rudlof wrote:
>>I have a difficult socket problem with a client/server app: two Squeak
>>instances are communicating over a bidirectional socket, which is used
>>The error arises deeply in the system in
>>after some time transferring data back and forth (hundreds to thousands of
>>sends/receives in both directions) and results in an interrupted connection.
>>It seems that the implementor of the socket code
> That would be me.
>>is also puzzled by the error message
>> 'dataHandler: selected socket would block (why?)'
> The above indicates that:
> 0. Squeak polled for i/o by calling the aio polling function
> 1. select() subsequently returned _without_ an error condition
> 2. a bit was set in the `readable' fdset corresponding to some socket
> 3. the socket's dataHandler was called to read from the socket
> 4. the dataHandler tried to read() but was told `EWOULDBLOCK'
> 5. the dataHandler is now very confused and asks itself `why?'
> If I remember correctly, the dataHandler then assumes there's a
> problem (and sets the socket's state to other-end-closed) because otherwise
> Smalltalk code could block forever waiting for something to happen on a
> damaged socket.
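Steps 0 to 5 above describe the classic "select() said readable, but read() says EWOULDBLOCK" situation. The Linux select(2) man page documents that this can happen (readiness may be reported for data that is later discarded), so one alternative to marking the socket as closed is to treat EWOULDBLOCK as a spurious wakeup and simply poll again. The following is a minimal sketch of that idea, not the actual sqUnixSocket.c code; the names `handle_readable`, `set_nonblocking` and the `read_result` codes are hypothetical.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical result codes for this sketch; not the VM's own names. */
enum read_result { READ_DATA, READ_SPURIOUS, READ_PEER_CLOSED, READ_ERROR };

/* Put fd into non-blocking mode so a read can never stall the VM. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Called after select() has flagged fd as readable. Classifies the
   outcome of a single recv() instead of treating every failure as
   "other end closed". */
static enum read_result handle_readable(int fd, char *buf, size_t len,
                                        ssize_t *nread)
{
    ssize_t n = recv(fd, buf, len, 0);
    *nread = n;
    if (n > 0)
        return READ_DATA;          /* got some bytes */
    if (n == 0)
        return READ_PEER_CLOSED;   /* orderly shutdown by the peer */
    if (errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR)
        return READ_SPURIOUS;      /* nothing there after all: poll again */
    return READ_ERROR;             /* genuine error, e.g. ECONNRESET */
}
```

Only a return value of 0 from recv() is taken as "the peer closed the connection"; a would-block result leaves the socket's state untouched, so Smalltalk code keeps waiting for the next real readiness event.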
Unlike me, you obviously have an idea of what is happening there.
>>Has anybody seen this error?
>>Does anybody have an idea how to hunt the bug down further?
> Try running your VM with `-notimer' and tell me (either way) if that
Your hint solves this socket problem (AFAICS)!
But I assume that the -notimer option has some drawbacks (performance?);
otherwise it would be the default, I guess.
> There are a few potential EINTR points in the socket code
> (including one in the test for an established connection reached
> indirectly from the dataHandler) which maybe should be more intelligent
> about how they cope with that condition.
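A common way to make such EINTR points "more intelligent" is to retry the interrupted call instead of reporting failure. The sketch below assumes, as this thread suggests, that the signals from the VM's interval timer (the one `-notimer' disables) can interrupt select() or read() with EINTR. It shows an EINTR-retrying read and a hypothetical `connection_established` check (zero-timeout select() for writability plus getsockopt(SO_ERROR)); neither is taken from sqUnixSocket.c.

```c
#include <errno.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* read() that retries when a signal (e.g. a timer tick) interrupts it. */
static ssize_t read_retry(int fd, void *buf, size_t len)
{
    ssize_t n;
    do {
        n = read(fd, buf, len);
    } while (n < 0 && errno == EINTR);
    return n;
}

/* Hypothetical helper: reports whether a (possibly non-blocking) connect()
   on fd has completed. Returns 1 if connected, 0 if still in progress,
   -1 on failure. An EINTR from select() is retried instead of being
   mistaken for a broken connection. */
static int connection_established(int fd)
{
    fd_set wfds;
    struct timeval tv;
    int r;

    do {
        FD_ZERO(&wfds);          /* select() modifies the set: rebuild it */
        FD_SET(fd, &wfds);
        tv.tv_sec = 0;           /* zero timeout: poll, don't wait */
        tv.tv_usec = 0;
        r = select(fd + 1, NULL, &wfds, NULL, &tv);
    } while (r < 0 && errno == EINTR);

    if (r < 0)
        return -1;
    if (r == 0)
        return 0;                /* not writable yet: connect still pending */

    int err = 0;
    socklen_t len = sizeof err;
    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0 || err != 0)
        return -1;               /* connect failed (err holds the reason) */
    return 1;
}
```

For brevity, retrying select() restarts the full timeout; with a zero timeout that makes no difference.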
> How much hassle is it for you to:
> a) reproduce this problem?
It just takes time: the application (running on the two Squeak instances)
has to run for a while before the error appears (somewhere between a few
hundred and a few thousand sends of data back and forth). A nondeterministic
problem...
> b) recompile your VM from source?
No problem (btw: FFI in there by default now eases compilation of a new dist
> If the answers are both `fairly easy' then I could send you some patches
> during the course of the afternoon to see if we can eliminate this. (I'd
> get on the case myself except that I'm presenting a seminar tomorrow and
> have to finish my presentation slides -- but a little `asynchronous remote
> debugging consultancy' would be a welcome distraction. ;)
Though running Squeak with -notimer seems to be OK for me, I'm willing to
help debug the other variant by testing patches; just mail me modified
sqUnixSocket.c sources (please be a little patient if test results don't
come back immediately). Please tell me if staying with Squeak-3.2-5 (my
current source tree) is OK for these tests.
Stephan Rudlof (sr at evolgo.de)
"Genius doesn't work on an assembly line basis.
You can't simply say, 'Today I will be brilliant.'"
-- Kirk, "The Ultimate Computer", stardate 4731.3