[Linux][VM][3.2-5] Socket problem 'dataHandler: selected socket would block (why?)'

Stephan Rudlof sr at evolgo.de
Thu Oct 24 14:35:16 UTC 2002


Dear Ian,

thank you very much for your hint!
Without it I'd have had to write some nasty workarounds for this bug (or bugs)...

Ian Piumarta wrote:
> Hi Stephan,
> 
> On Wed, 23 Oct 2002, Stephan Rudlof wrote:
> 
>>I have a difficult socket problem with a client/server app: two Squeak
>>instances are communicating over a bidirectional socket, which is used
>>permanently.
>>
>>The error arises deeply in the system in
>>  Squeak-3.2-5/platforms/unix/plugins/SocketPlugin/sqUnixSocket.c
>>after some time transferring data back and forth (hundreds to thousands
>>of sends/receives in both directions) and results in an interrupted
>>connection.
>>
>>It seems that the implementor of the socket code 
> 
> 
> That would be me.
> 
> 

>>is himself/herself puzzled by the error message
>>  'dataHandler: selected socket would block (why?)'
> 
> 
> The above indicates that:
> 
> 0.  Squeak polled for i/o by calling the aio polling function
> 1.  select() subsequently returned _without_ an error condition
> 2.  a bit was set in the `readable' fdset corresponding to some socket
> 3.  the socket's dataHandler was called to read from the socket
> 4.  the dataHandler tried to read() but was told `EWOULDBLOCK'
> 5.  the dataHandler is now very confused and asks itself `why?'
> 
> If I remember correctly the dataHandler then proceeds to assume there's a
> problem (and sets the socket's state to other-end-closed) because otherwise
> Smalltalk code could block forever waiting for something to happen on a
> damaged socket.

Unlike me, you obviously have an idea of what is happening there.
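
Editorial sketch: steps 0 to 4 above can be reproduced in a few lines of C.
This is not the code from sqUnixSocket.c. The names read_outcome,
classify_read and poll_and_read are hypothetical, and reporting EWOULDBLOCK
as a spurious wakeup (rather than as other-end-closed) is an assumption about
one possible alternative, not a description of what the VM does.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical outcome codes; not taken from sqUnixSocket.c. */
typedef enum {
    READ_NOT_READY,   /* select() did not flag the descriptor */
    READ_DATA,        /* read() returned at least one byte */
    READ_PEER_CLOSED, /* read() returned 0: orderly shutdown */
    READ_SPURIOUS,    /* flagged readable, yet read() would block */
    READ_INTERRUPTED, /* a signal arrived; the caller may retry */
    READ_ERROR        /* any other failure */
} read_outcome;

/* Classify a read() result from its return value and the errno it left. */
static read_outcome classify_read(ssize_t n, int err)
{
    if (n > 0) return READ_DATA;
    if (n == 0) return READ_PEER_CLOSED;
    if (err == EWOULDBLOCK || err == EAGAIN) return READ_SPURIOUS;
    if (err == EINTR) return READ_INTERRUPTED;
    return READ_ERROR;
}

/* Steps 0-4: poll with select(), then read() if the fd is flagged readable. */
static read_outcome poll_and_read(int fd, char *buf, size_t len, ssize_t *nread)
{
    fd_set readable;
    struct timeval tv = {0, 0};   /* zero timeout: poll, do not wait */
    int ready;

    assert(buf != NULL && nread != NULL);
    *nread = 0;
    FD_ZERO(&readable);
    FD_SET(fd, &readable);

    ready = select(fd + 1, &readable, NULL, NULL, &tv);
    if (ready < 0)
        return errno == EINTR ? READ_INTERRUPTED : READ_ERROR;
    if (ready == 0 || !FD_ISSET(fd, &readable))
        return READ_NOT_READY;

    *nread = read(fd, buf, len);
    return classify_read(*nread, errno);
}
```

With a non-blocking descriptor, a caller could treat READ_SPURIOUS and
READ_INTERRUPTED as "try again on the next poll" and reserve the closed state
for READ_PEER_CLOSED.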

> 
> 
>>Has anybody seen this error?
>>Does anybody have an idea how to hunt the bug down further?
> 
> 

> Try running your VM with `-notimer' and tell me (either way) if that
> helps.

Your hint solves this socket problem (AFAICS)!

But I assume that using the -notimer option has some drawbacks
(performance?), otherwise it would be the default, I guess.

> There are a few potential EINTR points in the socket code
> (including one in the test for an established connection reached
> indirectly from the dataHandler) which maybe should be more intelligent
> about how they cope with that condition.
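
Editorial sketch: a common way to cope with EINTR is to retry the interrupted
call. The example below assumes the interruptions come from a periodic timer
signal, which is what the -notimer hint suggests but which this thread does
not confirm. read_retrying_eintr is a hypothetical helper, not a function in
the VM sources.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: repeat read() for as long as a signal interrupts it.
 * Any other result, including EWOULDBLOCK, is passed back to the caller. */
static ssize_t read_retrying_eintr(int fd, void *buf, size_t len)
{
    ssize_t n;

    assert(buf != NULL);
    do {
        n = read(fd, buf, len);
    } while (n < 0 && errno == EINTR);
    return n;
}
```

The same loop shape applies to other calls that can fail with EINTR, such as
write() or select(); whether retrying is the right policy at each point in
the socket code is a separate question.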
> 
> How much hassle is it for you to:
> 
> a) reproduce this problem?

It just takes some time, since the application (running on the two Squeak
instances) has to run for a while before the error appears (anywhere from a
few hundred to thousands of sends of data back and forth). A nondeterministic
problem...

> b) recompile your VM from source?

No problem (btw: having FFI included by default now makes compiling a new
dist easier for me).

> 
> If the answers are both `fairly easy'

They are.

> then I could send you some patches
> during the course of the afternoon to see if we can eliminate this.  (I'd
> get on the case myself except that I'm presenting a seminar tomorrow and
> have to finish my presentation slides -- but a little `asynchronous remote
> debugging consultancy' would be a welcome distraction. ;)

Though running Squeak with -notimer seems to be OK for me, I'm willing to
help debug the other variant by testing patches; just mail me the modified
sqUnixSocket.c sources (please be a little patient if test results don't
come back immediately). Please tell me whether staying with Squeak-3.2-5 (my
current source tree) is OK for these tests.


Greetings,

Stephan

> 
> Regards,
> 
> Ian


-- 
Stephan Rudlof (sr at evolgo.de)
   "Genius doesn't work on an assembly line basis.
    You can't simply say, 'Today I will be brilliant.'"
    -- Kirk, "The Ultimate Computer", stardate 4731.3



