[Vm-dev] Socket's readSemaphore is losing signals with Cog on Linux

Andreas Raab andreas.raab at gmx.de
Sun Aug 14 17:58:06 UTC 2011


On 8/13/2011 13:42, Levente Uzonyi wrote:
> Socket's readSemaphore is losing signals with Cog VMs on Linux. We
> found several cases (RFB, PostgreSQL) where processes get stuck in the
> following method:
>
> Socket >> waitForDataIfClosed: closedBlock
>     "Wait indefinitely for data to arrive.  This method will block until
>     data is available or the socket is closed."
>
>     [
>         (self primSocketReceiveDataAvailable: socketHandle)
>             ifTrue: [^self].
>         self isConnected
>             ifFalse: [^closedBlock value].
>         self readSemaphore wait ] repeat
>
> When we inspect the contexts, the process is waiting on the
> readSemaphore, but evaluating (self primSocketReceiveDataAvailable:
> socketHandle) yields true. Signaling the readSemaphore gets the
> process running again. As a workaround we replaced #wait with
> #waitTimeoutMSecs: and all our problems disappeared.
>
> The interpreter VM doesn't seem to have this bug, so I suspect it
> was introduced by the changes to aio.c.

Oh, interesting. We know this problem fairly well and have always worked
around it by changing the wait in the above to a "waitTimeoutMSecs: 500",
which turns it into a soft busy loop. It would be interesting to see if
there's a bug in Cog that causes this. FWIW, here is the relevant portion:

             "Soft 500ms busy loop - to protect against AIO probs;
             occasionally, VM-level AIO fails to trip the semaphore"
             self readSemaphore waitTimeoutMSecs: 500.
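
For reference, here is a sketch of the whole method with that change applied. It is illustrative rather than the exact code from any image: it assumes the image provides Semaphore>>waitTimeoutMSecs: and otherwise keeps the method quoted above unchanged. If the VM drops a signal, the timed wait expires and the loop re-checks the socket, so the process is delayed by roughly 500 ms instead of blocking forever.

```smalltalk
Socket >> waitForDataIfClosed: closedBlock
    "Wait for data to arrive. This method will block until
    data is available or the socket is closed."

    [
        (self primSocketReceiveDataAvailable: socketHandle)
            ifTrue: [^self].
        self isConnected
            ifFalse: [^closedBlock value].
        "Bounded wait (assumed workaround): if the VM fails to signal
        the semaphore, the timeout expires and the loop re-checks
        both the data-available and connected conditions."
        self readSemaphore waitTimeoutMSecs: 500 ] repeat
```

The cost is that an idle reader wakes up about twice a second to poll; the timeout only bounds the damage of a lost signal and does not fix whatever in the VM is dropping it.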

Cheers,
   - Andreas
