[squeak-dev] Re: [Vm-dev] I would be extremely grateful for a reproducible case for the following Socket issue

Eliot Miranda eliot.miranda at gmail.com
Thu Mar 24 23:35:40 UTC 2016


Hi Levente,

On Thu, Mar 24, 2016 at 2:16 AM, Levente Uzonyi <leves at caesar.elte.hu>
wrote:

> Hi Eliot,
>
> The snippet below, evaluated from a workspace, triggered the issue in less
> than a minute for me, three times in a row.
> Both processes will halt if #sloppyWaitForDataIfClosed: doesn't return
> within a second. If you send #dataAvailable to the socket, you'll find that
> it has data ready to be read, but its readSemaphore has no signal.
>
> Levente
>
>
> Socket compile: 'sloppyWaitForDataIfClosed: closedBlock
>
>         [(socketHandle ~~ nil
>           and: [self primSocketReceiveDataAvailable: socketHandle]) ifTrue:
>                 [^self].
>          self isConnected ifFalse:
>                 [^closedBlock value].
>          self readSemaphore wait] repeat'
> classified: 'waiting'.
>
> [
>         listenerSocket := Socket newTCP.
>         listenerSocket listenOn: 0 backlogSize: 4 interface: #[127 0 0 1].
>         clientSocket := Socket newTCP.
>         clientSocket connectTo: #[127 0 0 1] port: listenerSocket localPort.
>         clientSocket waitForConnectionFor: 1.
>         self assert: clientSocket isConnected.
>         serverSocket := listenerSocket waitForAcceptFor: 1.
>         self assert: serverSocket isConnected ]
>         ensure: [ listenerSocket destroy ].
>
> serverProcess := [
>         | shouldRun buffer bytesReceived waitDuration |
>         shouldRun := true.
>         buffer := ByteString new: 10.
>         waitDuration := 1 second.
>         [
>                 [ serverSocket sloppyWaitForDataIfClosed: [ shouldRun := false ] ]
>                         valueWithin: waitDuration
>                         onTimeout: [ self halt ].
>                 buffer atAllPut: (Character value: 0).
>                 bytesReceived := serverSocket receiveDataInto: buffer.
>                 self assert: bytesReceived = 4.
>                 self assert: (buffer first: 4) = 'PING'.
>                 serverSocket sendData: 'PONG' ] repeat ] newProcess.
> clientProcess := [
>         | shouldRun buffer bytesReceived waitDuration |
>         shouldRun := true.
>         buffer := ByteString new: 10.
>         waitDuration := 1 second.
>         [
>                 clientSocket sendData: 'PING'.
>                 [ clientSocket sloppyWaitForDataIfClosed: [ shouldRun := false ] ]
>                         valueWithin: waitDuration
>                         onTimeout: [ self halt ].
>                 buffer atAllPut: (Character value: 0).
>                 bytesReceived := clientSocket receiveDataInto: buffer.
>                 self assert: bytesReceived = 4.
>                 self assert: (buffer first: 4) = 'PONG' ] repeat ] newProcess.
> clientProcess priority: 39; resume.
> serverProcess priority: 39; resume.
>
> "Evaluate these after debugging:
> clientSocket destroy.
> serverSocket destroy."


Fabulous, thank you!  Replace the self halts with e.g. self assert:
(clientSocket dataAvailable = (clientSocket readSemaphore excessSignals >
0)), and we even have a test.  I have work to do tomorrow but hope to be
able to debug this soon.  I'll add kqueue and epoll support when I fix
it.
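
Something like the following is what I have in mind for the client loop;
the server loop would get the same change with serverSocket.  This is an
untested sketch, and checkReadInvariant is only a hypothetical workspace
variable, not something that exists in Socket:

```smalltalk
"Hypothetical helper block: data being available should go hand in hand
 with at least one pending signal on the socket's readSemaphore."
checkReadInvariant := [:socket |
	self assert: socket dataAvailable
		= (socket readSemaphore excessSignals > 0)].

"Drop-in replacement for the timed wait inside clientProcess's loop:
 on timeout, check the invariant instead of halting."
[ clientSocket sloppyWaitForDataIfClosed: [ shouldRun := false ] ]
	valueWithin: waitDuration
	onTimeout: [ checkReadInvariant value: clientSocket ].
```

With the bug present the assertion should fail on timeout, since the
socket reports data available while the semaphore has no excess signals.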

(Stephan, interesting suggestion to throw hardware at the problem, thank
you).

> On Wed, 23 Mar 2016, Eliot Miranda wrote:
>
>> Hi Levente,
>> On Wed, Mar 23, 2016 at 11:31 AM, Levente Uzonyi <leves at caesar.elte.hu>
>> wrote:
>>       Hi Eliot,
>>
>>       What sort of reproducibility are you looking for? Is it enough if
>> it happens once every few hours or do you need something that you can
>> trigger on demand?
>>
>>
>> I'll take every few hours, but I'd prefer "in under 30 minutes".  Getting
>> warm and fuzzy feelings when trying to prove a negative with something that
>> takes hours to run is very difficult.  Let's say you have
>> a case which reproduces in 8 hours 50% of the time.  To reach a 99%
>> confidence level in a fix I'd have to run it for 8 * (100 log: 2) hours
>> without seeing it reproduce, right?  That's over 2 days; it could
>> take weeks to fix :-(
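>>
>> Spelling that estimate out as a rough sketch, under the assumption that
>> each 8-hour run is an independent trial which reproduces with
>> probability 0.5: a broken fix survives n clean runs with probability
>> 0.5 raised to n, so the number of clean runs needed is the base-2 log
>> of one over the residual risk.  The variable names are only for
>> illustration.
>>
>> ```smalltalk
>> "Assumption: independent 8-hour runs, each reproducing 50% of the time."
>> hoursPerRun := 8.
>> runsFor99 := 100 log: 2.    "residual risk 1/100, about 6.64 runs"
>> runsFor98 := 50 log: 2.     "residual risk 1/50, about 5.64 runs"
>> hoursPerRun * runsFor99.    "roughly 53 hours"
>> hoursPerRun * runsFor98.    "roughly 45 hours"
>> ```
>>
>> So 50 log: 2 corresponds to about 98% confidence and 100 log: 2 to
>> about 99%; either way it is around two days per attempt.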
>>
>>       Levente
>>
>>
>>
>> _,,,^..^,,,_
>> best, Eliot


-- 
_,,,^..^,,,_
best, Eliot