Hi Levente,
On Thu, Mar 24, 2016 at 2:16 AM, Levente Uzonyi <leves@caesar.elte.hu> wrote:
Hi Eliot,
The snippet below, evaluated from a workspace, triggered the issue in less than a minute for me, three times in a row. Both processes will halt if #sloppyWaitForDataIfClosed: doesn't return within a second. If you send #dataAvailable to the socket, you'll find that it has data ready to be read, but its readSemaphore has no signal.
Levente
Socket compile: 'sloppyWaitForDataIfClosed: closedBlock

	[(socketHandle ~~ nil
	  and: [self primSocketReceiveDataAvailable: socketHandle])
		ifTrue: [^self].
	 self isConnected
		ifFalse: [^closedBlock value].
	 self readSemaphore wait] repeat'
	classified: 'waiting'.
[
	listenerSocket := Socket newTCP.
	listenerSocket listenOn: 0 backlogSize: 4 interface: #[127 0 0 1].
	clientSocket := Socket newTCP.
	clientSocket connectTo: #[127 0 0 1] port: listenerSocket localPort.
	clientSocket waitForConnectionFor: 1.
	self assert: clientSocket isConnected.
	serverSocket := listenerSocket waitForAcceptFor: 1.
	self assert: serverSocket isConnected ]
		ensure: [ listenerSocket destroy ].
serverProcess := [
	| shouldRun buffer bytesReceived waitDuration |
	shouldRun := true.
	buffer := ByteString new: 10.
	waitDuration := 1 second.
	[
		[ serverSocket sloppyWaitForDataIfClosed: [ shouldRun := false ] ]
			valueWithin: waitDuration
			onTimeout: [ self halt ].
		buffer atAllPut: (Character value: 0).
		bytesReceived := serverSocket receiveDataInto: buffer.
		self assert: bytesReceived = 4.
		self assert: (buffer first: 4) = 'PING'.
		serverSocket sendData: 'PONG' ] repeat ] newProcess.
clientProcess := [
	| shouldRun buffer bytesReceived waitDuration |
	shouldRun := true.
	buffer := ByteString new: 10.
	waitDuration := 1 second.
	[
		clientSocket sendData: 'PING'.
		[ clientSocket sloppyWaitForDataIfClosed: [ shouldRun := false ] ]
			valueWithin: waitDuration
			onTimeout: [ self halt ].
		buffer atAllPut: (Character value: 0).
		bytesReceived := clientSocket receiveDataInto: buffer.
		self assert: bytesReceived = 4.
		self assert: (buffer first: 4) = 'PONG' ] repeat ] newProcess.
clientProcess priority: 39; resume.
serverProcess priority: 39; resume.
"Evaluate these after debugging: clientSocket destroy. serverSocket destroy."
Fabulous, thank you! Replace the self halts with e.g. self assert: (clientSocket dataAvailable = (clientSocket readSemaphore excessSignals > 0)), and we even have a test. I have work to do tomorrow, but I hope to be able to debug this soon. I'll add kqueue and epoll support when I fix it.
(Stephan, interesting suggestion to throw hardware at the problem, thank you).
On Wed, 23 Mar 2016, Eliot Miranda wrote:
Hi Levente,
On Wed, Mar 23, 2016 at 11:31 AM, Levente Uzonyi <leves@caesar.elte.hu> wrote:
Hi Eliot,
What sort of reproducibility are you looking for? Is it enough if it happens once every few hours or do you need something that you can trigger on demand?
I'll take every few hours, but I'd prefer "in under 30 minutes". Getting warm and fuzzy feelings when trying to prove a negative with something that takes hours to run is very difficult. Let's say you have a case which reproduces in 8 hours 50% of the time. To reach 99% confidence level in a fix I'd have to run it for 8 * (100 log: 2) hours without seeing it reproduce, right? That's more than 2 days; it could take weeks to fix :-(
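As a rough check on this kind of estimate, here is the arithmetic written out. It assumes each 8-hour run is an independent trial that reproduces the bug with probability p = 0.5 when the bug is still present; that independence is an assumption for illustration, not something established in this thread.

```latex
\[
P(\text{no reproduction in } n \text{ runs} \mid \text{bug still present})
  = (1 - p)^n = 0.5^n \le 0.01
  \;\Longrightarrow\;
  n \ge \log_2 100 \approx 6.64
\]
\[
T \approx 8\,\mathrm{h} \times 6.64 \approx 53\,\mathrm{h}
\]
```

Rounding up to whole runs gives 7 runs, or 56 hours, of clean running per attempted fix.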
Levente
_,,,^..^,,,_
best, Eliot