http://forum.world.st/Socket-s-readSemaphore-is-losing-signals-with-Cog-on-L...
On 8/13/2011 13:42, Levente Uzonyi wrote:
Socket's readSemaphore is losing signals with CogVMs on Linux. We found several cases (RFB, PostgreSQL) in which processes get stuck in the following method:
Socket >> waitForDataIfClosed: closedBlock
	"Wait indefinitely for data to arrive. This method will block until data is available or the socket is closed."
	[ (self primSocketReceiveDataAvailable: socketHandle) ifTrue: [^self].
	self isConnected ifFalse: [^closedBlock value].
	self readSemaphore wait ] repeat
When we inspect the contexts, the process is waiting on the readSemaphore, yet evaluating (self primSocketReceiveDataAvailable: socketHandle) yields true. Signaling the readSemaphore makes the process run again. As a workaround we replaced #wait with #waitTimeoutMSecs:, and all our problems disappeared.
The interpreter VM doesn't seem to have this bug, so I guess the bug was introduced with the changes to aio.c.
Oh, interesting. We know this problem fairly well and have always worked around it by changing the wait in the above to a "waitTimeoutMSecs: 500", which turns it into a soft busy loop. It would be interesting to see if there's a bug in Cog which causes this. FWIW, here is the relevant portion:

	"Soft 500ms busy loop - to protect against AIO probs; occasionally, VM-level AIO fails to trip the semaphore"
	self readSemaphore waitTimeoutMSecs: 500.

Cheers, - Andreas
_,,,^..^,,,_ best, Eliot
Hi Eliot,
What sort of reproducibility are you looking for? Is it enough if it happens once every few hours or do you need something that you can trigger on demand?
Levente
Hi Levente,
On Wed, Mar 23, 2016 at 11:31 AM, Levente Uzonyi leves@caesar.elte.hu wrote:
Hi Eliot,
What sort of reproducibility are you looking for? Is it enough if it happens once every few hours or do you need something that you can trigger on demand?
I'll take every few hours, but I'd prefer "in under 30 minutes". Getting warm and fuzzy feelings when trying to prove a negative with something that takes hours to run is very difficult. Let's say you have a case which reproduces in 8 hours 50% of the time. To reach 99% confidence level in a fix I'd have to run it for 8 * (50 log: 2) hours without seeing it reproduce, right? That's nearly 2 days; it could take weeks to fix :-(
_,,,^..^,,,_ best, Eliot
Hi Eliot,
The snippet below, evaluated from a workspace, triggered the issue in less than a minute for me, three times in a row. Both processes will halt if #sloppyWaitForDataIfClosed: doesn't return within a second. If you send #dataAvailable to the socket, you'll find that it has data ready to be read, but its readSemaphore has no signal.
Levente
Socket compile: 'sloppyWaitForDataIfClosed: closedBlock
	[(socketHandle ~~ nil and: [self primSocketReceiveDataAvailable: socketHandle]) ifTrue: [^self].
	self isConnected ifFalse: [^closedBlock value].
	self readSemaphore wait] repeat' classified: 'waiting'.

[ listenerSocket := Socket newTCP.
listenerSocket listenOn: 0 backlogSize: 4 interface: #[127 0 0 1].
clientSocket := Socket newTCP.
clientSocket connectTo: #[127 0 0 1] port: listenerSocket localPort.
clientSocket waitForConnectionFor: 1.
self assert: clientSocket isConnected.
serverSocket := listenerSocket waitForAcceptFor: 1.
self assert: serverSocket isConnected ] ensure: [ listenerSocket destroy ].

serverProcess := [ | shouldRun buffer bytesReceived waitDuration |
	shouldRun := true.
	buffer := ByteString new: 10.
	waitDuration := 1 second.
	[ [ serverSocket sloppyWaitForDataIfClosed: [ shouldRun := false ] ]
		valueWithin: waitDuration onTimeout: [ self halt ].
	buffer atAllPut: (Character value: 0).
	bytesReceived := serverSocket receiveDataInto: buffer.
	self assert: bytesReceived = 4.
	self assert: (buffer first: 4) = 'PING'.
	serverSocket sendData: 'PONG' ] repeat ] newProcess.
clientProcess := [ | shouldRun buffer bytesReceived waitDuration |
	shouldRun := true.
	buffer := ByteString new: 10.
	waitDuration := 1 second.
	[ clientSocket sendData: 'PING'.
	[ clientSocket sloppyWaitForDataIfClosed: [ shouldRun := false ] ]
		valueWithin: waitDuration onTimeout: [ self halt ].
	buffer atAllPut: (Character value: 0).
	bytesReceived := clientSocket receiveDataInto: buffer.
	self assert: bytesReceived = 4.
	self assert: (buffer first: 4) = 'PONG' ] repeat ] newProcess.
clientProcess priority: 39; resume.
serverProcess priority: 39; resume.
"Evaluate these after debugging: clientSocket destroy. serverSocket destroy."
On 24 Mar 2016, at 10:16, Levente Uzonyi leves@caesar.elte.hu wrote:
Hi Eliot,
The snippet below, evaluated from a workspace, triggered the issue in less than a minute for me, three times in a row. Both processes will halt if #sloppyWaitForDataIfClosed: doesn't return within a second. If you send #dataAvailable to the socket, you'll find that it has data ready to be read, but its readSemaphore has no signal.
does it happen if you have strace -o output.txt -f on the VM as well?
kind regards holger
Hi Holger,
The snippet doesn't seem to trigger the issue while strace is attached to the process, but it usually runs into the issue shortly after I stop strace.
Levente
Hi Levente,
Fabulous, thank you! Replace the self halts with e.g. self assert: (clientSocket dataAvailable = (clientSocket readSemaphore excessSignals > 0)), and we even have a test. I have work to do tomorrow, but I hope to be able to debug this soon. I'll add kqueue and epoll support when I fix it.
(Stephan, interesting suggestion to throw hardware at the problem, thank you).
On 25 Mar 2016, at 00:35, Eliot Miranda eliot.miranda@gmail.com wrote:
Dear Eliot,
Fabulous, thank you! replace the self halts with e.g. self assert: (clientSocket dataAvailable = (clientSocket readSemaphore excessSignals > 0)), and we even have a test. I have work to do tomorrow but hope I should be able to debug this soon. I'll add kqueue and epoll support when I fix it.
before going to kqueue, could you have a look at my hack to create a completely idle VM? Once the VM has moved to kqueue, it seems one cannot get a SIGIO for the "kqueue descriptor"[1] itself, and I don't know whether SIGIO keeps working for the individual file descriptors.
holger
[1] https://github.com/freebsd/freebsd/blob/master/sys/kern/kern_event.c#L1735
Hi Holger,
let's collaborate on this. You and I, and lots of others, want this done. But in fixing the bug below, my priorities are to a) deliver a fix promptly, and b) write the code so that kqueue, epoll, or select is used based on the HAVE_KQUEUE and HAVE_EPOLL defines.
So if I deliver a fix don't think that I'm against having a proper event-driven or idling VM, I just don't have it in my head yet and want the bug fixed and kqueue/epoll available asap.
Hi Levente,
the first thing to report is that this isn't a Cog-specific bug. It also reproduces using the trunk VM sources (4.15.3) and a freshly built Linux VM with a 4.6 image.
Hi Eliot,
AFAIK the workaround came from the Qwaq project, and it had been there long before it was added to the Trunk.
Levente
Hi Levente,
Thanks very much. In Cog, the bug was that the sqAtomicOps.h implementation of atomic increment had been changed to use an intrinsic that operates on 32-bit or 64-bit quantities, but not on the 16-bit quantities that sqExternalSemaphores.c uses to coordinate the multiple-writers/single-reader signalSemaphoreWithIndex scheme. There may still be occasions in which signals are lost, but at least Levente's snippet runs for several minutes without a problem. Dave, it would be great to know why it fails in the Interpreter VM on Unix; the platform subsystem is quite different, so it's odd that we see a similar symptom.
On Mon, Mar 28, 2016 at 03:46:22PM -0700, Eliot Miranda wrote:
Hi Eliot,
I set up Levente's test with an interpreter VM, and I'll see if I can get any further insight. On my first test it ran for maybe 5 or 10 minutes before encountering a halt. A second test ran for about 10 minutes and now seems to have locked up my image, though the VM is running normally according to /proc/<pid>/status. I'm not sure what to make of that, but I'll try it a few more times and see if there is any repeatable pattern. The image I am using for this is a 4.6 image updated to more or less current trunk level, so I'm not sure how stable it is.
Dave
-- _,,,^..^,,,_ best, Eliot
vm-dev@lists.squeakfoundation.org