[squeak-dev] Re: [Vm-dev] I would be extremely grateful for a reproducible case for the following Socket issue
David T. Lewis
lewis at mail.msen.com
Wed Mar 30 00:09:37 UTC 2016
On Mon, Mar 28, 2016 at 03:46:22PM -0700, Eliot Miranda wrote:
> Hi Levente,
>
> On Thu, Mar 24, 2016 at 2:16 AM, Levente Uzonyi <leves at caesar.elte.hu>
> wrote:
>
> > Hi Eliot,
> >
> > The snippet below, evaluated from a workspace, triggered the issue in less
> > than a minute for me, three times in a row.
> > Both processes will halt if #sloppyWaitForDataIfClosed: doesn't return
> > within a second. If you send #dataAvailable to the socket, you'll find that
> > it has data ready to be read, but its readSemaphore has no signal.
> >
> > Levente
> >
>
> Thanks very much. In Cog the bug was that the sqAtomicOps.h implementation
> for atomic increment had been changed to use an intrinsic that operated on
> 32-bit or 64-bit quantities, but not on the 16-bit quantities that
> sqExternalSemaphores.c used to coordinate the multiple writers/single
> reader signalSemaphoreWithIndex scheme. There may still be occasions in
> which signals are lost, but at least the below runs for several minutes
> without a problem. Dave, it would be great to know why it fails in the
> Interpreter VM on unix. The platform subsystem is quite different, so it's
> odd that we see a similar symptom.
>
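Eliot's diagnosis above depends on a counter-based handshake: many threads post signal requests, and a single reader (the VM) consumes them. The C sketch below is a minimal illustration under stated assumptions. It is not the actual sqExternalSemaphores.c or sqAtomicOps.h code. The struct `SignalSlot`, its fields `requests`/`responses`, and the functions `request_signal`, `drain_signals` and `run_demo` are all hypothetical. It uses C11 `<stdatomic.h>` so that each increment is performed at the 16-bit width of the field itself. According to Eliot, the changed intrinsic did not support that width.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

/* HYPOTHETICAL per-semaphore slot illustrating a "many writers, one
   reader" scheme; not the real VM data structure. Any thread may bump
   `requests`; only the single reader touches `responses`. */
typedef struct {
    _Atomic uint16_t requests;
    uint16_t responses;
} SignalSlot;

enum { WRITERS = 4, SIGNALS_PER_WRITER = 10000 };

static SignalSlot slot;

/* Writer side: the increment is atomic at the field's own 16-bit width,
   so concurrent writers cannot lose each other's requests. */
static void request_signal(SignalSlot *s) {
    atomic_fetch_add_explicit(&s->requests, 1, memory_order_release);
}

/* Reader side: returns the number of pending signals and marks them as
   consumed. Unsigned 16-bit subtraction tolerates counter wrap-around as
   long as fewer than 65536 requests arrive between two drains. */
static unsigned drain_signals(SignalSlot *s) {
    uint16_t req = atomic_load_explicit(&s->requests, memory_order_acquire);
    uint16_t pending = (uint16_t)(req - s->responses);
    s->responses = req;
    return pending;
}

static void *writer(void *arg) {
    (void)arg;
    for (int i = 0; i < SIGNALS_PER_WRITER; i++)
        request_signal(&slot);
    return NULL;
}

/* Runs the writers concurrently with a draining reader and returns the
   total number of signals the reader observed. */
static unsigned run_demo(void) {
    pthread_t threads[WRITERS];
    unsigned total = 0;
    for (int i = 0; i < WRITERS; i++) {
        int rc = pthread_create(&threads[i], NULL, writer, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < 1000; i++)   /* drain while writers are still running */
        total += drain_signals(&slot);
    for (int i = 0; i < WRITERS; i++)
        pthread_join(threads[i], NULL);
    total += drain_signals(&slot);   /* pick up whatever is left over */
    return total;
}
```

A call to `run_demo()` should account for every request (4 writers x 10000 = 40000). No request is lost, even though the reader drains while the writers are still incrementing. Older toolchains may need `-pthread` when linking.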
Hi Eliot,
I set up Levente's test with an interpreter VM, and I'll see if I can
get any further insight. On my first test, it ran for maybe 5 or 10
minutes before encountering a halt. A second test ran for about 10
minutes or so, and now seems to have locked up my image, but with the
VM running normally according to /proc/<pid>/status. I'm not sure what
to make of that, but I'll try it a few more times and see if there is
any repeatable pattern. The image that I am using for this is a 4.6
image updated to more or less current trunk level, so I'm not sure how
stable it is.
Dave
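Dave's image appeared locked up while /proc/<pid>/status still showed a normally running VM. That combination is consistent with a lost semaphore signal, because the VM itself schedules Squeak Processes. A Smalltalk Process blocked forever on a readSemaphore is therefore invisible to the operating system. Below is a minimal Linux-only C sketch for scripting the same OS-level check. The function name `process_state` is hypothetical, and the function only reports the one-letter scheduler state from the `State:` line.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* HYPOTHETICAL helper: returns the one-letter scheduler state
   ('R', 'S', 'D', 'T', 'Z', ...) from the "State:" line of
   /proc/<pid>/status, or '?' if it cannot be read. Linux-specific. */
static char process_state(pid_t pid) {
    char path[64];
    char line[256];
    char state = '?';
    snprintf(path, sizeof path, "/proc/%ld/status", (long)pid);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return '?';
    while (fgets(line, sizeof line, f) != NULL) {
        if (strncmp(line, "State:", 6) == 0) {
            /* The line looks like "State:\tS (sleeping)". */
            if (sscanf(line + 6, " %c", &state) != 1)
                state = '?';
            break;
        }
    }
    fclose(f);
    return state;
}
```

For example, `process_state(getpid())` reports `'R'` for the calling process, because that process is running while it reads its own status file. A VM whose image is deadlocked on a semaphore can still report `'R'` or `'S'` here.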
> > Socket compile: 'sloppyWaitForDataIfClosed: closedBlock
> >
> > [(socketHandle ~~ nil
> > and: [self primSocketReceiveDataAvailable: socketHandle]) ifTrue:
> > [^self].
> > self isConnected ifFalse:
> > [^closedBlock value].
> > self readSemaphore wait] repeat'
> > classified: 'waiting'.
> >
> > [
> > listenerSocket := Socket newTCP.
> > listenerSocket listenOn: 0 backlogSize: 4 interface: #[127 0 0 1].
> > clientSocket := Socket newTCP.
> > clientSocket connectTo: #[127 0 0 1] port: listenerSocket localPort.
> > clientSocket waitForConnectionFor: 1.
> > self assert: clientSocket isConnected.
> > serverSocket := listenerSocket waitForAcceptFor: 1.
> > self assert: serverSocket isConnected ]
> > ensure: [ listenerSocket destroy ].
> >
> > serverProcess := [
> > | shouldRun buffer bytesReceived waitDuration |
> > shouldRun := true.
> > buffer := ByteString new: 10.
> > waitDuration := 1 second.
> > [
> > [ serverSocket sloppyWaitForDataIfClosed: [ shouldRun := false ] ]
> > valueWithin: waitDuration
> > onTimeout: [ self halt ].
> > buffer atAllPut: (Character value: 0).
> > bytesReceived := serverSocket receiveDataInto: buffer.
> > self assert: bytesReceived = 4.
> > self assert: (buffer first: 4) = 'PING'.
> > serverSocket sendData: 'PONG' ] repeat ] newProcess.
> > clientProcess := [
> > | shouldRun buffer bytesReceived waitDuration |
> > shouldRun := true.
> > buffer := ByteString new: 10.
> > waitDuration := 1 second.
> > [
> > clientSocket sendData: 'PING'.
> > [ clientSocket sloppyWaitForDataIfClosed: [ shouldRun := false ] ]
> > valueWithin: waitDuration
> > onTimeout: [ self halt ].
> > buffer atAllPut: (Character value: 0).
> > bytesReceived := clientSocket receiveDataInto: buffer.
> > self assert: bytesReceived = 4.
> > self assert: (buffer first: 4) = 'PONG' ] repeat ] newProcess.
> > clientProcess priority: 39; resume.
> > serverProcess priority: 39; resume.
> >
> > "Evaluate these after debugging:
> > clientSocket destroy.
> > serverSocket destroy."
> >
> >
> >
> > On Wed, 23 Mar 2016, Eliot Miranda wrote:
> >
> > Hi Levente,
> >> On Wed, Mar 23, 2016 at 11:31 AM, Levente Uzonyi <leves at caesar.elte.hu>
> >> wrote:
> >> Hi Eliot,
> >>
> >> What sort of reproducibility are you looking for? Is it enough if
> >> it happens once every few hours or do you need something that you can
> >> trigger on demand?
> >>
> >>
> >> I'll take every few hours, but I'd prefer "in under 30 minutes". Getting
> >> warm and fuzzy feelings when trying to prove a negative with something that
> >> takes hours to run is very difficult. Let's say you have
> >> a case which reproduces in 8 hours 50% of the time. To reach a 99%
> >> confidence level in a fix I'd have to run it for 8 * (100 log: 2) hours
> >> without seeing it reproduce, right? That's more than 2 days; it could
> >> take weeks to fix :-(
> >>
> >> Levente
> >>
> >>
> >>
> >> _,,,^..^,,,_
> >> best, Eliot
> >>
> >>
> >
> >
> >
>
>
> --
> _,,,^..^,,,_
> best, Eliot
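The confidence estimate in the oldest quoted message can be worked out explicitly. Assume each 8-hour run is an independent trial that reproduces the bug with probability 1/2 while the bug is still present. The chance of n consecutive clean runs with the bug unfixed is then (1/2)^n, and for 99% confidence this must fall to 1% or less:

```latex
\left(\tfrac{1}{2}\right)^{n} \le 0.01
\;\Longrightarrow\;
n \ge \log_2 100 \approx 6.64,
\qquad
8\,\mathrm{h} \times 6.64 \approx 53\,\mathrm{h} \approx 2.2\ \text{days}
```

In Smalltalk terms this is `8 * (100 log: 2)`. If only whole runs count, it rounds up to 7 runs, or 56 hours.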