[squeak-dev] Re: [Vm-dev] I would be extremely grateful for a reproducible case for the following Socket issue

David T. Lewis lewis at mail.msen.com
Wed Mar 30 00:09:37 UTC 2016


On Mon, Mar 28, 2016 at 03:46:22PM -0700, Eliot Miranda wrote:
> Hi Levente,
> 
> On Thu, Mar 24, 2016 at 2:16 AM, Levente Uzonyi <leves at caesar.elte.hu>
> wrote:
> 
> > Hi Eliot,
> >
> > The snippet below, evaluated from a workspace, triggered the issue in less
> > than a minute for me, three times in a row.
> > Both processes will halt if #sloppyWaitForDataIfClosed: doesn't return
> > within a second. If you send #dataAvailable to the socket, you'll find that
> > it has data ready to be read, but its readSemaphore has no signal.
> >
> > Levente
> >
> 
> Thanks very much.  In Cog the bug was that the sqAtomicOps.h implementation
> for atomic increment had been changed to use an intrinsic that operated on
> 32-bit or 64-bit quantities, but not on the 16-bit quantities that
> sqExternalSemaphores.c used to coordinate the multiple writers/single
> reader signalSemaphoreWithIndex scheme.  There may still be occasions in
> which signals are lost, but at least the below runs for several minutes
> without a problem.  Dave, it would be great to know why it fails in the
> Interpreter VM on unix.  The platform subsystem is quite different, so it's
> odd that we see a similar symptom.
> 
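For illustration, here is a minimal C sketch of the kind of multiple
writers/single reader counter pair described above. It is not the actual
sqExternalSemaphores.c or sqAtomicOps.h code: the struct and function
names (SignalCounter, note_signal, drain_signals) are hypothetical, and
C11 <stdatomic.h> stands in for whatever intrinsic the VM uses. The point
is only that the increment has to be atomic at the width of the counter
itself (16 bits here), and that unsigned 16-bit subtraction keeps the
pending count right across wraparound.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for one entry of a signal-request table:
   any thread may bump `requests`; only the single reader (the VM
   thread) touches `responses`. */
typedef struct {
    _Atomic uint16_t requests;   /* incremented by signalling threads */
    uint16_t responses;          /* read and written by the reader only */
} SignalCounter;

/* Writer side: an atomic increment performed on the 16-bit field itself,
   so concurrent writers cannot lose an update and no neighbouring bytes
   are touched. */
static void note_signal(SignalCounter *c) {
    atomic_fetch_add_explicit(&c->requests, 1, memory_order_release);
}

/* Reader side: returns the number of signals not yet delivered and marks
   them as consumed. The uint16_t subtraction is modulo 65536, so the
   result stays correct when `requests` wraps past 65535. */
static unsigned drain_signals(SignalCounter *c) {
    uint16_t seen = atomic_load_explicit(&c->requests, memory_order_acquire);
    uint16_t pending = (uint16_t)(seen - c->responses);
    c->responses = seen;
    return pending;
}
```

If the writer side instead used an increment that is only atomic for
32-bit or 64-bit operands, a 16-bit counter like this could drop or
corrupt updates under contention, which would look like a semaphore that
never gets its signal even though data is available.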


Hi Eliot,

I set up Levente's test with an interpreter VM, and I'll see if I can
get any further insight. On my first test, it ran for maybe 5 or 10
minutes before encountering a halt. A second test ran for about 10
minutes or so, and now seems to have locked up my image, but with the
VM running normally according to /proc/<pid>/status. I'm not sure what
to make of that, but I'll try it a few more times and see if there is
any repeatable pattern. The image that I am using for this is a 4.6
image updated to more or less current trunk level, so I'm not sure how
stable it is.

Dave





> > Socket compile: 'sloppyWaitForDataIfClosed: closedBlock
> >
> >         [(socketHandle ~~ nil
> >           and: [self primSocketReceiveDataAvailable: socketHandle]) ifTrue:
> >                 [^self].
> >          self isConnected ifFalse:
> >                 [^closedBlock value].
> >          self readSemaphore wait] repeat'
> > classified: 'waiting'.
> >
> > [
> >         listenerSocket := Socket newTCP.
> >         listenerSocket listenOn: 0 backlogSize: 4 interface: #[127 0 0 1].
> >         clientSocket := Socket newTCP.
> >         clientSocket connectTo: #[127 0 0 1] port: listenerSocket localPort.
> >         clientSocket waitForConnectionFor: 1.
> >         self assert: clientSocket isConnected.
> >         serverSocket := listenerSocket waitForAcceptFor: 1.
> >         self assert: serverSocket isConnected ]
> >         ensure: [ listenerSocket destroy ].
> >
> > serverProcess := [
> >         | shouldRun buffer bytesReceived waitDuration |
> >         shouldRun := true.
> >         buffer := ByteString new: 10.
> >         waitDuration := 1 second.
> >         [
> >                 [ serverSocket sloppyWaitForDataIfClosed: [ shouldRun := false ] ]
> >                         valueWithin: waitDuration
> >                         onTimeout: [ self halt ].
> >                 buffer atAllPut: (Character value: 0).
> >                 bytesReceived := serverSocket receiveDataInto: buffer.
> >                 self assert: bytesReceived = 4.
> >                 self assert: (buffer first: 4) = 'PING'.
> >                 serverSocket sendData: 'PONG' ] repeat ] newProcess.
> > clientProcess := [
> >         | shouldRun buffer bytesReceived waitDuration |
> >         shouldRun := true.
> >         buffer := ByteString new: 10.
> >         waitDuration := 1 second.
> >         [
> >                 clientSocket sendData: 'PING'.
> >                 [ clientSocket sloppyWaitForDataIfClosed: [ shouldRun := false ] ]
> >                         valueWithin: waitDuration
> >                         onTimeout: [ self halt ].
> >                 buffer atAllPut: (Character value: 0).
> >                 bytesReceived := clientSocket receiveDataInto: buffer.
> >                 self assert: bytesReceived = 4.
> >                 self assert: (buffer first: 4) = 'PONG' ] repeat ] newProcess.
> > clientProcess priority: 39; resume.
> > serverProcess priority: 39; resume.
> >
> > "Evaluate these after debugging:
> > clientSocket destroy.
> > serverSocket destroy."
> >
> >
> >
> > On Wed, 23 Mar 2016, Eliot Miranda wrote:
> >
> > Hi Levente,
> >> On Wed, Mar 23, 2016 at 11:31 AM, Levente Uzonyi <leves at caesar.elte.hu>
> >> wrote:
> >>       Hi Eliot,
> >>
> >>       What sort of reproducibility are you looking for? Is it enough if
> >> it happens once every few hours or do you need something that you can
> >> trigger on demand?
> >>
> >>
> >> I'll take every few hours, but I'd prefer "in under 30 minutes".  Getting
> >> warm and fuzzy feelings when trying to prove a negative with something that
> >> takes hours to run is very difficult.  Let's say you have
> >> a case which reproduces in 8 hours 50% of the time.  To reach 99%
> >> confidence level in a fix I'd have to run it for 8 * (50 log: 2) hours
> >> without seeing it reproduce, right?  That's nearly 2 days; it could
> >> take weeks to fix :-(
> >>
> >>       Levente
> >>
> >>
> >>
> >> _,,,^..^,,,_
> >> best, Eliot
> >>
> >>
> >
> >
> >
> 
> 
> -- 
> _,,,^..^,,,_
> best, Eliot
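
The confidence estimate in the oldest quoted message can be sketched as a
small calculation. This is one reading of the argument, not something
stated in the thread: assume each 8-hour run of a still-broken VM
reproduces the bug independently with probability p, so n clean runs in a
row happen by chance with probability (1 - p)^n. The helper below
(clean_runs_needed, a hypothetical name) finds the smallest n for which
that chance drops to 1 - confidence or less.

```c
#include <assert.h>

/* Smallest number of consecutive clean runs n such that a still-broken
   build would have produced them with probability at most
   (1 - confidence), i.e. (1 - p_repro)^n <= 1 - confidence.
   Assumes runs are independent, which is an idealisation. */
static int clean_runs_needed(double p_repro, double confidence) {
    assert(p_repro > 0.0 && p_repro < 1.0);
    assert(confidence > 0.0 && confidence < 1.0);
    double chance_all_clean = 1.0;
    int runs = 0;
    while (chance_all_clean > 1.0 - confidence) {
        chance_all_clean *= 1.0 - p_repro;
        runs++;
    }
    return runs;
}
```

Under these assumptions, clean_runs_needed(0.5, 0.99) is 7, since 0.5^6
is about 0.016 and 0.5^7 is about 0.008. At 8 hours per run that is 56
hours of clean testing, a little over two days, which is the same order
of magnitude as the figure in the message.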

