[squeak-dev] OSProcess endless loop
David T. Lewis
lewis at mail.msen.com
Fri Apr 15 13:44:26 UTC 2022
Hi Chris,
On Thu, Apr 14, 2022 at 09:18:21PM -0400, David T. Lewis wrote:
> Hi Chris,
>
> <ACK> This might be tricky to debug, but message received. See below.
>
> On Wed, Apr 13, 2022 at 09:36:57PM -0500, Chris Muller wrote:
> > Hi Dave,
> >
> > I've been having occasional issues with OSProcess somehow getting
> > hosed up and becoming unusable in my image. Tonight it started again,
> > and I'm rather stuck in the water at the moment, unsure how to break
> > out of it other than building a new image.
> >
> > Magma now uses OSProcess "outputOf: 'free -wb'" every few seconds to
> > get ahead of any potential OutOfMemory signals. But somehow my image
> > got into a state where the simplest uses of OSProcess lock up.
> > Whenever it happens, I see these messages in the console:
> >
> > 364147968:982663168:[] in
> > AioEventHandler>>initializeForExceptions:readEvents:writeEvents::aio
> > event forwarding not supported
> > 364147968:37728768:[] in
> > AioEventHandler>>initializeForExceptions:readEvents:writeEvents::aio
> > event forwarding not supported
> >
> > From my feeble debugging, it seems #primRead:into:startingAt:count: is
> > returning a 0 count, which is what leads to OSProcess's
> > logic-flow to never be able to break out of the while loop in
> > BufferedAsyncFileReadStream>>#upToEndOfFile. Here's a rough stack
> > trace of that loop:
> >
> > BufferedAsyncFileReadStream>>#upToEndOfFile
> > BufferedAsyncFileReadStream>>#atEndOfFile
> > (readBuffer atEnd = true, OSProcess accessor isAtEndOfFile:
> > fileID returns false)
> > BufferedAsyncFileReadStream>>#readAvailableDataFrom:into:
> > primRead:into:startingAt:count: (---> answers 0)
> > OSProcessAccessor>>#isAtEndOfFile: (---> answers false)
> > (restart loop in BufferedAsyncFileReadStream>>#upToEndOfFile)
> >
> > It would be nice if OSProcess could detect this situation and signal
> > some kind of error. With the endless loop, it sometimes takes a while
> > to get to the bottom of why something isn't responsive.
> >
> > I'm running production 5.3 with the latest OSProcess and CommandShell.
> > I thought it might be a resource issue on my laptop, but rebooting
> > didn't help. Rebuilding from fresh 5.3 image always works, however,
> > the weirdest thing is, the problem seems to clear ITSELF up. Like,
> > OMG, right now, it's working again! I had just run a test in a fresh
> > image to test multiple processes hitting OSProcess outputOf:. It
> > worked fine and when I came back to my problem image, it's suddenly
> > working again!
> >
> > Can you think of anything I might be doing to get into this situation
> > and/or how to break out of it? Something to avoid or initialize?
> >
> > The other thing I noticed, when I would break into the locked up
> > OSProcess with Cmd+. (dot), there were TWO processes stuck
> > in the loop, one from my DoIt, the other originating from the line:
> >
> > "self changed: #childProcessStatus"
> >
> > of #grimReaperProcess. Sigh.. I apparently already closed those
> > debuggers and now can't reproduce the issue to paste their bug report
> > stack traces! Sorry.
> >
> > I hope I'm not the only one experiencing this issue so we can
> > hopefully track this down. It's insidious when it happens.
> >
> > Thanks,
> > Chris
> >
>
> A bit off the top of my head and totally untested, but the first thing
> that comes to mind is that the check for #isAtEndOfFile is calling
> <primitive: 'primitiveTestEndOfFileFlag' module: 'UnixOSProcessPlugin'>
> which is probably going to see an end of file on the pipe only if the
> external child OSProcess has actually terminated normally. Terminating
> normally means not just that the child process (running your 'free -wb'
> command in a bash shell) has exited, but also that the parent process
> (the Squeak VM) has noticed it and has harvested the child exit status.
>
> There is a Smalltalk process ('the child OSProcess watcher' in your process
> browser) that is responsible for harvesting the child process to get its
> exit status. This is event driven and can sometimes be a bit fragile. So
> the first thing that comes to mind if you have an image stuck in the
> condition that you describe would be to give that process a kick and the
> pants and see if things get better.
>
> If you can catch one of your images in a wedged condition, here are some
> do-its that might possibly break it free. If either of these work, it
> will give us an idea of where the underlying problem might lie.
>
> OSProcess thisOSProcess updateActiveChildren.
>
> OSProcess accessor restartChildWatcherProcess.
>
> My guess is that the #updateActiveChildren method is most likely to
> un-wedge things but this is definitely just a guess.
>
> Dave
>
>
A couple of other things you might check:
1) There is a registry of current and past child processes:
"OSProcess thisOSProcess allMyChildren"
This grows until the next image restart (ThisOSProcess class>>startUp:)
and could be a big memory leak if you are running a lot of commands
in a long-running Squeak image. If that is a problem, see the methods
in ThisOSProcess category 'child process management' for ideas.
2) Check and see if a file handle leak may be happening:
"(OSProcess outputOf: 'ls /proc/', OSProcess thisOSProcess pid asString, '/fd | wc -l') asInteger"
OSProcess tries to keep file handles cleaned up, but it's worth checking
in case there is a leak.
HTH,
Dave
More information about the Squeak-dev
mailing list
|