[squeak-dev] OSProcess endless loop

Marcel Taeumel marcel.taeumel at hpi.de
Mon Apr 18 10:01:59 UTC 2022


Hi Chris --

I suggest that you use the next released OSVM. I will dig into 2 more things tomorrow and then hopefully tag a new pre-release. Fingers crossed.

Best,
Marcel
Am 18.04.2022 03:37:56 schrieb Chris Muller <ma.chris.m at gmail.com>:
Those are some great tips, thank you! I need to pay closer attention
to OSProcess and respect the complexity and resources it has to manage
and not just run it roughshod. Do you think I'm fine with the 5.3
production VM, or is it certain that a newer version is needed to use
#outputOf: (i.e., Marcel mentioned a AIO fix, should I also consider
that a candidate issue?).

- Chris

On Fri, Apr 15, 2022 at 8:44 AM David T. Lewis wrote:
>
> Hi Chris,
>
> On Thu, Apr 14, 2022 at 09:18:21PM -0400, David T. Lewis wrote:
> > Hi Chris,
> >
> > This might be tricky to debug, but message received. See below.
> >
> > On Wed, Apr 13, 2022 at 09:36:57PM -0500, Chris Muller wrote:
> > > Hi Dave,
> > >
> > > I've been having occasional issues with OSProcess somehow getting
> > > hosed up and becoming unusable in my image. Tonight it started again,
> > > and I'm rather stuck in the water at the moment, unsure how to break
> > > out of it other than building a new image.
> > >
> > > Magma now uses OSProcess "outputOf: 'free -wb'" every few seconds to
> > > get ahead of any potential OutOfMemory signals. But somehow my image
> > > got into a state where the simplest uses of OSProcess lock up.
> > > Whenever it happens, I see these messages in the console:
> > >
> > > 364147968:982663168:[] in
> > > AioEventHandler>>initializeForExceptions:readEvents:writeEvents::aio
> > > event forwarding not supported
> > > 364147968:37728768:[] in
> > > AioEventHandler>>initializeForExceptions:readEvents:writeEvents::aio
> > > event forwarding not supported
> > >
> > > From my feeble debugging, it seems #primRead:into:startingAt:count: is
> > > returning a 0 count, which is what leads to OSProcess's
> > > logic-flow to never be able to break out of the while loop in
> > > BufferedAsyncFileReadStream>>#upToEndOfFile. Here's a rough stack
> > > trace of that loop:
> > >
> > > BufferedAsyncFileReadStream>>#upToEndOfFile
> > > BufferedAsyncFileReadStream>>#atEndOfFile
> > > (readBuffer atEnd = true, OSProcess accessor isAtEndOfFile:
> > > fileID returns false)
> > > BufferedAsyncFileReadStream>>#readAvailableDataFrom:into:
> > > primRead:into:startingAt:count: (---> answers 0)
> > > OSProcessAccessor>>#isAtEndOfFile: (---> answers false)
> > > (restart loop in BufferedAsyncFileReadStream>>#upToEndOfFile)
> > >
> > > It would be nice if OSProcess could detect this situation and signal
> > > some kind of error. With the endless loop, it sometimes takes a while
> > > to get to the bottom of why something isn't responsive.
> > >
> > > I'm running production 5.3 with the latest OSProcess and CommandShell.
> > > I thought it might be a resource issue on my laptop, but rebooting
> > > didn't help. Rebuilding from fresh 5.3 image always works, however,
> > > the weirdest thing is, the problem seems to clear ITSELF up. Like,
> > > OMG, right now, it's working again! I had just run a test in a fresh
> > > image to test multiple processes hitting OSProcess outputOf:. It
> > > worked fine and when I came back to my problem image, it's suddenly
> > > working again!
> > >
> > > Can you think of anything I might be doing to get into this situation
> > > and/or how to break out of it? Something to avoid or initialize?
> > >
> > > The other thing I noticed, when I would break into the locked up
> > > OSProcess with Cmd+. (dot), there were TWO processes stuck
> > > in the loop, one from my DoIt, the other originating from the line:
> > >
> > > "self changed: #childProcessStatus"
> > >
> > > of #grimReaperProcess. Sigh.. I apparently already closed those
> > > debuggers and now can't reproduce the issue to paste their bug report
> > > stack traces! Sorry.
> > >
> > > I hope I'm not the only one experiencing this issue so we can
> > > hopefully track this down. It's insidious when it happens.
> > >
> > > Thanks,
> > > Chris
> > >
> >
> > A bit off the top of my head and totally untested, but the first thing
> > that comes to mind is that the check for #isAtEndOfFile is calling
> >
> > which is probably going to see an end of file on the pipe only if the
> > external child OSProcess has actually terminated normally. Terminating
> > normally means not just that the child process (running your 'free -wb'
> > command in a bash shell) has exited, but also that the parent process
> > (the Squeak VM) has noticed it and has harvested the child exit status.
> >
> > There is a Smalltalk process ('the child OSProcess watcher' in your process
> > browser) that is responsible for harvesting the child process to get its
> > exit status. This is event driven and can sometimes be a bit fragile. So
> > the first thing that comes to mind if you have an image stuck in the
> > condition that you describe would be to give that process a kick and the
> > pants and see if things get better.
> >
> > If you can catch one of your images in a wedged condition, here are some
> > do-its that might possibly break it free. If either of these work, it
> > will give us an idea of where the underlying problem might lie.
> >
> > OSProcess thisOSProcess updateActiveChildren.
> >
> > OSProcess accessor restartChildWatcherProcess.
> >
> > My guess is that the #updateActiveChildren method is most likely to
> > un-wedge things but this is definitely just a guess.
> >
> > Dave
> >
> >
>
> A couple of other things you might check:
>
> 1) There is a registry of current and past child processes:
>
> "OSProcess thisOSProcess allMyChildren"
>
> This grows until the next image restart (ThisOSProcess class>>startUp:)
> and could be a big memory leak if you are running a lot of commands
> in a long-running Squeak image. If that is a problem, see the methods
> in ThisOSProcess category 'child process management' for ideas.
>
> 2) Check and see if a file handle leak may be happening:
>
> "(OSProcess outputOf: 'ls /proc/', OSProcess thisOSProcess pid asString, '/fd | wc -l') asInteger"
>
> OSProcess tries to keep file handles cleaned up, but it's worth checking
> in case there is a leak.
>
> HTH,
> Dave
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.squeakfoundation.org/pipermail/squeak-dev/attachments/20220418/5797c205/attachment.html>


More information about the Squeak-dev mailing list