[squeak-dev] OSProcess endless loop

David T. Lewis lewis at mail.msen.com
Fri Apr 15 01:18:21 UTC 2022


Hi Chris,

<ACK> This might be tricky to debug, but message received. See below.

On Wed, Apr 13, 2022 at 09:36:57PM -0500, Chris Muller wrote:
> Hi Dave,
> 
> I've been having occasional issues with OSProcess somehow getting
> hosed up and becoming unusable in my image.  Tonight it started again,
> and I'm rather stuck in the water at the moment, unsure how to break
> out of it other than building a new image.
> 
> Magma now uses OSProcess "outputOf: 'free -wb'" every few seconds to
> get ahead of any potential OutOfMemory signals.  But somehow my image
> got into a state where the simplest uses of OSProcess lock up.
> Whenever it happens, I see these messages in the console:
> 
> 364147968:982663168:[] in
> AioEventHandler>>initializeForExceptions:readEvents:writeEvents::aio
> event forwarding not supported
> 364147968:37728768:[] in
> AioEventHandler>>initializeForExceptions:readEvents:writeEvents::aio
> event forwarding not supported
> 
> From my feeble debugging, it seems #primRead:into:startingAt:count: is
> returning a 0 count, which is what leads to OSProcess's
> logic-flow to never be able to break out of the while loop in
> BufferedAsyncFileReadStream>>#upToEndOfFile.  Here's a rough stack
> trace of that loop:
> 
>     BufferedAsyncFileReadStream>>#upToEndOfFile
>     BufferedAsyncFileReadStream>>#atEndOfFile
>         (readBuffer atEnd = true, OSProcess accessor isAtEndOfFile:
> fileID returns false)
>     BufferedAsyncFileReadStream>>#readAvailableDataFrom:into:
>     primRead:into:startingAt:count: (---> answers 0)
>     OSProcessAccessor>>#isAtEndOfFile: (---> answers false)
>     (restart loop in BufferedAsyncFileReadStream>>#upToEndOfFile)
> 
> It would be nice if OSProcess could detect this situation and signal
> some kind of error.  With the endless loop, it sometimes takes a while
> to get to the bottom of why something isn't responsive.
> 
> I'm running production 5.3 with the latest OSProcess and CommandShell.
> I thought it might be a resource issue on my laptop, but rebooting
> didn't help.  Rebuilding from fresh 5.3 image always works, however,
> the weirdest thing is, the problem seems to clear ITSELF up.  Like,
> OMG, right now, it's working again!  I had just run a test in a fresh
> image to test multiple processes hitting OSProcess outputOf:.  It
> worked fine and when I came back to my problem image, it's suddenly
> working again!
> 
> Can you think of anything I might be doing to get into this situation
> and/or how to break out of it?  Something to avoid or initialize?
> 
> The other thing I noticed, when I would break into the locked up
> OSProcess with Cmd+. (dot), there were TWO processes stuck
> in the loop, one from my DoIt, the other originating from the line:
> 
>    "self changed: #childProcessStatus"
> 
> of #grimReaperProcess.  Sigh..  I apparently already closed those
> debuggers and now can't reproduce the issue to paste their bug report
> stack traces!  Sorry.
> 
> I hope I'm not the only one experiencing this issue so we can
> hopefully track this down.  It's insidious when it happens.
> 
> Thanks,
>   Chris
> 

A bit off the top of my head and totally untested, but the first thing
that comes to mind is that the check for #isAtEndOfFile is calling
<primitive: 'primitiveTestEndOfFileFlag' module: 'UnixOSProcessPlugin'>
which is probably going to see an end of file on the pipe only if the
external child OSProcess has actually terminated normally. Terminating
normally means not just that the child process (running your 'free -wb'
command in a bash shell) has exited, but also that the parent process
(the Squeak VM) has noticed it and has harvested the child exit status.

There is a Smalltalk process ('the child OSProcess watcher' in your process
browser) that is responsible for harvesting the child process to get its
exit status. This is event driven and can sometimes be a bit fragile. So
the first thing that comes to mind if you have an image stuck in the
condition that you describe would be to give that process a kick in the
pants and see if things get better.

If you can catch one of your images in a wedged condition, here are some
do-its that might possibly break it free. If either of these works, it
will give us an idea of where the underlying problem might lie.

	OSProcess thisOSProcess updateActiveChildren.

	OSProcess accessor restartChildWatcherProcess.

My guess is that the #updateActiveChildren method is most likely to
un-wedge things but this is definitely just a guess.
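
As a stopgap for the "detect this and signal an error" request, here is a
rough and equally untested sketch of how a caller could bound the call
instead of spinning forever. It assumes the stock Squeak
BlockClosure>>valueWithin:onTimeout: is able to interrupt the read loop, and
the 5 second limit is an arbitrary choice. It is not part of OSProcess.

```smalltalk
"Untested sketch. The timeout value and the error text are assumptions."
| output |
output := [OSProcess outputOf: 'free -wb']
	valueWithin: 5 seconds
	onTimeout: [
		Transcript show: 'OSProcess outputOf: timed out, refreshing child status'; cr.
		OSProcess thisOSProcess updateActiveChildren.
		nil].
output ifNil: [self error: 'OSProcess appears to be wedged'].
output
```

This only guards the calling process. If the child watcher itself is stuck
in the same loop, it would presumably still need the restart do-it above.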

Dave


