[Pharo-fuel] [Vm-dev] Fwd: Possible collections problem

Sun Jun 2 21:03:48 UTC 2013

On Sun, Jun 02, 2013 at 12:16:16PM +0200, Max Leske wrote:
>  
> Dave, you're a genius :)
> 
> When I started playing around with detaching / reattaching XDisplay and serializing without image fork I first found that I could reproduce the problem without a fork. The trick was to serialize in an additional fork after decapitating:
> 
> [ OSProcess thisOSProcess decapitate.
>   NSSnapshotSerializer serialize: CTMSession rootModel.
>   OSProcess thisOSProcess displayOnXServer: ':88' ] forkAt: 30
> 
> From there it seemed a good idea to see if other processes were interfering, and of course the UI-related processes came to mind:
> 
> [ OSProcess thisOSProcess decapitate.
>   Project uiProcess terminate.
>   InputEventFetcher shutDown.
>   NSSnapshotSerializer serialize: CTMSession rootModel.
>   
>   InputEventFetcher startUp.
>   Project spawnNewProcess.
>   OSProcess thisOSProcess displayOnXServer: ':88' ] forkAt: 30
> 
> And hallelujah! It worked!
> Then I tried the same thing but in the forked image (with 10 repeats):
> 
> 10 timesRepeat: [ [ 
> 	OSProcess thisOSProcess forkHeadlessSqueakAndDo: [ 
> 		Project uiProcess terminate.
> 		InputEventFetcher shutDown.
> 		NSSnapshotSerializer serialize: CTMSession rootModel ] ] forkAt: 30 ]
> 
> 
> And voilà, every serialization terminated.
> 
> Thanks again for your help. You put me on the right track.
> 
> 
> I verified that serialization works by simply terminating the event fetching process. This leads me to believe that the event fetching process can become locked (there's a semaphore for waiting on events). What I can't explain, though, is why the behavior was so erratic (serialization did work quite often). I can only guess that it might depend on the state of the semaphore when the image is being forked. Do you have an idea? BTW, I also found that I could bring the image back to life by forking a second process that suspended the serialization process after a couple of seconds and reattached the display. That second process ran at priority 40, though (forking the serialization process itself at a different priority didn't help).
> 

I don't think that there is anything about a Smalltalk Semaphore that
would be affected by the #forkSqueak. The state of the semaphore would be
identical in the forked image and in its parent image, and all state variables
in the VM would also be the same.

However, it does seem quite possible that running an image (forked or not)
with its X11 display shut off might lead to problems after a while. It is
not something that I have tested under heavy load, so it is quite possible
that you have uncovered some issue related to this.

If your stuck images are having a problem related to input event handling,
then the event handling process would presumably be the one that is running
and causing the 100% CPU load. You may be able to confirm this with the
SIGUSR1 trick (below) to dump the Smalltalk stacks to the console output.

I'm not too familiar with the InputEventFetcher in Pharo, but if the
inputSemaphore was being signaled by the VM to indicate events available,
and if for some reason the primitive to retrieve the event was failing,
then it looks to me like it might loop in that condition with 100% CPU.
That's a complete wild guess, but something like this may be happening.
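
To illustrate that guess, here is a minimal sketch in plain Smalltalk that
can be run from a workspace. It is not the actual InputEventFetcher code:
inputSemaphore, fetchEvent, spins and handled are hypothetical workspace
variables, and fetchEvent is only a stand-in for an event primitive that
always fails. The loop is bounded so that it terminates; a real fetcher
loop that kept being signaled would simply never block.

```smalltalk
"Hypothetical sketch only, not the real InputEventFetcher implementation."
| inputSemaphore fetchEvent spins handled |
inputSemaphore := Semaphore new.
"Stand-in for the event primitive: it always fails and answers nil."
fetchEvent := [ nil ].
spins := 0.
handled := 0.

"Pretend the VM has signaled 'events available' 1000 times."
1000 timesRepeat: [ inputSemaphore signal ].

"Each wait returns at once because of the excess signals, but no event
 is ever retrieved, so the loop does no useful work and never blocks."
1000 timesRepeat: [
	inputSemaphore wait.
	fetchEvent value
		ifNil: [ spins := spins + 1 ]
		ifNotNil: [ :event | handled := handled + 1 ] ].

Transcript
	show: 'iterations without an event: ', spins printString;
	cr.
```

If something like this is going on, the SIGUSR1 stack dump should show the
event handling process as the one that is running.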

> One last question: should this go into OSProcess (I use an older version *cough cough* of OSProcess and don't know what has changed)?
> 

I don't think there is anything that I would change in OSProcess at this
point. I would first want to figure out the underlying cause of the apparent
event handling problem before changing anything.

If you are using Pharo 2.0 you may want to move to the latest version of
OSProcess, as I have made a number of updates to make it load cleanly
in Pharo. Otherwise, I don't think it matters what version of OSProcess
you are using, as the #forkSqueak behaviour has not changed recently.

> Cheers from very rainy Switzerland from a very happy Max :)
> 

Cheers from Michigan, where the weather has been very nice today for
a change :)

Dave

> 
> 
> On 30.05.2013, at 15:38, vm-dev-request at lists.squeakfoundation.org wrote:
> 
> > A couple more debugging ideas:
> > 
> > Maybe you can get more clues by looking at a stack dump from the
> > stuck process. You can ask the VM to dump stack on receipt of a unix
> > signal like this (in this example I'm using SIGUSR1):
> > 
> >  OSProcess accessor setPrintAllStacksOnSigUsr1.
> >  child := OSProcess thisOSProcess forkHeadlessSqueakAndDoThenQuit: [
> >  	"do the serialization stuff" (Delay forSeconds: 10) wait
> >  	].
> > 
> >  "wait for child to get stuck" (Delay forSeconds: 5) wait.
> > 
> >  "send SIGUSR1 to forked squeak, you can do this from unix console of course"
> >  OSProcess accessor primSendSigusr1To: child pid.
> >  OSProcess accessor clearPrintAllStacksOnSigUsr1.
> >  child inspect.
> > 
> > Perhaps the issue is associated with the headless display (X11 disconnected)
> > and does not have anything to do with the forked image per se. If so, it should
> > be possible to reproduce the problem in a non-forked image by turning off the
> > X display while serialization is being done:
> > 
> >  OSProcess thisOSProcess decapitate.
> >  "Do serialization stuff" (Delay forSeconds: 5) wait.
> >  OSProcess thisOSProcess recapitate.
> > 
> > Try that about 10 times and see if it ever gets stuck.
> > 
> > Dave
>