Hi folks:
I've been working with David Lewis, trying to figure out why OSProcess has been leaving behind <defunct> zombie processes. It turns out the problem isn't with OSProcess at all...
In my image, the SystemDictionary StartUpList has UnixOSProcessAccessor later in the ordered collection, after SecurityManager. When the image is restarted, only the classes in the StartUpList up to and including SecurityManager receive the startUp message. After that, SystemDictionary>>send:toClassesNamedIn:with: appears to terminate prematurely, leaving several classes not restarted (including UnixOSProcessAccessor, the source of the OSProcess zombies). Actually, I'm not sure it terminates at all; it seems to get stuck at SecurityManager.
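One way to confirm the ordering described above is to look at the list from a workspace. This is only a sketch: it assumes StartUpList is a class variable of SystemDictionary that holds class names as symbols (as the selector send:toClassesNamedIn:with: suggests), and that it can be reached through classPool. The temporary name startUpList is made up for the example.

```smalltalk
"Workspace sketch. Assumption: StartUpList is a class variable of
 SystemDictionary holding class names as symbols."
| startUpList |
startUpList := SystemDictionary classPool at: #StartUpList.
"indexOf: answers 0 if the name is not in the list at all."
Transcript
	show: 'SecurityManager at ', (startUpList indexOf: #SecurityManager) printString; cr;
	show: 'UnixOSProcessAccessor at ', (startUpList indexOf: #UnixOSProcessAccessor) printString; cr.
```

If the second index is larger than the first, UnixOSProcessAccessor is started after SecurityManager, which matches the symptom described here.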
It looks as though SecurityManager class>>startUp never returns, but only during image startup. Once the image is up and running, evaluating 'SecurityManager startUp' works just fine. To see this, put Transcript show: scaffolding around 'self default startUp' in SecurityManager class>>startUp, open a Transcript, save the image and quit, then reload and watch what appears in the Transcript. The first Transcript show: appears, but the second does not. However, once the image is up, evaluating 'SecurityManager startUp' shows both Transcript show: messages. Perhaps some sort of deadlock is happening during the restart of the image?
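The scaffolding described above might look roughly like this. It is a sketch of the instrumented class-side method, assuming the original body is just 'self default startUp'; the two message strings are made up for illustration.

```smalltalk
startUp
	"Class-side method of SecurityManager with temporary tracing added.
	 Assumption: the unmodified method only sends 'self default startUp'."
	Transcript show: 'SecurityManager startUp: before'; cr.
	self default startUp.
	Transcript show: 'SecurityManager startUp: after'; cr
```

With a Transcript window open when the image is saved, only the 'before' line would be expected after a restart if the call hangs, while evaluating 'SecurityManager startUp' by hand afterwards should print both lines.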
I'm a bit stumped at this point...any ideas? This is on the UNIX VM, with the latest changes in the 3.1 image.
To follow up a bit on this: I've tracked things down a bit further. The method that is hanging is SecurityManager>>loadSecurityKeys.
It is hanging at this point:
    . .
    file _ [fd readOnlyFileNamed: keysFileName] on: FileDoesNotExistException do: [:ex | nil].
    file ifNil: [^self]. "no keys file"
    . .
For some reason, during image startup, the return from this method seems to deadlock, so the rest of SecurityManager>>startUp never gets run.
Again, if you run the startUp manually after the image has started, it works just fine...
Hi Kevin,
Is there a signal handler being installed for SIGCHLD in the VM? The default action (SIG_DFL) is to ignore the signal. I thought the child PID goes zombie as a result, but I couldn't find evidence in the man pages.
Rob
On Wed, 27 Jun 2001, Kevin Fisher wrote: > Hi folks:
Just ignore my previous post, please. Imagine a computer that wasn't completely booting and here I am asking if it's been plugged in. "Is it that thingy causing the problem?" :-)
On Wed, 27 Jun 2001, Rob Withers wrote:
squeak-dev@lists.squeakfoundation.org