I guess this belongs more to the box-admins list, so I moved it here. I killed 45 runaway processes, and found that Jenkins is down.
On Thu, 16 Oct 2014, Frank Shearar wrote:
The VM processes are apt to leak for reasons unknown: maybe Jenkins can't track the shell games I have to try to play with Squeak. Suggestions for improvement actively sought.
I think it would be worth restarting Jenkins, but disabling all but one of these VM-related processes until this is fixed. It would help if you could give us some pointers about what is wrong, and where to start looking around.
Levente
From: David T. Lewis
Sent: 15/10/2014 01:52
To: The general-purpose Squeak developers list
Subject: Re: [squeak-dev] http://build.squeak.org/ down?
On Tue, Oct 14, 2014 at 08:37:56PM -0400, Chris Cunnington wrote:
On Oct 14, 2014, at 8:14 PM, Dale Henrichs dale.henrichs@gemtalksystems.com wrote:
Getting a 503 at the moment ...
Yes. I noticed this morning that Chris Muller has started a new SS process on box4. And the usual address [1] is routing to the default when there is no service on box2. I conclude that he is moving SS to box4 and is in between boxes. I would imagine that the process on box4 needs to be announced by routing the DNS on box2 to its new location [2].
No, this has nothing to do with anything on box4. The build.squeak.org service is on box3 along with squeaksource.com. Squeaksource is fine, but Jenkins is not. There are currently about 30 Squeak VMs of various flavors running under the Jenkins uid. Ick. I'll try to restart the Jenkins service and see if I can clear out the mess.
Dave
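For anyone who wants to check for stray VMs before restarting the service, here is a minimal sketch of how the processes running under the Jenkins uid could be listed. It is only an illustration: the user name "jenkins", the /squeak/ pattern, and the helper names parse_ps and leaked_vms are all assumptions, not anything taken from the box3 setup.

```ruby
# Hypothetical helpers (names and defaults are assumptions, not from the thread).
PsEntry = Struct.new(:pid, :user, :command)

# Parse text in the shape produced by `ps -eo pid=,user=,args=`
# (one process per line: pid, owner, full command line).
def parse_ps(text)
  text.each_line.map do |line|
    pid, user, command = line.strip.split(/\s+/, 3)
    next if pid.nil? || user.nil? # skip blank or malformed lines
    PsEntry.new(pid.to_i, user, command.to_s)
  end.compact
end

# Keep only entries owned by `user` whose command line matches `pattern`.
# Both defaults are guesses about how the VMs show up in the process table.
def leaked_vms(entries, user: "jenkins", pattern: /squeak/i)
  entries.select { |e| e.user == user && e.command =~ pattern }
end
```

In practice one would feed parse_ps the real output of the ps command named in the comment, then review the resulting pids by hand before killing anything.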
On Thu, Oct 16, 2014 at 11:18:22AM +0200, Levente Uzonyi wrote:
I think it would be worth restarting Jenkins, but disabling all but one of these VM-related processes until this is fixed. It would help if you could give us some pointers about what is wrong, and where to start looking around.
+1
I restarted Jenkins so the build.squeak.org site will be active, but we really need to clean this up.
Frank, do you agree with Levente's suggestion to disable projects until the issue is resolved?
Dave
On 17 October 2014 03:35, David T. Lewis lewis@mail.msen.com wrote:
Frank, do you agree with Levente's suggestion to disable projects until the issue is resolved?
I'm not in a position to do much about things until the week after next: personal emergency. Do whatever you think is appropriate to mitigate the problem.
It will probably boil down to an exception causing a debugger, and since Squeak's command line support is... not adequate, it's impossible to debug without someone manually running the same process on a local machine, hacking the build scripts to run headfully. This has been my experience, at least, whenever we have had this problem.
If someone's sufficiently motivated, they should be able to follow the instructions at https://github.com/squeak-smalltalk/squeak-ci/ (and hit me if they're not) and run something like "rake clean build test_package[Control]" to reproduce the issue. You'll need to edit lib/squeak-ci/build.rb replacing the vm_args method with this:
def vm_args(os_name)
  [] # Force headful
end
Only bother if you're comfortable with installing a Ruby. Diagnosing and solving the problem will require more knowledge about process spawning and subshells than I possess. run_image_with_cmd is almost certainly the source of the problem: https://github.com/squeak-smalltalk/squeak-ci/blob/master/lib/squeak-ci/buil...
frank
Dave
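Since the suspicion above is about process spawning and subshells, here is a minimal sketch of one common way to keep spawned children from leaking: start the command in its own process group and signal the whole group if it overruns a deadline. This is not the actual run_image_with_cmd from squeak-ci; the helper name run_with_group_timeout and its return shape are hypothetical, and it assumes a POSIX system.

```ruby
# Hypothetical helper, NOT the squeak-ci implementation.
# Runs cmd (an array of program and arguments) in a new process group and
# terminates the whole group if it has not finished within timeout_seconds.
def run_with_group_timeout(cmd, timeout_seconds)
  # pgroup: true makes the child the leader of a fresh process group,
  # so any subshells or VMs it starts share that group id.
  pid = Process.spawn(*cmd, pgroup: true)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout_seconds

  loop do
    # With WNOHANG, wait2 returns nil while the child is still running.
    _, status = Process.wait2(pid, Process::WNOHANG)
    return { timed_out: false, status: status } if status
    break if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
    sleep 0.1
  end

  begin
    # A signal name with a leading minus is delivered to the process group.
    Process.kill("-TERM", pid)
  rescue Errno::ESRCH
    # The group is already gone; nothing left to signal.
  end
  _, status = Process.wait2(pid) # reap the child so it does not linger as a zombie
  { timed_out: true, status: status }
end
```

A call such as run_with_group_timeout(["sleep", "30"], 0.5) should come back with timed_out set to true. A process that ignores TERM would make the final wait block, so a real script would probably escalate to KILL after a grace period.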
On Fri, Oct 17, 2014 at 04:03:43PM +0100, Frank Shearar wrote:
I'm not in a position to do much about things until the week after next: personal emergency. Do whatever you think is appropriate to mitigate the problem.
All the best to you Frank, I will try to keep a closer eye on it while you are away.
For now I have just disabled the ExternalPackages project. I had to kill a bunch of disconnected Squeak VMs and restart Jenkins, but I noticed that all of those VMs appeared to be associated with the ExternalPackages workspace. I'll keep all the others enabled for now, and see if the problem comes back.
Dave
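The observation that the stray VMs all belonged to one workspace can be checked mechanically. The sketch below tallies command lines by workspace name. It assumes the command lines contain a path of the form ".../workspace/<ProjectName>/..."; that layout, the regular expression, and the helper name count_by_workspace are illustrative assumptions and may not match how the VMs on box3 are actually launched.

```ruby
# Assumption: a Jenkins workspace shows up in the command line as
# ".../workspace/<ProjectName>/...". Adjust the pattern to the real layout.
WORKSPACE_RE = %r{/workspace/([^/\s]+)}

# Hypothetical helper: count command lines per workspace name.
# Lines without a recognizable workspace path are counted under "(unknown)".
def count_by_workspace(command_lines)
  counts = Hash.new(0)
  command_lines.each do |cmd|
    match = WORKSPACE_RE.match(cmd)
    counts[match ? match[1] : "(unknown)"] += 1
  end
  counts
end
```

If a single project dominates the tally, that project is the obvious first candidate to disable.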
On 18 October 2014 21:41, David T. Lewis lewis@mail.msen.com wrote:
All the best to you Frank, I will try to keep a closer eye on it while you are away.
Thanks very much, David.
For now I have just disabled the ExternalPackages project. I had to kill a bunch of disconnected Squeak VMs and restart Jenkins, but I noticed that all of those VMs appeared to be associated with the ExternalPackages workspace. I'll keep all the others enabled for now, and see if the problem comes back.
Oh, that's interesting! Only ExternalPackages, and not any of the ExternalPackage-Foo builds like ExternalPackage-Xtream, ExternalPackage-Control?
ExternalPackages tests way too much, and takes way too long, and I'd planned on eventually replacing it completely with the ExternalPackage-Foo builds. (Not sure yet how to do that across Squeak versions...)
frank
Dave