On 14 January 2013 13:24, David T. Lewis lewis@mail.msen.com wrote:
Good idea to add a watchdog timer. Another good practice is to use the 'nice' command (/usr/bin/nice) in the command lines that run Squeak. This runs the tests at a lower scheduling priority, so if a process gets stuck consuming close to 100% CPU, its impact on other system users will be reduced (it will still gobble up all the CPU, it just won't drag the system down so badly).
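A sketch of what that would look like for the CI invocation (paths taken from the ps output later in this thread; the niceness value 10 is an arbitrary example, not a recommendation):

```shell
# Run the Squeak test job at reduced scheduling priority (+10 niceness)
# so a runaway process competes less aggressively for CPU.
nice -n 10 \
  /var/lib/jenkins/workspace/CogVM/tmp/lib/squeak/4.0-2636/squeak \
  -vm-sound-null -vm-display-null \
  /var/lib/jenkins/workspace/SqueakTrunkOnBleedingEdgeCog/target/TrunkImage.image \
  /var/lib/jenkins/workspace/SqueakTrunkOnBleedingEdgeCog/tests.st
```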
I don't know what the problem was in this particular case, but one thing that can result in Squeak consuming 100% CPU is an error in the image that causes excessive memory usage, such as a recursion error. Squeak keeps asking for more memory, the VM asks the OS for more, and eventually you are swapping. If this turns out to have been the problem, you can prevent the runaway memory condition with the '-memory' command line option to the VM (but don't do that unless we can confirm that it really *is* the problem, I'm just mentioning it for future reference).
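For future reference, a sketch of capping memory so an allocation spiral fails fast instead of swapping the box to death (the 512m ceiling is an arbitrary example value, and I haven't verified this against the exact VM build on box3):

```shell
# Cap the Squeak image's heap via the VM's -memory option,
# so a runaway recursion hits an out-of-memory error in-image
# rather than pushing the host into swap.
squeak -memory 512m -vm-sound-null -vm-display-null TrunkImage.image tests.st

# A shell-level alternative: limit the process's virtual memory
# (in KiB) in the subshell that launches the VM.
( ulimit -v 524288
  squeak -vm-sound-null -vm-display-null TrunkImage.image tests.st )
```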
It's a repeatable problem, at least: http://squeakci.org/job/SqueakTrunkOnBleedingEdgeCog/17/console. I haven't had a chance to add debug info though.
frank
Dave
On Mon, Jan 14, 2013 at 08:06:27AM +0000, Frank Shearar wrote:
Ah, no, that's not a debugger then.
I'm going to slap a 15 minute kill time on the jobs later today: our longest running jobs so far are around 9 minutes.
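One simple way to do the kill time, if we wrap the job in a shell step rather than relying on a Jenkins-side timeout (hypothetical sketch, paths as in the ps output below):

```shell
# coreutils timeout sends SIGTERM if the VM is still running
# after 15 minutes; exit status 124 signals "killed by timeout",
# which Jenkins will then report as a failed build.
timeout 15m \
  /var/lib/jenkins/workspace/CogVM/tmp/lib/squeak/4.0-2636/squeak \
  -vm-sound-null -vm-display-null \
  /var/lib/jenkins/workspace/SqueakTrunkOnBleedingEdgeCog/target/TrunkImage.image \
  /var/lib/jenkins/workspace/SqueakTrunkOnBleedingEdgeCog/tests.st
```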
frank
On 13 January 2013 20:26, Ken Causey ken@kencausey.com wrote:
Great, also I think I should point out that I don't think it was just that an exception had not been caught. The process was pegging the CPU (running full out, 99%+ CPU usage).
Ken
On 01/13/2013 02:18 PM, Frank Shearar wrote:
I just killed the job. I'll need to add more output to the script, like the precise Cog version involved. I expect that particular job to be less stable than SqueakTrunk - it _is_ bleeding edge on both image _and_ VM side, after all.
frank
On 13 January 2013 19:37, Ken Causey ken@kencausey.com wrote:
Sorry, that process line was unintentionally chopped off.
jenkins 29126 99.6 2.3 1054380 24552 ? R 03:20 1032:16 /var/lib/jenkins/workspace/CogVM/tmp/lib/squeak/4.0-2636/squeak -vm-sound-null -vm-display-null
/var/lib/jenkins/workspace/SqueakTrunkOnBleedingEdgeCog/target/TrunkImage.image /var/lib/jenkins/workspace/SqueakTrunkOnBleedingEdgeCog/tests.st
Ken
On 01/13/2013 01:10 PM, Ken Causey wrote:
Roughly every day or two I login to box3 and check things out and check for package updates. With rare exception the system is quiet, I check for updates, apply any found, and move on. But today I find this (from ps auwx)
jenkins 29126 99.7 2.3 1054380 24552 ? R 03:20 1000:40 /var/lib/jenkins/workspace/CogVM/tmp/lib/squeak/4.0-2636/squeak -vm-sound-null -vm-display-null
/var/lib/jenkins/workspace/SqueakTrunkOnBleedingEdgeCog/target/TrunkImage.image /var/lib/jenkins/workspace/Sq
As you can see this has used 1000+ minutes of CPU time (which is less than the actual running time). I've not seen this before on the server. Is it perhaps the result of a new build project and expected? Or an actual problem? Out of caution and since the system is already busy I haven't checked for package updates yet today (I think the last time I did so was Friday).
Ken