Roughly every day or two I log in to box3, look things over, and check for package updates. With rare exceptions the system is quiet: I check for updates, apply any found, and move on. But today I found this (from ps auwx):
jenkins 29126 99.7 2.3 1054380 24552 ? R 03:20 1000:40 /var/lib/jenkins/workspace/CogVM/tmp/lib/squeak/4.0-2636/squeak -vm-sound-null -vm-display-null /var/lib/jenkins/workspace/SqueakTrunkOnBleedingEdgeCog/target/TrunkImage.image /var/lib/jenkins/workspace/Sq
As you can see, this process has used 1000+ minutes of CPU time (which is still less than its actual running time). I've not seen this before on the server. Is it perhaps the result of a new build project, and therefore expected, or an actual problem? Out of caution, and since the system is already busy, I haven't checked for package updates yet today (I think the last time I did so was Friday).
Ken
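For anyone repeating this kind of check, sorting the process list by CPU usage makes a runaway job stand out immediately. A minimal sketch, assuming the GNU procps ps typically found on a Linux server like box3:

    # List the top CPU consumers first; a wedged build lands at the top.
    ps auwx --sort=-pcpu | head -n 5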
Sorry, that process line was unintentionally chopped off. Here is the full line:
jenkins 29126 99.6 2.3 1054380 24552 ? R 03:20 1032:16 /var/lib/jenkins/workspace/CogVM/tmp/lib/squeak/4.0-2636/squeak -vm-sound-null -vm-display-null /var/lib/jenkins/workspace/SqueakTrunkOnBleedingEdgeCog/target/TrunkImage.image /var/lib/jenkins/workspace/SqueakTrunkOnBleedingEdgeCog/tests.st
Ken
It's probably a process that's brought up a debugger. I'd really like to adjust the UIManager to do things like dump stack traces to the console and kill processes in the event of failed tests. I'm happy for anyone to kill these kinds of jobs; I don't always notice them.
frank
OK, can you give some guidance as to how long is too long for the build processes? Does it vary much from job to job?
Ken
I just killed the job. I'll need to add more output to the script, such as the precise Cog version involved. I expect that particular job to be less stable than SqueakTrunk: it _is_ bleeding edge on both the image _and_ the VM side, after all.
frank
Great. Also, I should point out that I don't think this was just a case of an uncaught exception: the process was pegging the CPU (running flat out at 99%+ usage).
Ken
Ah, no, that's not a debugger then.
I'm going to slap a 15-minute kill time on the jobs later today: our longest-running jobs so far take around 9 minutes.
frank
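A minimal sketch of such a kill time, assuming GNU coreutils' timeout is available on box3. SQUEAK_VM here is a placeholder for the full path to the squeak binary, and WORKSPACE is the variable Jenkins already sets for each job; the real build step will differ:

    # Kill the run if it exceeds 15 minutes; timeout exits with status 124 on expiry.
    timeout 15m "$SQUEAK_VM" -vm-sound-null -vm-display-null \
        "$WORKSPACE/target/TrunkImage.image" "$WORKSPACE/tests.st"
    if [ $? -eq 124 ]; then
        echo "Build killed after exceeding the 15 minute limit" >&2
        exit 1
    fi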
Good idea to add a watchdog timer. Another good practice is to use the 'nice' command (/usr/bin/nice) in the command lines that run Squeak. This runs the tests at a lower scheduling priority, so if a process gets stuck consuming close to 100% CPU, its impact on other system users will be reduced (it will still gobble up all the CPU; it just won't drag the system down so badly).
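A sketch of what that looks like on the command line (paths abbreviated here; the real jobs use the full workspace paths):

    # Run the tests at reduced scheduling priority; -n 10 raises the niceness by 10.
    /usr/bin/nice -n 10 squeak -vm-sound-null -vm-display-null TrunkImage.image tests.st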
I don't know what the problem was in this particular case, but one thing that can cause Squeak to consume 100% CPU is an error in the image that leads to runaway memory use, such as a recursion error. Squeak keeps asking for more memory, the VM asks the OS for more, and eventually you are swapping. If this turns out to have been the problem, you can prevent the runaway memory condition with the '-memory' command-line option to the VM (but don't do that unless we can confirm that it really *is* the problem; I'm just mentioning it for future reference).
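For future reference, the cap would look something like this; the 512m figure is an arbitrary example, not a measured recommendation:

    # Limit object memory to 512 MB so runaway allocation fails fast instead of swapping.
    squeak -memory 512m -vm-sound-null -vm-display-null TrunkImage.image tests.st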
Dave
It's a repeatable problem, at least: http://squeakci.org/job/SqueakTrunkOnBleedingEdgeCog/17/console. I haven't had a chance to add debug info, though.
frank
I could be misunderstanding, but I don't think this is it. From the ps output: '1054380 24552'. These are, respectively, the VSZ (virtual memory size) and the RSS (resident set size), both in kilobytes. The relevant number is the second one, which is really quite low for a Squeak process, while still being within the expected range, of course.
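To pull just those columns for a single process, something like this works with the stock procps ps (29126 is the PID from the earlier listing):

    # Report virtual size, resident set size, CPU share, and elapsed time for one PID.
    ps -o pid,vsz,rss,pcpu,etime,comm -p 29126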
Ken
I looked at what logging _is_ in place in that script, and we're not even reaching the tests themselves. We're hanging during this expression in tests.st:
FileDirectory default fullNameFor: 'HudsonBuildTools.st'
frank
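When a run wedges like this, it may be possible to see where the image is stuck without killing it. Newer Cog VMs install a handler that dumps the Smalltalk call stacks to stderr on SIGUSR1, though that is worth verifying against the exact VM build on box3 before relying on it:

    # Ask a (sufficiently recent) Cog VM to print its call stacks to its stderr.
    kill -USR1 29126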
It also helps a whole lot if one writes things correctly: I broke the tests.st script by logging incorrectly. I'm going to have to fix this 'log errors to the console' thing.
frank