[Box-Admins] Long running build process on build.squeak.org?

Frank Shearar frank.shearar at gmail.com
Mon Jan 14 13:33:19 UTC 2013


On 14 January 2013 13:24, David T. Lewis <lewis at mail.msen.com> wrote:
> Good idea to add a watchdog timer. Another good practice is to use
> the 'nice' command (/usr/bin/nice) in the command lines that run Squeak.
> This runs the tests at a lower scheduling priority, so if a process gets
> stuck consuming close to 100% CPU, its impact on other system users will
> be reduced (it will still gobble up all the CPU, it just won't drag the
> system down so badly).
>
> I don't know what the problem was in this particular case, but one
> thing that can result in Squeak consuming 100% CPU is an error in the
> image that causes too much memory usage, such as a recursion error.
> Squeak keeps asking for more memory, the VM asks the OS for more, and
> eventually you are swapping. If this turns out to have been the
> problem, you can prevent the runaway memory condition with the
> '-memory' command line option to the VM (but don't do that unless we
> can confirm that it really *is* the problem; I'm just mentioning it
> for future reference).
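
For the record, here's a rough sketch of what the build's command line
might look like with both of those suggestions applied. The nice level
and the 512m cap are arbitrary illustrative values, not tested settings,
and the exact -memory syntax should be double-checked against the VM's
man page:

  /usr/bin/nice -n 10 \
    /var/lib/jenkins/workspace/CogVM/tmp/lib/squeak/4.0-2636/squeak \
    -vm-sound-null -vm-display-null -memory 512m \
    /var/lib/jenkins/workspace/SqueakTrunkOnBleedingEdgeCog/target/TrunkImage.image \
    /var/lib/jenkins/workspace/SqueakTrunkOnBleedingEdgeCog/tests.st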

It's a repeatable problem, at least:
http://squeakci.org/job/SqueakTrunkOnBleedingEdgeCog/17/console. I
haven't had a chance to add debug info though.

frank

> Dave
>
> On Mon, Jan 14, 2013 at 08:06:27AM +0000, Frank Shearar wrote:
>> Ah, no, that's not a debugger then.
>>
>> I'm going to slap a 15-minute kill time on the jobs later today: our
>> longest-running jobs so far are around 9 minutes.
>>
>> frank
>>
>> On 13 January 2013 20:26, Ken Causey <ken at kencausey.com> wrote:
>> > Great. Also, I should point out that I don't think it was just that
>> > an exception had not been caught. The process was pegging the CPU
>> > (running full out, 99%+ CPU usage).
>> >
>> > Ken
>> >
>> >
>> > On 01/13/2013 02:18 PM, Frank Shearar wrote:
>> >>
>> >> I just killed the job. I'll need to add more output to the script,
>> >> like the precise Cog version involved. I expect that particular job to
>> >> be less stable than SqueakTrunk - it _is_ bleeding edge on both image
>> >> _and_ VM side, after all.
>> >>
>> >> frank
>> >>
>> >>> On 13 January 2013 19:37, Ken Causey <ken at kencausey.com> wrote:
>> >>>
>> >>> Sorry, that process line was unintentionally chopped off.
>> >>>
>> >>> jenkins  29126 99.6  2.3 1054380 24552 ?       R    03:20 1032:16
>> >>> /var/lib/jenkins/workspace/CogVM/tmp/lib/squeak/4.0-2636/squeak
>> >>> -vm-sound-null -vm-display-null
>> >>>
>> >>> /var/lib/jenkins/workspace/SqueakTrunkOnBleedingEdgeCog/target/TrunkImage.image
>> >>> /var/lib/jenkins/workspace/SqueakTrunkOnBleedingEdgeCog/tests.st
>> >>>
>> >>> Ken
>> >>>
>> >>>
>> >>> On 01/13/2013 01:10 PM, Ken Causey wrote:
>> >>>>
>> >>>>
>> >>>> Roughly every day or two I log in to box3, look things over, and
>> >>>> check for package updates. With rare exceptions the system is quiet:
>> >>>> I check for updates, apply any found, and move on. But today I find
>> >>>> this (from ps auwx):
>> >>>>
>> >>>> jenkins 29126 99.7 2.3 1054380 24552 ? R 03:20 1000:40
>> >>>> /var/lib/jenkins/workspace/CogVM/tmp/lib/squeak/4.0-2636/squeak
>> >>>> -vm-sound-null -vm-display-null
>> >>>>
>> >>>>
>> >>>> /var/lib/jenkins/workspace/SqueakTrunkOnBleedingEdgeCog/target/TrunkImage.image
>> >>>> /var/lib/jenkins/workspace/Sq
>> >>>>
>> >>>> As you can see, this has used 1000+ minutes of CPU time (which is
>> >>>> less than the actual running time). I've not seen this before on the
>> >>>> server. Is it perhaps the result of a new build project, and
>> >>>> expected? Or an actual problem? Out of caution, and since the system
>> >>>> is already busy, I haven't checked for package updates yet today (I
>> >>>> think the last time I did so was Friday).
>> >>>>
>> >>>> Ken
>> >>>>
>> >>>>
>> >>>
>> >>
>> >>
>> >

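As an aside on the kill time mentioned above: one simple way to enforce
it at the shell level, assuming GNU coreutils' timeout is available on
the build slave (a sketch, not the actual job configuration -- a
Jenkins-side build timeout would do the same job), would be to wrap the
squeak invocation like this:

  timeout 15m \
    /var/lib/jenkins/workspace/CogVM/tmp/lib/squeak/4.0-2636/squeak \
    -vm-sound-null -vm-display-null \
    /var/lib/jenkins/workspace/SqueakTrunkOnBleedingEdgeCog/target/TrunkImage.image \
    /var/lib/jenkins/workspace/SqueakTrunkOnBleedingEdgeCog/tests.st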
