[Box-Admins] Build.squeak.org jobs stuck

Frank Shearar frank.shearar at gmail.com
Thu Nov 21 10:11:20 UTC 2013


On 21 November 2013 00:47, David T. Lewis <lewis at mail.msen.com> wrote:
> On Wed, Nov 20, 2013 at 05:39:35PM -0600, Ken Causey wrote:
>> I would really appreciate it if someone with Jenkins expertise would
>> look at the situation with build.squeak.org (aka box3).  Multiple jobs
>> are stuck and some date back to the 17th.  I would just kill them but it
>> seems to me the problem will be right back in a matter of hours or no
>> more than a day.  If nothing else maybe the problem job(s) can simply be
>> suspended temporarily.
>>
>> Ken
>
> I don't know the underlying problem, but there were lots of ruby and squeakvm
> processes reparented to root, and no clear indication of what is going wrong.
> I killed off as many of the reparented processes as I could find.

The underlying problem is this:
* rake starts running a build
* it spawns a process to fire up a Squeak image, running tests. Call this A.
* it also spawns a thread that will, after "too long" has passed -
240s by default - theoretically
** send a USR1 to A
** dump the pstree info for A (extra debug info while we try to get this
process working reliably)
** send a KILL to A
* and yet the squeakvm process is not killed
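
For illustration only, here is a minimal sketch of the steps above. It is
written in Python rather than the Ruby of the actual Rakefile, and the
names run_with_watchdog, timeout and grace are made up for this example.
One assumption goes beyond what is described above: the sketch signals
the whole process group instead of the single pid, which would matter
if A turns out to be a launcher that forks the real VM. Whether that is
what is happening on box3 is not established.

```python
import os
import signal
import subprocess
import threading
import time


def run_with_watchdog(cmd, timeout=240.0, grace=1.0):
    """Run cmd; if it outlives `timeout` seconds, signal its process group.

    Returns (exit_status, timed_out). Hypothetical helper, POSIX only.
    """
    # start_new_session=True makes the child the leader of a new process
    # group whose id equals its pid, so its descendants can be signalled
    # together with it.
    proc = subprocess.Popen(cmd, start_new_session=True)
    done = threading.Event()
    timed_out = threading.Event()

    def watchdog():
        if done.wait(timeout):
            return  # the process finished in time; nothing to do
        timed_out.set()
        try:
            os.killpg(proc.pid, signal.SIGUSR1)  # ask for a debug dump
            try:
                # Extra debug info; skipped if pstree is not installed.
                subprocess.run(["pstree", "-p", str(proc.pid)], check=False)
            except FileNotFoundError:
                pass
            time.sleep(grace)
            os.killpg(proc.pid, signal.SIGKILL)  # then kill the whole group
        except ProcessLookupError:
            pass  # nothing left in the group to signal

    thread = threading.Thread(target=watchdog, daemon=True)
    thread.start()
    status = proc.wait()
    done.set()
    thread.join()
    return status, timed_out.is_set()
```

Calling run_with_watchdog(["sleep", "30"], timeout=0.5) should come back
after roughly half a second with a negative exit status (killed by a
signal) and timed_out set to True.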

You can see in the build logs that the thread does attempt to kill the
process - look for "!!!" - and yet the tests keep on rolling.

What would be useful is finding out to which builds the squeakvm
processes belong. We ought to be able to do that quite easily since
each squeakvm process will include the job name in its path.
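
A rough sketch of how that lookup might be scripted, in Python. It
assumes the job name is the path component after "workspace/", which is
the conventional Jenkins layout but has not been checked against box3.
The function names are hypothetical.

```python
import re
import subprocess
from collections import defaultdict

# Assumption: Jenkins' conventional layout, .../workspace/<job name>/...
JOB_IN_PATH = re.compile(r"/workspace/([^/\s]+)/")


def jobs_for_squeakvm(ps_output):
    """Map job name -> list of (pid, ppid) for squeakvm processes.

    Expects one process per line: pid, ppid, then the full command line.
    """
    jobs = defaultdict(list)
    for line in ps_output.splitlines():
        fields = line.split(None, 2)
        if len(fields) < 3:
            continue
        if not (fields[0].isdigit() and fields[1].isdigit()):
            continue  # header or malformed line
        pid, ppid, args = int(fields[0]), int(fields[1]), fields[2]
        if "squeakvm" not in args:
            continue
        match = JOB_IN_PATH.search(args)
        job = match.group(1) if match else "(unknown)"
        jobs[job].append((pid, ppid))
    return dict(jobs)


def current_squeakvm_jobs():
    """Run ps and group the live squeakvm processes by job name."""
    # Separate -o options with empty headers are the portable POSIX form.
    result = subprocess.run(
        ["ps", "-e", "-o", "pid=", "-o", "ppid=", "-o", "args="],
        capture_output=True, text=True, check=True,
    )
    return jobs_for_squeakvm(result.stdout)
```

A ppid of 1 in the result marks a process that has been reparented,
which is the kind Dave was cleaning up.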

I did just discover a typo in the kill-it thread. It would raise a
"nil doesn't understand the #puts method" error inside the thread, and
that error would disappear silently, so nothing happened. I've just
committed a fix and rerun SqueakTrunk. Hopefully that will help us.

> Currently there is one Jenkins job running, and one squeakvm process corresponding
> to that job. I'll look again in a day or so and see how many runaway processes
> may have come back.

Ditto.

frank

> Dave
>

