[Box-Admins] Runaway ruby process on box3 (was: [squeak-dev] trunk down again)
Ken Causey
ken at kencausey.com
Sat Nov 9 00:39:10 UTC 2013
And as soon as I wrote this I realized having strace64 was pointless and
uninstalled it along with a dependency that was installed with it. I
guess I thought I was on box4 for a moment.
Ken
> -------- Original Message --------
> Subject: RE: [Box-Admins] Runaway ruby process on box3 (was:
> [squeak-dev] trunk down again)
> From: "Ken Causey" <ken at kencausey.com>
> Date: Fri, November 08, 2013 6:35 pm
> To: "Squeak Hosting Support" <box-admins at lists.squeakfoundation.org>
>
>
> strace and strace64 are now installed on box3. Of course anyone with
> sudo access could have done the same.
>
> Ken
>
> > -------- Original Message --------
> > Subject: Re: [Box-Admins] Runaway ruby process on box3 (was:
> > [squeak-dev] trunk down again)
> > From: "David T. Lewis" <lewis at mail.msen.com>
> > Date: Fri, November 08, 2013 6:23 pm
> > To: Squeak Hosting Support <box-admins at lists.squeakfoundation.org>
> >
> >
> > On Fri, Nov 08, 2013 at 11:16:34PM +0000, Frank Shearar wrote:
> > > On 8 November 2013 23:06, David T. Lewis <lewis at mail.msen.com> wrote:
> > > > On Wed, Nov 06, 2013 at 10:02:38PM -0600, Chris Muller wrote:
> > > >> Trunk stopped responding again. box2 might need to be rebooted again.
> > > >
> > > > We have a ruby process running on box3 that is consuming all available CPU,
> > > > that has been reparented to init, and that has been running for a long time.
> > > >
> > > > jenkins at box3-squeak:~$ ps -aef | grep ruby
> > > > jenkins 19054 18955 0 22:48 pts/1 00:00:00 grep ruby
> > > > jenkins 28923 1 87 Nov06 ? 1-19:18:12 /var/lib/jenkins/.rvm/rubies/ruby-1.9.3-p392/bin/ruby -S rspec test/image_test.rb
> > > > jenkins at box3-squeak:~$ ps -l -p 28923
> > > > F S UID PID PPID C PRI NI ADDR SZ WCHAN TTY TIME CMD
> > > > 0 R 103 28923 1 87 80 0 - 5205 - ? 1-19:18:19 ruby
> > > > jenkins at box3-squeak:~$ top -p 28923 -b -n 1
> > > > top - 22:48:47 up 199 days, 7:38, 2 users, load average: 7.11, 7.11, 7.16
> > > > Tasks: 1 total, 1 running, 0 sleeping, 0 stopped, 0 zombie
> > > > Cpu(s): 2.0%us, 0.2%sy, 2.0%ni, 95.3%id, 0.4%wa, 0.0%hi, 0.0%si, 0.1%st
> > > > Mem: 1032140k total, 1009656k used, 22484k free, 74156k buffers
> > > > Swap: 524280k total, 10732k used, 513548k free, 209540k cached
> > > >
> > > > PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> > > > 28923 jenkins 20 0 20820 12m 828 R 87.0 1.3 2598:23 ruby
> > > >
> > > > This appears to be a process that got disconnected from one of our Jenkins
> > > > jobs and has been stuck burning CPU for the last couple of days. That also
> > > > happens to be roughly the time frame in which our source.squeak.org service
> > > > got hung up. The Jenkins jobs (e.g. SqueakTrunk) are interacting with
> > > > source.squeak.org, so it is possible that the two problems are related.
> > > >
> > > > I noticed this because the InterpreterVM and CogVM jobs are failing after
> > > > their watchdog timers expire, but the actual jobs succeed if I run them on
> > > > my own local PC. Those jobs run Squeak at low priority (nice) and it is
> > > > possible that their failures are due to the runaway ruby job consuming all
> > > > available resources.
> > > >
> > > > I have not killed the runaway process yet, in case anyone wants to have a
> > > > look at it first.
> > >
> > > I would be happy if you attached strace to it, collected some data,
> > > and then killed it. It's a runaway SqueakTrunk job. (Well, it's clear
> > > you know that, but I just had to point it out.) Hopefully the strace
> > > would give enough clues to find what looks like a tight loop...
> > >
> >
> > We don't have strace installed on box3, sorry.
> >
> > Dave