[Box-Admins] Runaway ruby process on box3 (was: [squeak-dev] trunk down again)
David T. Lewis
lewis at mail.msen.com
Fri Nov 8 23:06:42 UTC 2013
On Wed, Nov 06, 2013 at 10:02:38PM -0600, Chris Muller wrote:
> Trunk stopped responding again. box2 might need to be rebooted again.
We have a ruby process running on box3 that is consuming all available CPU,
that has been reparented to init, and that has been running for a long time.
jenkins@box3-squeak:~$ ps -aef | grep ruby
jenkins 19054 18955 0 22:48 pts/1 00:00:00 grep ruby
jenkins 28923 1 87 Nov06 ? 1-19:18:12 /var/lib/jenkins/.rvm/rubies/ruby-1.9.3-p392/bin/ruby -S rspec test/image_test.rb
jenkins@box3-squeak:~$ ps -l -p 28923
F S   UID   PID  PPID  C PRI  NI ADDR SZ WCHAN  TTY            TIME CMD
0 R   103 28923     1 87  80   0 -  5205 -      ?        1-19:18:19 ruby
jenkins@box3-squeak:~$ top -p 28923 -b -n 1
top - 22:48:47 up 199 days, 7:38, 2 users, load average: 7.11, 7.11, 7.16
Tasks: 1 total, 1 running, 0 sleeping, 0 stopped, 0 zombie
Cpu(s): 2.0%us, 0.2%sy, 2.0%ni, 95.3%id, 0.4%wa, 0.0%hi, 0.0%si, 0.1%st
Mem: 1032140k total, 1009656k used, 22484k free, 74156k buffers
Swap: 524280k total, 10732k used, 513548k free, 209540k cached
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
28923 jenkins   20   0 20820  12m  828 R 87.0  1.3   2598:23 ruby
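For spotting this kind of orphan without eyeballing the ps output, here is a
minimal sketch. It assumes a procps-style ps that accepts -eo with the pid,
ppid, user, pcpu and args fields. The helper names filter_orphans and
find_orphans and the 50% default threshold are made up for illustration. Note
that pcpu as reported by ps is CPU time divided by elapsed time, so it is a
lifetime average rather than a current reading.

```shell
# filter_orphans (hypothetical helper): read lines of the form
#   PID PPID USER %CPU COMMAND...
# on stdin and print those that belong to user $1, have been
# reparented to init (PPID 1), and show at least $2 percent CPU.
filter_orphans() {
    awk -v user="$1" -v min="$2" \
        '$2 == 1 && $3 == user && $4 + 0 >= min + 0'
}

# find_orphans (hypothetical helper): run the filter against live ps output.
# The trailing "=" on each field name suppresses the header line.
# The default threshold of 50 is an arbitrary choice.
find_orphans() {
    ps -eo pid=,ppid=,user=,pcpu=,args= | filter_orphans "$1" "${2:-50}"
}
```

Running "find_orphans jenkins" on box3 should list any process matching the
pattern above, assuming ps does not truncate the user name.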
This appears to be a process that got disconnected from one of our Jenkins
jobs and has been stuck burning CPU for the last couple of days. That also
happens to be roughly the time frame in which our source.squeak.org service
got hung up. The Jenkins jobs (e.g. SqueakTrunk) are interacting with
source.squeak.org, so it is possible that the two problems are related.
I noticed this because the InterpreterVM and CogVM jobs are failing after
their watchdog timers expire, but the actual jobs succeed if I run them on
my own local PC. Those jobs run Squeak at low priority (nice) and it is
possible that their failures are due to the runaway ruby job consuming all
available resources.
I have not killed the runaway process yet, in case anyone wants to have a
look at it first.
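When the time comes to get rid of it, something along these lines would do.
This is only a sketch: stop_gracefully is a made-up helper name and the
10 second default is arbitrary. It sends SIGTERM first so the process has a
chance to clean up, and falls back to SIGKILL only if it is still around
after the timeout.

```shell
# stop_gracefully (hypothetical helper): ask process $1 to exit with
# SIGTERM, wait up to $2 seconds (default 10), then send SIGKILL.
stop_gracefully() {
    pid=$1
    timeout=${2:-10}
    # If the signal cannot be delivered (no such process, or not
    # permitted), there is nothing more to do here.
    kill -TERM "$pid" 2>/dev/null || return 0
    while [ "$timeout" -gt 0 ]; do
        # kill -0 sends no signal; it only checks that the pid still exists.
        kill -0 "$pid" 2>/dev/null || return 0
        sleep 1
        timeout=$((timeout - 1))
    done
    # Still there after the timeout: force it.
    kill -KILL "$pid" 2>/dev/null
}
```

For the process above that would be "stop_gracefully 28923", run as jenkins
or root.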
Dave