After I received this notice I checked, and the website process had the CPU pegged with normal memory usage. I connected with VNC, but the image was locked up. There was a debugger open, and I took a screenshot, which can be found at
http://users.squeak.org/~kencausey/website_locked.png
I chatted in the IRC channel as I was fiddling with it:
2011-02-28 16:24:21 kencausey JankoMivsek: website process is flipping out again
2011-02-28 16:26:14 kencausey the memory usage is normal this time, it just has the CPU pegged
2011-02-28 16:26:41 kencausey looking at the logs, the last successful hit was the nagios check oddly enough, 2 hits before google hit the stats page again
2011-02-28 16:26:56 kencausey I don't see anything suspicious like the last time
2011-02-28 16:29:41 kencausey there is a debugger open on a send of #bottomContext to UndefinedObject
2011-02-28 16:29:50 kencausey I can't interact with it
2011-02-28 16:30:54 kencausey it's in a call to Process>>terminate
2011-02-28 16:32:38 kencausey restarting it now
2011-02-28 16:34:09 kencausey website is back up
From the apache logs:
this is when it went down:
80.81.242.100 - - [28/Feb/2011:21:41:03 +0000] "GET /stats.html?view=main&year=1684&month=8 HTTP/1.1" 200 24288 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
173.255.225.4 - - [28/Feb/2011:21:42:13 +0000] "GET /favicon.ico HTTP/1.1" 200 1406 "-" "Safari/6533.19.4 CFNetwork/454.11.5 Darwin/10.5.0 (i386) (MacBook3%2C1)"
89.212.16.244 - - [28/Feb/2011:21:42:18 +0000] "GET /ping.html HTTP/1.1" 200 - "-" "check_http/v1.4.14 (nagios-plugins 1.4.14)"
38.99.97.225 - - [28/Feb/2011:21:42:49 +0000] "GET /Smalltalk/ HTTP/1.1" 502 399 "-" "Mozilla/5.0 (compatible; ScoutJet; +http://www.scoutjet.com/)"
67.195.112.235 - - [28/Feb/2011:21:43:15 +0000] "GET /Merchandise/?version=3 HTTP/1.0" 502 403 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"
We have a Googlebot hit to the stats page (relevant?), an irrelevant favicon request, and a successful nagios ping, which I assume is Janko's and not relevant. After that, hits start failing with 502 responses. Before that I see nothing suspicious and no flood of requests.
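For anyone who wants to repeat this check on a longer log, here is a minimal sketch in Python of how the last few requests before the first 5xx response could be pulled out. It assumes the log is in Apache's combined format, as in the excerpt above. The function names and the `context` parameter are hypothetical, chosen only for illustration, and the sample lines are copied from the excerpt.

```python
import re

# Apache "combined" format:
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse(lines):
    """Yield one dict per line that matches the combined log format."""
    for line in lines:
        match = LOG_LINE.match(line)
        if match:
            yield match.groupdict()


def requests_before_first_failure(lines, context=3):
    """Return (up to `context` entries preceding the first 5xx, that 5xx entry).

    Returns ([], None) when the log contains no 5xx response.
    """
    seen = []
    for entry in parse(lines):
        if entry["status"].startswith("5"):
            return seen[-context:], entry
        seen.append(entry)
    return [], None


# Sample lines taken from the log excerpt in this message.
SAMPLE = [
    '80.81.242.100 - - [28/Feb/2011:21:41:03 +0000] "GET /stats.html?view=main&year=1684&month=8 HTTP/1.1" 200 24288 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '89.212.16.244 - - [28/Feb/2011:21:42:18 +0000] "GET /ping.html HTTP/1.1" 200 - "-" "check_http/v1.4.14 (nagios-plugins 1.4.14)"',
    '38.99.97.225 - - [28/Feb/2011:21:42:49 +0000] "GET /Smalltalk/ HTTP/1.1" 502 399 "-" "Mozilla/5.0 (compatible; ScoutJet; +http://www.scoutjet.com/)"',
    '67.195.112.235 - - [28/Feb/2011:21:43:15 +0000] "GET /Merchandise/?version=3 HTTP/1.0" 502 403 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"',
]

before, failed = requests_before_first_failure(SAMPLE)
for entry in before:
    print("ok  ", entry["time"], entry["request"])
if failed is not None:
    print("FAIL", failed["time"], failed["status"], failed["request"])
```

This only narrows down the time window. It cannot say whether any of the preceding requests actually triggered the lockup.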
Ken
-------- Original Message --------
Subject: ** PROBLEM Service Alert: squeak box2/Squeak website is CRITICAL **
From: nagios@mivsek.eranova.si (User for Nagios)
Date: Mon, February 28, 2011 3:49 pm
To: ken@kencausey.com
***** Nagios *****
Notification Type: PROBLEM
Service: Squeak website
Host: squeak box2
Address: 85.10.195.197
State: CRITICAL
Date/Time: Mon Feb 28 22:49:47 CET 2011
Additional Info:
CRITICAL - Socket timeout after 10 seconds
Hi Ken,
I checked too and see no website process active, so I restarted it and now I'm connected with VNC to the image.
It seems the image crashed today, but it last snapshotted correctly at 9pm GMT.
Can we see a VM dump somewhere?
Best regards
Janko
On 28. 02. 2011 23:47, Ken Causey wrote:
box-admins@lists.squeakfoundation.org