After I received this notice I checked, and the website process had the CPU pegged with normal memory usage. I connected with VNC, but the image was locked up. A debugger was open, and I took a screenshot, which can be found at
http://users.squeak.org/~kencausey/website_locked.png
I chatted in the IRC channel as I was fiddling with it:
2011-02-28 16:24:21 kencausey JankoMivsek: website process is flipping out again
2011-02-28 16:26:14 kencausey the memory usage is normal this time, it just has the CPU pegged
2011-02-28 16:26:41 kencausey looking at the logs, the last successful hit was the nagios check oddly enough, 2 hits before google hit the stats page again
2011-02-28 16:26:56 kencausey I don't see anything suspicious like the last time
2011-02-28 16:29:41 kencausey there is a debugger open on a send of #bottomContext to UndefinedObject
2011-02-28 16:29:50 kencausey I can't interact with it
2011-02-28 16:30:54 kencausey it's in a call to Process>>terminate
2011-02-28 16:32:38 kencausey restarting it now
2011-02-28 16:34:09 kencausey website is back up
From the apache logs:
this is when it went down:
80.81.242.100 - - [28/Feb/2011:21:41:03 +0000] "GET /stats.html?view=main&year=1684&month=8 HTTP/1.1" 200 24288 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
173.255.225.4 - - [28/Feb/2011:21:42:13 +0000] "GET /favicon.ico HTTP/1.1" 200 1406 "-" "Safari/6533.19.4 CFNetwork/454.11.5 Darwin/10.5.0 (i386) (MacBook3%2C1)"
89.212.16.244 - - [28/Feb/2011:21:42:18 +0000] "GET /ping.html HTTP/1.1" 200 - "-" "check_http/v1.4.14 (nagios-plugins 1.4.14)"
38.99.97.225 - - [28/Feb/2011:21:42:49 +0000] "GET /Smalltalk/ HTTP/1.1" 502 399 "-" "Mozilla/5.0 (compatible; ScoutJet; +http://www.scoutjet.com/)"
67.195.112.235 - - [28/Feb/2011:21:43:15 +0000] "GET /Merchandise/?version=3 HTTP/1.0" 502 403 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"
We have a Googlebot hit to the stats page (relevant?), an irrelevant favicon request, and a successful nagios ping, which I assume is Janko's and not relevant; then hits start failing. Before that I see nothing suspicious and no flood of requests.
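For anyone who wants to repeat this check against the full log, here is a minimal Python sketch of what I did by eye: scan for the first 5xx response and report the last request that succeeded before it. It assumes the log is in Apache's "combined" format, as in the excerpt above; the names LOG_RE, SAMPLE and find_failure_boundary are made up for illustration and are not part of any existing tool on the box.

```python
import re

# Assumption: Apache "combined" format, i.e.
# host ident user [time] "request" status bytes "referer" "agent"
LOG_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)

# Three entries copied from the excerpt above, used only as sample input.
SAMPLE = [
    '173.255.225.4 - - [28/Feb/2011:21:42:13 +0000] "GET /favicon.ico HTTP/1.1" '
    '200 1406 "-" "Safari/6533.19.4 CFNetwork/454.11.5 Darwin/10.5.0 (i386) (MacBook3%2C1)"',
    '89.212.16.244 - - [28/Feb/2011:21:42:18 +0000] "GET /ping.html HTTP/1.1" '
    '200 - "-" "check_http/v1.4.14 (nagios-plugins 1.4.14)"',
    '38.99.97.225 - - [28/Feb/2011:21:42:49 +0000] "GET /Smalltalk/ HTTP/1.1" '
    '502 399 "-" "Mozilla/5.0 (compatible; ScoutJet; +http://www.scoutjet.com/)"',
]


def find_failure_boundary(lines):
    """Return (last_ok, first_failed) around the first 5xx response.

    Each element is a dict of the parsed fields, or None if not found.
    """
    last_ok = None
    for line in lines:
        match = LOG_RE.match(line.strip())
        if match is None:
            continue  # skip lines that are not in combined format
        entry = match.groupdict()
        if int(entry["status"]) >= 500:
            return last_ok, entry
        last_ok = entry
    return last_ok, None


if __name__ == "__main__":
    last_ok, first_failed = find_failure_boundary(SAMPLE)
    if first_failed is None:
        print("no 5xx responses found")
    else:
        print("first failure:", first_failed["time"], first_failed["request"])
        if last_ok is not None:
            print("last success: ", last_ok["time"], last_ok["request"])
```

To run it over a real log, pass an open file object instead of SAMPLE; the function only iterates over lines, so it does not need the whole file in memory.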
Ken
-------- Original Message --------
Subject: ** PROBLEM Service Alert: squeak box2/Squeak website is CRITICAL **
From: nagios@mivsek.eranova.si (User for Nagios)
Date: Mon, February 28, 2011 3:49 pm
To: ken@kencausey.com
***** Nagios *****
Notification Type: PROBLEM
Service: Squeak website
Host: squeak box2
Address: 85.10.195.197
State: CRITICAL
Date/Time: Mon Feb 28 22:49:47 CET 2011
Additional Info:
CRITICAL - Socket timeout after 10 seconds