I’ve been looking at box3 for about half an hour now. I’d say Dale is right and squeaksource.com (http://squeaksource.com/) is down. I logged in and killed the ssdotcom process. This did not solve the problem once it had restarted under daemontools. And I saw that Jenkins is also unreachable. I restarted the server and the problem still persists. My options are exhausted. I would guess off the top of my head and with no evidence whatsoever that this is a DNS thing. I’ll have to leave this to somebody with better knowledge than mine.
Chris
On Dec 21, 2014, at 1:21 PM, Dale Henrichs dale.henrichs@gemtalksystems.com wrote:
Looks like it might have gone down last night?
http://www.downforeveryoneorjustme.com/http://www.squeaksource.com/
Dale
On Sun, Dec 21, 2014 at 1:07 PM, Chris Cunnington brasspen@gmail.com wrote:
I’ve been looking at box3 for about half an hour now. I’d say Dale is right and squeaksource.com is down. I logged in and killed the ssdotcom process. This did not solve the problem once it had restarted under daemontools. And I saw that Jenkins is also unreachable. I restarted the server and the problem still persists. My options are exhausted. I would guess off the top of my head and with no evidence whatsoever that this is a DNS thing. I’ll have to leave this to somebody with better knowledge than mine.
squeaksource.com runs on box 2.
Chris
I restarted box2 and it appears to have made no difference to SS. I have to own that, since I thought SS was on box3 when it is in fact on box2, I may have been introducing problems there. The sum of my actions on box3 was to kill the ssdotcom image and to restart box3. Levente is logged into box3. All services on box3 are running. I think it’s best for me to back off now.
Chris
On 21.12.2014, at 20:31, Chris Muller asqueaker@gmail.com wrote:
On Sun, Dec 21, 2014 at 1:07 PM, Chris Cunnington brasspen@gmail.com wrote:
I’ve been looking at box3 for about half an hour now. I’d say Dale is right and squeaksource.com is down. I logged in and killed the ssdotcom process. This did not solve the problem once it had restarted under daemontools. And I saw that Jenkins is also unreachable. I restarted the server and the problem still persists. My options are exhausted. I would guess off the top of my head and with no evidence whatsoever that this is a DNS thing. I’ll have to leave this to somebody with better knowledge than mine.
squeaksource.com runs on box 2.
how so?
~ $ host www.squeaksource.com
www.squeaksource.com has address 173.246.101.237

tpape@box3-squeak:~$ /sbin/ifconfig
eth0 Link encap:Ethernet HWaddr 00:16:3e:e3:3a:ff
     inet addr:173.246.101.237 Bcast:173.246.103.255 Mask:255.255.252.0

tpape@box2:~$ /sbin/ifconfig
eth0 Link encap:Ethernet HWaddr 00:11:09:C6:91:51
     inet addr:85.10.195.197 Bcast:85.10.195.223 Mask:255.255.255.224
==> SqueakSource is on box 3.
And that host is loaded with running qmail-remote processes that try to deliver bounces…
Best -Tobias
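The check Tobias did by hand (resolve the name, then see which box owns that address) could be scripted roughly as follows. This is only a sketch: the two addresses are copied from the ifconfig output in his message, while the BOX_ADDRESSES table and the function names are made up for illustration and do not exist on either server.

```python
import socket

# Addresses copied from the ifconfig output quoted in the thread.
# The table itself is a hypothetical helper for this sketch.
BOX_ADDRESSES = {
    "box2": "85.10.195.197",
    "box3": "173.246.101.237",
}


def which_box(address, boxes=BOX_ADDRESSES):
    """Return the name of the box that owns `address`, or None if unknown."""
    for name, box_address in boxes.items():
        if box_address == address:
            return name
    return None


def locate(hostname):
    """Resolve `hostname` through DNS and map the result to a box name."""
    return which_box(socket.gethostbyname(hostname))
```

Calling locate("www.squeaksource.com") needs network access, and the answer depends on whatever DNS returns today, which may well differ from the 2014 records shown in this thread.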
Hi All,
squeaksource.com is hosted on box3. source.squeak.org is on box2, but that's a different thing.
The problem with squeaksource.com is that it's leaking memory. The size of the image is close to 1GB, and it's saved like that on the disk. The service is too slow to respond due to swapping, because the server has 1GB memory total. I downloaded the image to my machine, and I'm investigating the image now.
Levente
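The condition described here (a saved image about as large as the machine's total memory, so the service swaps) is easy to test for ahead of time. A minimal sketch, assuming a threshold of half the RAM; that threshold and both function names are illustrative choices, not anything configured on box3.

```python
import os


def image_too_large(image_bytes, ram_bytes, max_fraction=0.5):
    """True when the image would use more than `max_fraction` of RAM."""
    return image_bytes > ram_bytes * max_fraction


def check_image(path, ram_bytes, max_fraction=0.5):
    """Apply the same test to an image file on disk."""
    return image_too_large(os.path.getsize(path), ram_bytes, max_fraction)
```

With the figures from this thread, an image near 1GB on a server with 1GB of memory is flagged, while an image of a few hundred megabytes would pass.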
The problem was that an error occurred, so Seaside was trying to send an email about it. But generating the stack trace for the email resulted in another error. This recursively triggered another attempt to create an email. The runaway process consumed 850MB of memory, and did not trigger the low space watcher, probably because it was a low priority (30) process (probably a Seaside request handler). Someone who's more familiar with Seaside should fix this issue. There may even be a patch for this in the official repository.
To fix the problem, I downloaded the image, terminated the runaway process, saved it as a new version, and uploaded it. Then I stopped the service, modified it to use the new image, and restarted it. I assume this has no side effects, but it would be great if someone more familiar with the image could take a look.
Levente
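The failure mode described above (reporting an error raises a new error, which is reported in turn) is usually avoided with a re-entrancy guard around the reporter. The sketch below shows the general idea in Python rather than Smalltalk. It is not the Seaside or SqueakSource code, and the `send` callable is a hypothetical stand-in for whatever mails the administrators.

```python
import threading
import traceback

_state = threading.local()  # per-thread flag: is a report already in progress?


def report_error(exc, send):
    """Report `exc` through `send` without ever recursing into itself.

    `send` is a hypothetical callable taking the message text.
    Returns True if the report was delivered, False otherwise.
    """
    if getattr(_state, "reporting", False):
        # A failure raised while reporting must not start another report.
        return False
    _state.reporting = True
    try:
        text = "".join(
            traceback.format_exception(type(exc), exc, exc.__traceback__)
        )
        send(text)
        return True
    except Exception:
        # Building or sending the report failed: give up quietly instead
        # of raising a new error that would itself be reported.
        return False
    finally:
        _state.reporting = False
```

The flag is kept per thread so that one request's failed report does not suppress reports from other requests. A real fix would probably also log the secondary failure somewhere that cannot fail in the same way.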
On Sun, Dec 21, 2014 at 10:06:20PM +0100, Levente Uzonyi wrote:
The problem was that an error occurred, so Seaside was trying to send an email about it. But generating the stack trace for the email resulted in another error. This recursively triggered another attempt to create an email. The runaway process consumed 850MB of memory, and did not trigger the low space watcher, probably because it was a low priority (30) process (probably a Seaside request handler). Someone who's more familiar with Seaside should fix this issue. There may even be a patch for this in the official repository.
To fix the problem, I downloaded the image, terminated the runaway process, saved it as a new version, and uploaded it. Then I stopped the service, modified it to use the new image, and restarted it.
Levente,
That was exactly the right thing to do. The fixed image that you have restarted under a new version name (squeaksource.4.image) is as up to date as it is possible to be, because the running image saves itself hourly. You also left the problem image (squeaksource.3.image) intact so that it can be downloaded by other people to investigate.
I assume this has no side effects, but it would be great if someone more familiar with the image could take a look.
You are right, this should have no bad side effects.
I am downloading the big squeaksource.3 image and I'll look and see if I can spot anything more about the cause (but I am not a Seaside expert either). I'll also gzip the old image file so it does not take up a lot of space on box3.
I'm sorry I was not around to help out, but you handled the issue perfectly. Thank you very much!
Dave
On Sun, Dec 21, 2014 at 5:59 PM, David T. Lewis lewis@mail.msen.com wrote:
... snip ... I am downloading the big squeaksource.3 image and I'll look and see if I can spot anything more about the cause (but I am not a Seaside expert
Hi Dave, I encountered this problem with the squeaksource code in October of 2013, when I was doing the Magma-backed source.squeak.org:
http://lists.squeakfoundation.org/pipermail/box-admins/2013-October/001544.h...
and fixed it in the versions at:
After that, you had saved squeaksource.com from oblivion, but did not (IIRC) incorporate the fixes and improvements I had made to the SS code in trunk. I _really think_ you should consider merging those versions into the version that runs SqueakSource.com -- as there are other fixes and improvements besides this one.
On Mon, Dec 22, 2014 at 12:04:16PM -0600, Chris Muller wrote:
Hi Dave, I encountered this problem with the squeaksource code in October of 2013, when I was doing the Magma-backed source.squeak.org:
http://lists.squeakfoundation.org/pipermail/box-admins/2013-October/001544.h...
and fixed it in the versions at:
After that, you had saved squeaksource.com from oblivion, but did not (IIRC) incorporate the fixes and improvements I had made to the SS code in trunk. I _really think_ you should consider merging those versions into the version that runs SqueakSource.com -- as there are other fixes and improvements besides this one.
Thanks Chris.
Good idea. I can't look at it now, but I'll see if I can do as you suggest next week.
Dave
Sorry for the confusion. Do we have a recent backup of the image prior to the 1GB growth? My memory of the backup process is that it maintains about 5 generations spanning the last 5 days?
On Sun, 21 Dec 2014, Chris Muller wrote:
Sorry for the confusion. Do we have a recent backup of the image prior to the 1GB growth? My memory of the backup process is that it maintains about 5 generations spanning the last 5 days?
AFAIK such a backup process exists only on box2. In theory Randal is making backups from box3.
Levente
On Sun, Dec 21, 2014 at 03:43:50PM -0600, Chris Muller wrote:
Sorry for the confusion. Do we have a recent backup of the image prior to the 1GB growth? My memory of the backup process is that it maintains about 5 generations spanning the last 5 days?
No, the squeaksource.com image saves itself every hour. There are no other versions saved by the backup process in the image. I do not know if there are actual backups available for the box3 server itself.
I do occasionally make a manual backup of the image, and keep it in the directory ~ssdotcom/SqueakSource/BACKUPS. This serves only to keep an ad-hoc snapshot every once in a while, e.g. if some change was being made to the image overall. These backups are not intended for failure recovery, although it may be worth knowing that they exist.
In this case, the hourly backup worked as intended. The saved image was functional, and Levente fixed the problem off line. The resulting fixed image lost no data.
Dave
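For reference, the kind of generational backup Chris asked about (roughly five copies covering the last five days) could look something like the sketch below. Nothing like this runs on box3 according to the messages above. The function name, the date-stamp naming scheme and the default of five generations are all assumptions made for illustration.

```python
import shutil
from pathlib import Path


def rotate_backup(image, backup_dir, stamp, keep=5):
    """Copy `image` into `backup_dir` and keep only the newest `keep` copies.

    `stamp` is a sortable string such as "2014-12-21". The naming scheme
    and the default of five generations are assumptions of this sketch.
    """
    image = Path(image)
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / f"{image.stem}.{stamp}{image.suffix}"
    shutil.copy2(image, target)
    # Sortable stamps mean lexical order is also chronological order.
    copies = sorted(backup_dir.glob(f"{image.stem}.*{image.suffix}"))
    for old in copies[:-keep]:
        old.unlink()
    return target
```

Run once a day with the current date as the stamp, this would leave the five most recent daily copies in the backup directory. Unlike the hourly self-save, an older generation would still predate a problem that had already been saved into the live image.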
box-admins@lists.squeakfoundation.org