box3.squeak.org off line - HELP neeeded - Box-Admins

List overview All Threads
Download

newer

box3.squeak.org off line - HELP neeeded

older

Username 'lewis' changed to...

The technical side of the status...

David T. Lewis

6 Oct 2013 6 Oct '13

4:15 p.m.

I cannot ping box3.squeak.org, which is our build.squeak.org and squeaksource.com server. The box4.squeak.org box is responding, but box3 times out on a ping.

Can someone please check it and restart if needed? If rebooted, the squeaksource.com service will require a manual restart as per ~ssdotcom/README.

I have limited connectivity today but will try to keep in touch.

A root cause analyis will be in order, and since squeaksource.com is the most recent change on box3.squeak.org, that will be an obvious suspect.

TIA,

Dave

Show replies by date

Tobias Pape

6 Oct 6 Oct

4:31 p.m.

Am 06.10.2013 um 16:15 schrieb "David T. Lewis" lewis@mail.msen.com:

...

I cannot ping box3.squeak.org, which is our build.squeak.org and squeaksource.com server. The box4.squeak.org box is responding, but box3 times out on a ping.

Can someone please check it and restart if needed? If rebooted, the squeaksource.com service will require a manual restart as per ~ssdotcom/README.

I have limited connectivity today but will try to keep in touch.

A root cause analyis will be in order, and since squeaksource.com is the most recent change on box3.squeak.org, that will be an obvious suspect.

Aparently, box3/Squeaksource is up and running. I cannot see anythin suspicous on the box (wich is clearly reachable via ssh…)

Best -Tobias

David T. Lewis

4:39 p.m.

On Sun, Oct 06, 2013 at 04:31:05PM +0200, Tobias Pape wrote:

...

Am 06.10.2013 um 16:15 schrieb "David T. Lewis" lewis@mail.msen.com:

...
I cannot ping box3.squeak.org, which is our build.squeak.org and squeaksource.com server. The box4.squeak.org box is responding, but box3 times out on a ping.

Can someone please check it and restart if needed? If rebooted, the squeaksource.com service will require a manual restart as per ~ssdotcom/README.

I have limited connectivity today but will try to keep in touch.

A root cause analyis will be in order, and since squeaksource.com is the most recent change on box3.squeak.org, that will be an obvious suspect.

Aparently, box3/Squeaksource is up and running. I cannot see anythin suspicous on the box (wich is clearly reachable via ssh?)

Best -Tobias

Thanks for the quick reply!

It is on line again for me now as well, and I can see that the squeaksource image has been running without interruption for a couple of days, so the box3 server did not go down.

The server was definitely not responding to my pings for a period of at least several minutes (possibly much longer), and the box4 server was responding at that time, so a network problem seems unlikely.

Still concerned but feeling much better now,

Dave

Tobias Pape

4:52 p.m.

Am 06.10.2013 um 16:39 schrieb "David T. Lewis" lewis@mail.msen.com:

...

On Sun, Oct 06, 2013 at 04:31:05PM +0200, Tobias Pape wrote:

...
Am 06.10.2013 um 16:15 schrieb "David T. Lewis" lewis@mail.msen.com:

...
I cannot ping box3.squeak.org, which is our build.squeak.org and squeaksource.com server. The box4.squeak.org box is responding, but box3 times out on a ping.

Can someone please check it and restart if needed? If rebooted, the squeaksource.com service will require a manual restart as per ~ssdotcom/README.

I have limited connectivity today but will try to keep in touch.

A root cause analyis will be in order, and since squeaksource.com is the most recent change on box3.squeak.org, that will be an obvious suspect.

Aparently, box3/Squeaksource is up and running. I cannot see anythin suspicous on the box (wich is clearly reachable via ssh?)

Best -Tobias

Thanks for the quick reply!

It is on line again for me now as well, and I can see that the squeaksource image has been running without interruption for a couple of days, so the box3 server did not go down.

The server was definitely not responding to my pings for a period of at least several minutes (possibly much longer), and the box4 server was responding at that time, so a network problem seems unlikely.

Still concerned but feeling much better now,

So, uptime said: root@box3-squeak:/home/ssdotcom# uptime 16:32:49 up 166 days, 22 min, 1 user, load average: 0.70, 4.74, 8.52

And the _last two_ numbers are concerning. Basically, the server was overloaded. the Squeak vm uses about a gig of virtual memory (really?) and seems to compete with the jenkins running on the server. htop says, jenkins uses 25% of the systems memory while Squeak uses 19% (both of which I deem high). So in the event of some jenkins jobs firing off and Squeaksource answering some requests, the server might become un-responsive…

Best -Tobias

David T. Lewis

5:18 p.m.

On Sun, Oct 06, 2013 at 04:52:21PM +0200, Tobias Pape wrote:

...

Am 06.10.2013 um 16:39 schrieb "David T. Lewis" lewis@mail.msen.com:

...
On Sun, Oct 06, 2013 at 04:31:05PM +0200, Tobias Pape wrote:

...
Am 06.10.2013 um 16:15 schrieb "David T. Lewis" lewis@mail.msen.com:

...
I cannot ping box3.squeak.org, which is our build.squeak.org and squeaksource.com server. The box4.squeak.org box is responding, but box3 times out on a ping.

Can someone please check it and restart if needed? If rebooted, the squeaksource.com service will require a manual restart as per ~ssdotcom/README.

I have limited connectivity today but will try to keep in touch.

A root cause analyis will be in order, and since squeaksource.com is the most recent change on box3.squeak.org, that will be an obvious suspect.

Aparently, box3/Squeaksource is up and running. I cannot see anythin suspicous on the box (wich is clearly reachable via ssh?)

Best -Tobias

Thanks for the quick reply!

It is on line again for me now as well, and I can see that the squeaksource image has been running without interruption for a couple of days, so the box3 server did not go down.

The server was definitely not responding to my pings for a period of at least several minutes (possibly much longer), and the box4 server was responding at that time, so a network problem seems unlikely.

Still concerned but feeling much better now,

So, uptime said: root@box3-squeak:/home/ssdotcom# uptime 16:32:49 up 166 days, 22 min, 1 user, load average: 0.70, 4.74, 8.52

And the _last two_ numbers are concerning. Basically, the server was overloaded. the Squeak vm uses about a gig of virtual memory (really?) and seems to compete with the jenkins running on the server. htop says, jenkins uses 25% of the systems memory while Squeak uses 19% (both of which I deem high). So in the event of some jenkins jobs firing off and Squeaksource answering some requests, the server might become un-responsive?

Something like that I think. I'm not sure what was generating the load, although there is no question that adding squeaksource to box3 adds a significant resource demand above that of the Jenkins jobs.

Allocating a big address space (1G) is normal for the VM, and in this case the image is actually using a bit under 200MB, which is 20% of the system memory. If there is some combination of squeaksource and jenkins activity that pushes the total demand to the point of requiring swapping, then it's possible that this would make the system unresponsive as I was seeing.

A number of the Jenkins jobs run squeak VMs in addition to the Java stuff, so some combination of these might add up to a problem.

Dave

David T. Lewis

8 p.m.

On Sun, Oct 06, 2013 at 11:18:25AM -0400, David T. Lewis wrote:

...

On Sun, Oct 06, 2013 at 04:52:21PM +0200, Tobias Pape wrote:

...
So, uptime said: root@box3-squeak:/home/ssdotcom# uptime 16:32:49 up 166 days, 22 min, 1 user, load average: 0.70, 4.74, 8.52

And the _last two_ numbers are concerning. Basically, the server was overloaded. the Squeak vm uses about a gig of virtual memory (really?) and seems to compete with the jenkins running on the server. htop says, jenkins uses 25% of the systems memory while Squeak uses 19% (both of which I deem high). So in the event of some jenkins jobs firing off and Squeaksource answering some requests, the server might become un-responsive?

Something like that I think. I'm not sure what was generating the load, although there is no question that adding squeaksource to box3 adds a significant resource demand above that of the Jenkins jobs.

Allocating a big address space (1G) is normal for the VM, and in this case the image is actually using a bit under 200MB, which is 20% of the system memory. If there is some combination of squeaksource and jenkins activity that pushes the total demand to the point of requiring swapping, then it's possible that this would make the system unresponsive as I was seeing.

A number of the Jenkins jobs run squeak VMs in addition to the Java stuff, so some combination of these might add up to a problem.

I am now running top every 30 seconds for the next 24 hours, with output directed to ~ssdotcom/tmp/top.out. Possibly this will show us something interesting.

Dave

Frank Shearar

8 Oct 8 Oct

5:33 p.m.

On 6 October 2013 19:00, David T. Lewis lewis@mail.msen.com wrote:

...

On Sun, Oct 06, 2013 at 11:18:25AM -0400, David T. Lewis wrote:

...
On Sun, Oct 06, 2013 at 04:52:21PM +0200, Tobias Pape wrote:

...
So, uptime said: root@box3-squeak:/home/ssdotcom# uptime 16:32:49 up 166 days, 22 min, 1 user, load average: 0.70, 4.74, 8.52

And the _last two_ numbers are concerning. Basically, the server was overloaded. the Squeak vm uses about a gig of virtual memory (really?) and seems to compete with the jenkins running on the server. htop says, jenkins uses 25% of the systems memory while Squeak uses 19% (both of which I deem high). So in the event of some jenkins jobs firing off and Squeaksource answering some requests, the server might become un-responsive?

Something like that I think. I'm not sure what was generating the load, although there is no question that adding squeaksource to box3 adds a significant resource demand above that of the Jenkins jobs.

Allocating a big address space (1G) is normal for the VM, and in this case the image is actually using a bit under 200MB, which is 20% of the system memory. If there is some combination of squeaksource and jenkins activity that pushes the total demand to the point of requiring swapping, then it's possible that this would make the system unresponsive as I was seeing.

A number of the Jenkins jobs run squeak VMs in addition to the Java stuff, so some combination of these might add up to a problem.

I am now running top every 30 seconds for the next 24 hours, with output directed to ~ssdotcom/tmp/top.out. Possibly this will show us something interesting.

If that's still running, it's probably saying "ow! ow! stop it!" right now. If Tony Garnock-Jones & I could figure out why jobs are failing on his slaves, I'd suggest moving builds off the box entirely. I'll probably turn my old laptop into a build slave... once I can get it up & running again. That too will help with keeping work off the box.

frank

...

Dave

David T. Lewis

6:48 p.m.

On Tue, Oct 08, 2013 at 04:33:28PM +0100, Frank Shearar wrote:

...

On 6 October 2013 19:00, David T. Lewis lewis@mail.msen.com wrote:

...
On Sun, Oct 06, 2013 at 11:18:25AM -0400, David T. Lewis wrote:

...
On Sun, Oct 06, 2013 at 04:52:21PM +0200, Tobias Pape wrote:

...
So, uptime said: root@box3-squeak:/home/ssdotcom# uptime 16:32:49 up 166 days, 22 min, 1 user, load average: 0.70, 4.74, 8.52

And the _last two_ numbers are concerning. Basically, the server was overloaded. the Squeak vm uses about a gig of virtual memory (really?) and seems to compete with the jenkins running on the server. htop says, jenkins uses 25% of the systems memory while Squeak uses 19% (both of which I deem high). So in the event of some jenkins jobs firing off and Squeaksource answering some requests, the server might become un-responsive?

Something like that I think. I'm not sure what was generating the load, although there is no question that adding squeaksource to box3 adds a significant resource demand above that of the Jenkins jobs.

Allocating a big address space (1G) is normal for the VM, and in this case the image is actually using a bit under 200MB, which is 20% of the system memory. If there is some combination of squeaksource and jenkins activity that pushes the total demand to the point of requiring swapping, then it's possible that this would make the system unresponsive as I was seeing.

A number of the Jenkins jobs run squeak VMs in addition to the Java stuff, so some combination of these might add up to a problem.

I am now running top every 30 seconds for the next 24 hours, with output directed to ~ssdotcom/tmp/top.out. Possibly this will show us something interesting.

If that's still running, it's probably saying "ow! ow! stop it!" right now. If Tony Garnock-Jones & I could figure out why jobs are failing on his slaves, I'd suggest moving builds off the box entirely. I'll probably turn my old laptop into a build slave... once I can get it up & running again. That too will help with keeping work off the box.

No worries, I only ran it for a 24 hour period. I saw occasional load increases, but nothing like the "load average: 0.70, 4.74, 8.52" that Tobias spotted right after the outage. I think we just need to keep our eyes open for problems in case it comes back ... usually if a thing can fail once, it will fail again eventually ;-)

Dave

Frank Shearar

7:22 p.m.

On 8 October 2013 17:48, David T. Lewis lewis@mail.msen.com wrote:

...

On Tue, Oct 08, 2013 at 04:33:28PM +0100, Frank Shearar wrote:

...
On 6 October 2013 19:00, David T. Lewis lewis@mail.msen.com wrote:

...
On Sun, Oct 06, 2013 at 11:18:25AM -0400, David T. Lewis wrote:

...
On Sun, Oct 06, 2013 at 04:52:21PM +0200, Tobias Pape wrote:

...
So, uptime said: root@box3-squeak:/home/ssdotcom# uptime 16:32:49 up 166 days, 22 min, 1 user, load average: 0.70, 4.74, 8.52

And the _last two_ numbers are concerning. Basically, the server was overloaded. the Squeak vm uses about a gig of virtual memory (really?) and seems to compete with the jenkins running on the server. htop says, jenkins uses 25% of the systems memory while Squeak uses 19% (both of which I deem high). So in the event of some jenkins jobs firing off and Squeaksource answering some requests, the server might become un-responsive?

Something like that I think. I'm not sure what was generating the load, although there is no question that adding squeaksource to box3 adds a significant resource demand above that of the Jenkins jobs.

Allocating a big address space (1G) is normal for the VM, and in this case the image is actually using a bit under 200MB, which is 20% of the system memory. If there is some combination of squeaksource and jenkins activity that pushes the total demand to the point of requiring swapping, then it's possible that this would make the system unresponsive as I was seeing.

A number of the Jenkins jobs run squeak VMs in addition to the Java stuff, so some combination of these might add up to a problem.

I am now running top every 30 seconds for the next 24 hours, with output directed to ~ssdotcom/tmp/top.out. Possibly this will show us something interesting.

If that's still running, it's probably saying "ow! ow! stop it!" right now. If Tony Garnock-Jones & I could figure out why jobs are failing on his slaves, I'd suggest moving builds off the box entirely. I'll probably turn my old laptop into a build slave... once I can get it up & running again. That too will help with keeping work off the box.

No worries, I only ran it for a 24 hour period. I saw occasional load increases, but nothing like the "load average: 0.70, 4.74, 8.52" that Tobias spotted right after the outage. I think we just need to keep our eyes open for problems in case it comes back ... usually if a thing can fail once, it will fail again eventually ;-)

Hm, OK, so build.squeak.org could be unavailable for a different reason!

frank

...

Dave

David T. Lewis

7:30 p.m.

On Tue, Oct 08, 2013 at 06:22:06PM +0100, Frank Shearar wrote:

...

On 8 October 2013 17:48, David T. Lewis lewis@mail.msen.com wrote:

...
On Tue, Oct 08, 2013 at 04:33:28PM +0100, Frank Shearar wrote:

...
On 6 October 2013 19:00, David T. Lewis lewis@mail.msen.com wrote:

...
On Sun, Oct 06, 2013 at 11:18:25AM -0400, David T. Lewis wrote:

...
On Sun, Oct 06, 2013 at 04:52:21PM +0200, Tobias Pape wrote:

...
So, uptime said: root@box3-squeak:/home/ssdotcom# uptime 16:32:49 up 166 days, 22 min, 1 user, load average: 0.70, 4.74, 8.52

And the _last two_ numbers are concerning. Basically, the server was overloaded. the Squeak vm uses about a gig of virtual memory (really?) and seems to compete with the jenkins running on the server. htop says, jenkins uses 25% of the systems memory while Squeak uses 19% (both of which I deem high). So in the event of some jenkins jobs firing off and Squeaksource answering some requests, the server might become un-responsive?

Something like that I think. I'm not sure what was generating the load, although there is no question that adding squeaksource to box3 adds a significant resource demand above that of the Jenkins jobs.

Allocating a big address space (1G) is normal for the VM, and in this case the image is actually using a bit under 200MB, which is 20% of the system memory. If there is some combination of squeaksource and jenkins activity that pushes the total demand to the point of requiring swapping, then it's possible that this would make the system unresponsive as I was seeing.

A number of the Jenkins jobs run squeak VMs in addition to the Java stuff, so some combination of these might add up to a problem.

I am now running top every 30 seconds for the next 24 hours, with output directed to ~ssdotcom/tmp/top.out. Possibly this will show us something interesting.

If that's still running, it's probably saying "ow! ow! stop it!" right now. If Tony Garnock-Jones & I could figure out why jobs are failing on his slaves, I'd suggest moving builds off the box entirely. I'll probably turn my old laptop into a build slave... once I can get it up & running again. That too will help with keeping work off the box.

No worries, I only ran it for a 24 hour period. I saw occasional load increases, but nothing like the "load average: 0.70, 4.74, 8.52" that Tobias spotted right after the outage. I think we just need to keep our eyes open for problems in case it comes back ... usually if a thing can fail once, it will fail again eventually ;-)

Hm, OK, so build.squeak.org could be unavailable for a different reason!

The entire box3.squeak.org box was unresponsive for a period of time, at least several minutes but possibly longer. By "unresponsive" I mean that it did not even respond to a ping. We do *not* know the root cause, other than the observation that the system had apparently experienced heavy load in that time frame.

Dave

Ken Causey

7:37 p.m.

oom_killer

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND

12929 ssdotcom 20 0 1028m 957m 784 R 99.9 95.0 12:05.45 /usr/local/lib/squeak/4.10.5-2619/squeakvm -vm-display-null squeaksource.2.image

Note the 6th field.

Ken

On 10/08/2013 12:22 PM, Frank Shearar wrote:

...

On 8 October 2013 17:48, David T. Lewislewis@mail.msen.com wrote:

...
On Tue, Oct 08, 2013 at 04:33:28PM +0100, Frank Shearar wrote:

...
On 6 October 2013 19:00, David T. Lewislewis@mail.msen.com wrote:

...
On Sun, Oct 06, 2013 at 11:18:25AM -0400, David T. Lewis wrote:

...
On Sun, Oct 06, 2013 at 04:52:21PM +0200, Tobias Pape wrote:

...
So, uptime said: root@box3-squeak:/home/ssdotcom# uptime 16:32:49 up 166 days, 22 min, 1 user, load average: 0.70, 4.74, 8.52

And the _last two_ numbers are concerning. Basically, the server was overloaded. the Squeak vm uses about a gig of virtual memory (really?) and seems to compete with the jenkins running on the server. htop says, jenkins uses 25% of the systems memory while Squeak uses 19% (both of which I deem high). So in the event of some jenkins jobs firing off and Squeaksource answering some requests, the server might become un-responsive?

Something like that I think. I'm not sure what was generating the load, although there is no question that adding squeaksource to box3 adds a significant resource demand above that of the Jenkins jobs.

Allocating a big address space (1G) is normal for the VM, and in this case the image is actually using a bit under 200MB, which is 20% of the system memory. If there is some combination of squeaksource and jenkins activity that pushes the total demand to the point of requiring swapping, then it's possible that this would make the system unresponsive as I was seeing.

A number of the Jenkins jobs run squeak VMs in addition to the Java stuff, so some combination of these might add up to a problem.

I am now running top every 30 seconds for the next 24 hours, with output directed to ~ssdotcom/tmp/top.out. Possibly this will show us something interesting.

If that's still running, it's probably saying "ow! ow! stop it!" right now. If Tony Garnock-Jones& I could figure out why jobs are failing on his slaves, I'd suggest moving builds off the box entirely. I'll probably turn my old laptop into a build slave... once I can get it up & running again. That too will help with keeping work off the box.

No worries, I only ran it for a 24 hour period. I saw occasional load increases, but nothing like the "load average: 0.70, 4.74, 8.52" that Tobias spotted right after the outage. I think we just need to keep our eyes open for problems in case it comes back ... usually if a thing can fail once, it will fail again eventually ;-)

Hm, OK, so build.squeak.org could be unavailable for a different reason!

frank

...
Dave

David T. Lewis

8:08 p.m.

Aha. That would do it for sure. So something is going on in the squeaksource image that is using a *lot* of object memory for some period of time. The use of 957m resident memory would very likely be enough to cause the symptoms that we saw.

Do you know if we see any similar pattern of memory usage on the source.squeak.org server? I'm already convinced that the squeaksource.com image badly needs to be updated to the same level of Squeak/Seaside/SqueakSource as our source.squeak.org server (due to socket leak problems if nothing else).

I also recall that squeaksource.com on the SCG server had horrible performance problems whenever we tried to commit a large MCZ to the VMMaker repository, and I always assumed that it was memory related in some way on another.

Thanks a lot Ken,

Dave

On Tue, Oct 08, 2013 at 12:37:17PM -0500, Ken Causey wrote:

...

oom_killer

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND

12929 ssdotcom 20 0 1028m 957m 784 R 99.9 95.0 12:05.45 /usr/local/lib/squeak/4.10.5-2619/squeakvm -vm-display-null squeaksource.2.image

Note the 6th field.

Ken

On 10/08/2013 12:22 PM, Frank Shearar wrote:

...
On 8 October 2013 17:48, David T. Lewislewis@mail.msen.com wrote:

...
On Tue, Oct 08, 2013 at 04:33:28PM +0100, Frank Shearar wrote:

...
On 6 October 2013 19:00, David T. Lewislewis@mail.msen.com wrote:

...
On Sun, Oct 06, 2013 at 11:18:25AM -0400, David T. Lewis wrote:

...
On Sun, Oct 06, 2013 at 04:52:21PM +0200, Tobias Pape wrote: > >So, uptime said: >root@box3-squeak:/home/ssdotcom# uptime > 16:32:49 up 166 days, 22 min, 1 user, load average: 0.70, 4.74, > 8.52 > >And the _last two_ numbers are concerning. Basically, the server was >overloaded. >the Squeak vm uses about a gig of virtual memory (really?) and seems >to compete with >the jenkins running on the server. htop says, jenkins uses 25% of the >systems memory >while Squeak uses 19% (both of which I deem high). > So in the event of some jenkins jobs firing off and Squeaksource > answering some requests, >the server might become un-responsive? >

Something like that I think. I'm not sure what was generating the load, although there is no question that adding squeaksource to box3 adds a significant resource demand above that of the Jenkins jobs.

Allocating a big address space (1G) is normal for the VM, and in this case the image is actually using a bit under 200MB, which is 20% of the system memory. If there is some combination of squeaksource and jenkins activity that pushes the total demand to the point of requiring swapping, then it's possible that this would make the system unresponsive as I was seeing.

A number of the Jenkins jobs run squeak VMs in addition to the Java stuff, so some combination of these might add up to a problem.

I am now running top every 30 seconds for the next 24 hours, with output directed to ~ssdotcom/tmp/top.out. Possibly this will show us something interesting.

If that's still running, it's probably saying "ow! ow! stop it!" right now. If Tony Garnock-Jones& I could figure out why jobs are failing on his slaves, I'd suggest moving builds off the box entirely. I'll probably turn my old laptop into a build slave... once I can get it up & running again. That too will help with keeping work off the box.

No worries, I only ran it for a 24 hour period. I saw occasional load increases, but nothing like the "load average: 0.70, 4.74, 8.52" that Tobias spotted right after the outage. I think we just need to keep our eyes open for problems in case it comes back ... usually if a thing can fail once, it will fail again eventually ;-)

Hm, OK, so build.squeak.org could be unavailable for a different reason!

frank

...
Dave

Ken Causey

9:12 p.m.

The answer is yes but of course source.squeak.org is so so much smaller in terms of the data it has to handle and the traffic level that its swings in memory usage are rarely a significant problem.

I should have said something more when you started discussing this, and I apologize for my level of silence. I'm concerned that given the amount of trouble the original owners had with SqueakSource.com, I don't see how we can expect to do better, particularly with a virtual server with only 1GB of RAM allocated to it. Setting it to read-only, and I'm not sure but perhaps that is how it is set now, is of course going to reduce the load. But how much? It's not an easy question to answer.

Ken

On 10/08/2013 01:08 PM, David T. Lewis wrote:

...

Aha. That would do it for sure. So something is going on in the squeaksource image that is using a *lot* of object memory for some period of time. The use of 957m resident memory would very likely be enough to cause the symptoms that we saw.

Do you know if we see any similar pattern of memory usage on the source.squeak.org server? I'm already convinced that the squeaksource.com image badly needs to be updated to the same level of Squeak/Seaside/SqueakSource as our source.squeak.org server (due to socket leak problems if nothing else).

I also recall that squeaksource.com on the SCG server had horrible performance problems whenever we tried to commit a large MCZ to the VMMaker repository, and I always assumed that it was memory related in some way on another.

Thanks a lot Ken,

Dave

On Tue, Oct 08, 2013 at 12:37:17PM -0500, Ken Causey wrote:

...
oom_killer

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND

12929 ssdotcom 20 0 1028m 957m 784 R 99.9 95.0 12:05.45 /usr/local/lib/squeak/4.10.5-2619/squeakvm -vm-display-null squeaksource.2.image

Note the 6th field.

Ken

On 10/08/2013 12:22 PM, Frank Shearar wrote:

...
On 8 October 2013 17:48, David T. Lewislewis@mail.msen.com wrote:

...
On Tue, Oct 08, 2013 at 04:33:28PM +0100, Frank Shearar wrote:

...
On 6 October 2013 19:00, David T. Lewislewis@mail.msen.com wrote:

...
On Sun, Oct 06, 2013 at 11:18:25AM -0400, David T. Lewis wrote: > On Sun, Oct 06, 2013 at 04:52:21PM +0200, Tobias Pape wrote: >> >> So, uptime said: >> root@box3-squeak:/home/ssdotcom# uptime >> 16:32:49 up 166 days, 22 min, 1 user, load average: 0.70, 4.74, >> 8.52 >> >> And the _last two_ numbers are concerning. Basically, the server was >> overloaded. >> the Squeak vm uses about a gig of virtual memory (really?) and seems >> to compete with >> the jenkins running on the server. htop says, jenkins uses 25% of the >> systems memory >> while Squeak uses 19% (both of which I deem high). >> So in the event of some jenkins jobs firing off and Squeaksource >> answering some requests, >> the server might become un-responsive? >> > > Something like that I think. I'm not sure what was generating the > load, although > there is no question that adding squeaksource to box3 adds a > significant resource > demand above that of the Jenkins jobs. > > Allocating a big address space (1G) is normal for the VM, and in this > case the > image is actually using a bit under 200MB, which is 20% of the system > memory. > If there is some combination of squeaksource and jenkins activity that > pushes > the total demand to the point of requiring swapping, then it's > possible that > this would make the system unresponsive as I was seeing. > > A number of the Jenkins jobs run squeak VMs in addition to the Java > stuff, > so some combination of these might add up to a problem. >

I am now running top every 30 seconds for the next 24 hours, with output directed to ~ssdotcom/tmp/top.out. Possibly this will show us something interesting.

If that's still running, it's probably saying "ow! ow! stop it!" right now. If Tony Garnock-Jones& I could figure out why jobs are failing on his slaves, I'd suggest moving builds off the box entirely. I'll probably turn my old laptop into a build slave... once I can get it up & running again. That too will help with keeping work off the box.

No worries, I only ran it for a 24 hour period. I saw occasional load increases, but nothing like the "load average: 0.70, 4.74, 8.52" that Tobias spotted right after the outage. I think we just need to keep our eyes open for problems in case it comes back ... usually if a thing can fail once, it will fail again eventually ;-)

Hm, OK, so build.squeak.org could be unavailable for a different reason!

frank

...
Dave

David T. Lewis

9 Oct 9 Oct

1:55 a.m.

On Tue, Oct 08, 2013 at 12:12:12PM -0500, Ken Causey wrote:

...

On Tue, Oct 08, 2013 at 02:08:01PM -0400, David T. Lewis wrote:

...
Aha. That would do it for sure. So something is going on in the squeaksource image that is using a *lot* of object memory for some period of time. The use of 957m resident memory would very likely be enough to cause the symptoms that we saw.

Do you know if we see any similar pattern of memory usage on the source.squeak.org server? I'm already convinced that the squeaksource.com image badly needs to be updated to the same level of Squeak/Seaside/SqueakSource as our source.squeak.org server (due to socket leak problems if nothing else).

I also recall that squeaksource.com on the SCG server had horrible performance problems whenever we tried to commit a large MCZ to the VMMaker repository, and I always assumed that it was memory related in some way on another.

The answer is yes but of course source.squeak.org is so so much smaller in terms of the data it has to handle and the traffic level that its swings in memory usage are rarely a significant problem.

I should have said something more when you started discussing this, and I apologize for my level of silence. I'm concerned that given the amount of trouble the original owners had with SqueakSource.com, I don't see how we can expect to do better, particularly with a virtual server with only 1GB of RAM allocated to it. Setting it to read-only, and I'm not sure but perhaps that is how it is set now, is of course going to reduce the load. But how much? It's not an easy question to answer.

I am cautiously optimistic. We used to have the VMMaker repository hosted on squeaksource.com, and the performance and reliability were so horrible that we moved it to source.squeak.org. Wonder of wonders, all the problems went away and the VMMaker repository has been trouble-free ever since.

That tells me that upgrading the squeaksource.com image to the same level of code that we are already using for source.squeak.org has a high probability of improving things. I plan to do that as soon as possible, and I'm sure that Tobias and Bert will help as needed.

Also, based on my personal experience as a user of the VMMaker repo on squeaksource.com, I always suspected that the performance and reliability problems were related to memory usage on the host. That's because I could watch VMMaker commits in the progress bar, and they would proceed slowly but normally about 2/3 of the way through the progress bar, then would then grind to a halt and either take a long time to complete, or time out and fail completely. That always looked to me like a server app running out of memory and starting to swap to disk. So I am not surprised that we are seeing a memory related problem with our copy of the squeaksource image now. Again, I suspect that the problem may magically be cured by updating the squeaksource.com image to the same level as source.squeak.org.

Anyhow, that's my story and I'm sticking to it until proven wrong ;-)

Dave

Chris Muller

2:54 a.m.

I just deployed an updated image for "new trunk" at box4.squeak.org:8888/trunk. This is the one that uses the Magma backend. I haven't done any benchmarks but the response time seems pretty snappy, even compared to regular source.squeak.org. Could be due to Cog, the Magma backend, or box4 being under less load than box3.

I need to tell you -- I did experience a 1GB memory situation myself the other day! No idea how it could happen especially since I'm doubtful there was any significant level of activity.

It was strange, but I simply restarted it. Now I'll really keep my eyes open.

On Tue, Oct 8, 2013 at 6:55 PM, David T. Lewis lewis@mail.msen.com wrote:

...

On Tue, Oct 08, 2013 at 12:12:12PM -0500, Ken Causey wrote:

...
On Tue, Oct 08, 2013 at 02:08:01PM -0400, David T. Lewis wrote:

...
Aha. That would do it for sure. So something is going on in the squeaksource image that is using a *lot* of object memory for some period of time. The use of 957m resident memory would very likely be enough to cause the symptoms that we saw.

Do you know if we see any similar pattern of memory usage on the source.squeak.org server? I'm already convinced that the squeaksource.com image badly needs to be updated to the same level of Squeak/Seaside/SqueakSource as our source.squeak.org server (due to socket leak problems if nothing else).

I also recall that squeaksource.com on the SCG server had horrible performance problems whenever we tried to commit a large MCZ to the VMMaker repository, and I always assumed that it was memory related in some way on another.

The answer is yes but of course source.squeak.org is so so much smaller in terms of the data it has to handle and the traffic level that its swings in memory usage are rarely a significant problem.

I should have said something more when you started discussing this, and I apologize for my level of silence. I'm concerned that given the amount of trouble the original owners had with SqueakSource.com, I don't see how we can expect to do better, particularly with a virtual server with only 1GB of RAM allocated to it. Setting it to read-only, and I'm not sure but perhaps that is how it is set now, is of course going to reduce the load. But how much? It's not an easy question to answer.

I am cautiously optimistic. We used to have the VMMaker repository hosted on squeaksource.com, and the performance and reliability were so horrible that we moved it to source.squeak.org. Wonder of wonders, all the problems went away and the VMMaker repository has been trouble-free ever since.

That tells me that upgrading the squeaksource.com image to the same level of code that we are already using for source.squeak.org has a high probability of improving things. I plan to do that as soon as possible, and I'm sure that Tobias and Bert will help as needed.

Also, based on my personal experience as a user of the VMMaker repo on squeaksource.com, I always suspected that the performance and reliability problems were related to memory usage on the host. That's because I could watch VMMaker commits in the progress bar, and they would proceed slowly but normally about 2/3 of the way through the progress bar, then would then grind to a halt and either take a long time to complete, or time out and fail completely. That always looked to me like a server app running out of memory and starting to swap to disk. So I am not surprised that we are seeing a memory related problem with our copy of the squeaksource image now. Again, I suspect that the problem may magically be cured by updating the squeaksource.com image to the same level as source.squeak.org.

Anyhow, that's my story and I'm sticking to it until proven wrong ;-)

Dave

David T. Lewis

3:52 a.m.

On Tue, Oct 08, 2013 at 07:54:05PM -0500, Chris Muller wrote:

...

I just deployed an updated image for "new trunk" at box4.squeak.org:8888/trunk. This is the one that uses the Magma backend. I haven't done any benchmarks but the response time seems pretty snappy, even compared to regular source.squeak.org. Could be due to Cog, the Magma backend, or box4 being under less load than box3.

It might be all of the above. I'm thinking that the priority for the squeaksource.com image is to first get it up to date to the level of source.squeak.org, then at our leisure we can bring both of them together up to a the level that you are demonstarting with Cog and Magma. But the main thing is to get squeaksource.com stable, which is not yet the case.

...

I need to tell you -- I did experience a 1GB memory situation myself the other day! No idea how it could happen especially since I'm doubtful there was any significant level of activity.

It was strange, but I simply restarted it. Now I'll really keep my eyes open.

I'm not sure what causes those large memory usage blips, but at some point we will need to figure it out.

Meanwhile, I took a look at the running squeaksource.com image, which as Ken pointed out was using nearly 1GB of memory, and which oh by the way had crashed about 20 times in the last day or so with out-of-memory errors.

It turns out that the following snippet from the workspace that SCG provided took care of the problem for now:

" kill runaway processes " ProcessBrowser open. Process allInstances do: [ :each | each priority = 30 ifTrue: [ each terminate ] ].

So I'm guessing that some Seaside handler got wedged, presumably concurrent with an excessive memory usage condition, and it probably failed in some way that did not let the garbage collector clean up the mess.

I cleaned it up for the time being on our box3 server, and things seem to be back to "normal" (aka waiting for the next failure).

So my non-scientific, hand-waving summary is:

- Stuff happens that causes SqueakSource to temporarily allocate a lot of object memory, possibly in response to some request coming in through Seaside.

- Whenever the aforementioned stuff fails for some reason, the old squeaksource.com image craps out horribly and keeps references to whatever was going on at the time of the crap out.

- The newer source.squeak.org image probably is doing the same bad things with respect to using large gobs of object memory, but it probably fails in reasonable ways that free up object allocations and clean up socket handle references, etc.

Therefore: We should continue to manually monitor the squeaksource.com image on box3, and clean up whatever messes it causes as best we can. But with high priority, the next step is to update squeaksource.com to use the source.squeak.com image, and verify that this puts us into a stable state of affairs. If and when that is successfully accomplished, we should be able to update both images to take advantage of Magma and Cog, etc.

Dave

Chris Muller

5:23 p.m.

...

On Tue, Oct 08, 2013 at 07:54:05PM -0500, Chris Muller wrote:

...
I just deployed an updated image for "new trunk" at box4.squeak.org:8888/trunk. This is the one that uses the Magma backend. I haven't done any benchmarks but the response time seems pretty snappy, even compared to regular source.squeak.org. Could be due to Cog, the Magma backend, or box4 being under less load than box3.

It might be all of the above. I'm thinking that the priority for the squeaksource.com image is to first get it up to date to the level of source.squeak.org, then at our leisure we can bring both of them together up to a the level that you are demonstarting with Cog and Magma. But the main thing is to get squeaksource.com stable, which is not yet the case.

source.squeak.org is getting slow too, and it runs on an old image. The work I've done to bring it to a trunk image under Cog is based on the source.squeak.org image, and really not that many changes.

...

It turns out that the following snippet from the workspace that SCG provided took care of the problem for now:
    " kill runaway processes "
    ProcessBrowser open.
    Process allInstances do: [ :each |
            each priority = 30 ifTrue: [
                    each terminate ] ].

OMG! No wonder SS has so many problems if it is so unloved that someone would arbitrarily kill processes based on their priority!

...

So I'm guessing that some Seaside handler got wedged, presumably concurrent with an excessive memory usage condition, and it probably failed in some way that did not let the garbage collector clean up the mess.

SS forks the email out to Project subscribers at priority 30. See SSEMailSubscription>>#versionAdded:to:. I wonder whether that's what those were?

David T. Lewis

6:42 p.m.

On Wed, Oct 09, 2013 at 10:23:06AM -0500, Chris Muller wrote:

...

...
On Tue, Oct 08, 2013 at 07:54:05PM -0500, Chris Muller wrote:

...
I just deployed an updated image for "new trunk" at box4.squeak.org:8888/trunk. This is the one that uses the Magma backend. I haven't done any benchmarks but the response time seems pretty snappy, even compared to regular source.squeak.org. Could be due to Cog, the Magma backend, or box4 being under less load than box3.

It might be all of the above. I'm thinking that the priority for the squeaksource.com image is to first get it up to date to the level of source.squeak.org, then at our leisure we can bring both of them together up to a the level that you are demonstarting with Cog and Magma. But the main thing is to get squeaksource.com stable, which is not yet the case.

source.squeak.org is getting slow too, and it runs on an old image. The work I've done to bring it to a trunk image under Cog is based on the source.squeak.org image, and really not that many changes.

...
It turns out that the following snippet from the workspace that SCG provided took care of the problem for now:
    " kill runaway processes "
    ProcessBrowser open.
    Process allInstances do: [ :each |
            each priority = 30 ifTrue: [
                    each terminate ] ].
OMG! No wonder SS has so many problems if it is so unloved that someone would arbitrarily kill processes based on their priority!

That's just something I found in the workspace that came with the image. I figured it must be there for a reason, so I tried it and it "fixed" the problem.

You are right about one thing - SS was not getting much love. Even I, who knew absolutely nothing about SqueakSource before I volunteered to move it, was able to fix some of the worst problems. That's why I am confident that we can get it running reliably once we have given it a little bit of long-overdue attention.

...

...
So I'm guessing that some Seaside handler got wedged, presumably concurrent with an excessive memory usage condition, and it probably failed in some way that did not let the garbage collector clean up the mess.

SS forks the email out to Project subscribers at priority 30. See SSEMailSubscription>>#versionAdded:to:. I wonder whether that's what those were?

Outbound mail notification from squeaksource.com is disabled, so that should not have been a concern.

Note, I still need help from someone with access to box2 to enable mail delivery. I don't want to install an entire mail system on box3 just to handle notifications from squeaksource on box3, so I expect that configuring box2 to accept smtp from box3 would be the right thing to do.

Dave

Chris Muller

9:06 p.m.

On Wed, Oct 9, 2013 at 11:42 AM, David T. Lewis lewis@mail.msen.com wrote:

...

On Wed, Oct 09, 2013 at 10:23:06AM -0500, Chris Muller wrote:

...
...
On Tue, Oct 08, 2013 at 07:54:05PM -0500, Chris Muller wrote:

...
I just deployed an updated image for "new trunk" at box4.squeak.org:8888/trunk. This is the one that uses the Magma backend. I haven't done any benchmarks but the response time seems pretty snappy, even compared to regular source.squeak.org. Could be due to Cog, the Magma backend, or box4 being under less load than box3.

It might be all of the above. I'm thinking that the priority for the squeaksource.com image is to first get it up to date to the level of source.squeak.org, then at our leisure we can bring both of them together up to a the level that you are demonstarting with Cog and Magma. But the main thing is to get squeaksource.com stable, which is not yet the case.

source.squeak.org is getting slow too, and it runs on an old image. The work I've done to bring it to a trunk image under Cog is based on the source.squeak.org image, and really not that many changes.

...
It turns out that the following snippet from the workspace that SCG provided took care of the problem for now:
    " kill runaway processes "
    ProcessBrowser open.
    Process allInstances do: [ :each |
            each priority = 30 ifTrue: [
                    each terminate ] ].
OMG! No wonder SS has so many problems if it is so unloved that someone would arbitrarily kill processes based on their priority!
That's just something I found in the workspace that came with the image. I figured it must be there for a reason, so I tried it and it "fixed" the problem.

You are right about one thing - SS was not getting much love. Even I, who knew absolutely nothing about SqueakSource before I volunteered to move it, was able to fix some of the worst problems. That's why I am confident that we can get it running reliably once we have given it a little bit of long-overdue attention.

...
...
So I'm guessing that some Seaside handler got wedged, presumably concurrent with an excessive memory usage condition, and it probably failed in some way that did not let the garbage collector clean up the mess.

SS forks the email out to Project subscribers at priority 30. See SSEMailSubscription>>#versionAdded:to:. I wonder whether that's what those were?

Outbound mail notification from squeaksource.com is disabled, so that should not have been a concern.

Which switch are you referring to that is "disabled"? Did you check SSProject>>#versionAdded:? For the path of someone saving a version, it _unconditionally_ sends out mail to all subscribers, there is no single switch to disable that..

...

Note, I still need help from someone with access to box2 to enable mail delivery. I don't want to install an entire mail system on box3 just to handle notifications from squeaksource on box3, so I expect that configuring box2 to accept smtp from box3 would be the right thing to do.

Is SMTP support not already part of box3? Seems like a relatively basic service we're gonna want at some point because we want to be focusing on transitioning all responsibilities away from box2 toward box3 and box4 anyway.

David T. Lewis

10:07 p.m.

On Wed, Oct 09, 2013 at 02:06:09PM -0500, Chris Muller wrote:

...

On Wed, Oct 9, 2013 at 11:42 AM, David T. Lewis lewis@mail.msen.com wrote:

...
On Wed, Oct 09, 2013 at 10:23:06AM -0500, Chris Muller wrote:

...
...
On Tue, Oct 08, 2013 at 07:54:05PM -0500, Chris Muller wrote:

...
I just deployed an updated image for "new trunk" at box4.squeak.org:8888/trunk. This is the one that uses the Magma backend. I haven't done any benchmarks but the response time seems pretty snappy, even compared to regular source.squeak.org. Could be due to Cog, the Magma backend, or box4 being under less load than box3.

It might be all of the above. I'm thinking that the priority for the squeaksource.com image is to first get it up to date to the level of source.squeak.org, then at our leisure we can bring both of them together up to a the level that you are demonstarting with Cog and Magma. But the main thing is to get squeaksource.com stable, which is not yet the case.

source.squeak.org is getting slow too, and it runs on an old image. The work I've done to bring it to a trunk image under Cog is based on the source.squeak.org image, and really not that many changes.

...
It turns out that the following snippet from the workspace that SCG provided took care of the problem for now:
    " kill runaway processes "
    ProcessBrowser open.
    Process allInstances do: [ :each |
            each priority = 30 ifTrue: [
                    each terminate ] ].
OMG! No wonder SS has so many problems if it is so unloved that someone would arbitrarily kill processes based on their priority!
That's just something I found in the workspace that came with the image. I figured it must be there for a reason, so I tried it and it "fixed" the problem.

You are right about one thing - SS was not getting much love. Even I, who knew absolutely nothing about SqueakSource before I volunteered to move it, was able to fix some of the worst problems. That's why I am confident that we can get it running reliably once we have given it a little bit of long-overdue attention.

...
...
So I'm guessing that some Seaside handler got wedged, presumably concurrent with an excessive memory usage condition, and it probably failed in some way that did not let the garbage collector clean up the mess.

SS forks the email out to Project subscribers at priority 30. See SSEMailSubscription>>#versionAdded:to:. I wonder whether that's what those were?

Outbound mail notification from squeaksource.com is disabled, so that should not have been a concern.
Which switch are you referring to that is "disabled"? Did you check SSProject>>#versionAdded:? For the path of someone saving a version, it _unconditionally_ sends out mail to all subscribers, there is no single switch to disable that..

I found was some attribute that I set false, which seems to control it globally.

...

...
Note, I still need help from someone with access to box2 to enable mail delivery. I don't want to install an entire mail system on box3 just to handle notifications from squeaksource on box3, so I expect that configuring box2 to accept smtp from box3 would be the right thing to do.

Is SMTP support not already part of box3? Seems like a relatively basic service we're gonna want at some point because we want to be focusing on transitioning all responsibilities away from box2 toward box3 and box4 anyway.

No, mail is not installed on box3. I would be happy either way, all I want is a place to route the mail and currently that seems to be box2. I don't mind installing mail on box3 if that is the right thing to do, but I would need some guidance as to what package to install (exim?). I don't currently have access to box4 or box2, so I don't know our current setup and I don't want to take a guess and install the wrong package.

Dave

Levente Uzonyi

10 Oct 10 Oct

12:14 a.m.

On Wed, 9 Oct 2013, David T. Lewis wrote:

...

On Wed, Oct 09, 2013 at 02:06:09PM -0500, Chris Muller wrote:

...
On Wed, Oct 9, 2013 at 11:42 AM, David T. Lewis lewis@mail.msen.com wrote:

...
On Wed, Oct 09, 2013 at 10:23:06AM -0500, Chris Muller wrote:

...
...
On Tue, Oct 08, 2013 at 07:54:05PM -0500, Chris Muller wrote:

...
I just deployed an updated image for "new trunk" at box4.squeak.org:8888/trunk. This is the one that uses the Magma backend. I haven't done any benchmarks but the response time seems pretty snappy, even compared to regular source.squeak.org. Could be due to Cog, the Magma backend, or box4 being under less load than box3.

It might be all of the above. I'm thinking that the priority for the squeaksource.com image is to first get it up to date to the level of source.squeak.org, then at our leisure we can bring both of them together up to a the level that you are demonstarting with Cog and Magma. But the main thing is to get squeaksource.com stable, which is not yet the case.

source.squeak.org is getting slow too, and it runs on an old image. The work I've done to bring it to a trunk image under Cog is based on the source.squeak.org image, and really not that many changes.

...
It turns out that the following snippet from the workspace that SCG provided took care of the problem for now:
    " kill runaway processes "
    ProcessBrowser open.
    Process allInstances do: [ :each |
            each priority = 30 ifTrue: [
                    each terminate ] ].
OMG! No wonder SS has so many problems if it is so unloved that someone would arbitrarily kill processes based on their priority!
That's just something I found in the workspace that came with the image. I figured it must be there for a reason, so I tried it and it "fixed" the problem.

You are right about one thing - SS was not getting much love. Even I, who knew absolutely nothing about SqueakSource before I volunteered to move it, was able to fix some of the worst problems. That's why I am confident that we can get it running reliably once we have given it a little bit of long-overdue attention.

...
...
So I'm guessing that some Seaside handler got wedged, presumably concurrent with an excessive memory usage condition, and it probably failed in some way that did not let the garbage collector clean up the mess.

SS forks the email out to Project subscribers at priority 30. See SSEMailSubscription>>#versionAdded:to:. I wonder whether that's what those were?

Outbound mail notification from squeaksource.com is disabled, so that should not have been a concern.
Which switch are you referring to that is "disabled"? Did you check SSProject>>#versionAdded:? For the path of someone saving a version, it _unconditionally_ sends out mail to all subscribers, there is no single switch to disable that..
I found was some attribute that I set false, which seems to control it globally.

...
...
Note, I still need help from someone with access to box2 to enable mail delivery. I don't want to install an entire mail system on box3 just to handle notifications from squeaksource on box3, so I expect that configuring box2 to accept smtp from box3 would be the right thing to do.

Is SMTP support not already part of box3? Seems like a relatively basic service we're gonna want at some point because we want to be focusing on transitioning all responsibilities away from box2 toward box3 and box4 anyway.

No, mail is not installed on box3. I would be happy either way, all I want is a place to route the mail and currently that seems to be box2. I don't mind installing mail on box3 if that is the right thing to do, but I would need some guidance as to what package to install (exim?). I don't currently have access to box4 or box2, so I don't know our current setup and I don't want to take a guess and install the wrong package.

AFAIK the services of box2 - including the email services - are planned to be moved to box4, and not to box3.

Levente

P.S.: IMHO properly installing and configuring a mailing system is rather hard. A small mistake can result in sending lots of spam in a very short time.

...

Dave

David T. Lewis

12:29 a.m.

New subject: Mail delivery for box N (was: box3.squeak.org off line)

On Thu, Oct 10, 2013 at 12:14:56AM +0200, Levente Uzonyi wrote:

...

On Wed, 9 Oct 2013, David T. Lewis wrote:

...
On Wed, Oct 09, 2013 at 02:06:09PM -0500, Chris Muller wrote:

...
On Wed, Oct 9, 2013 at 11:42 AM, David T. Lewis lewis@mail.msen.com

Is SMTP support not already part of box3? Seems like a relatively basic service we're gonna want at some point because we want to be focusing on transitioning all responsibilities away from box2 toward box3 and box4 anyway.

No, mail is not installed on box3. I would be happy either way, all I want is a place to route the mail and currently that seems to be box2. I don't mind installing mail on box3 if that is the right thing to do, but I would need some guidance as to what package to install (exim?). I don't currently have access to box4 or box2, so I don't know our current setup and I don't want to take a guess and install the wrong package.

AFAIK the services of box2 - including the email services - are planned to be moved to box4, and not to box3.

Levente

P.S.: IMHO properly installing and configuring a mailing system is rather hard. A small mistake can result in sending lots of spam in a very short time.

I think you are right, and I would prefer not to attempt setting up a mailing system as part of my little "rescue squeaksource.com" project.

The existing mail service is on box2, so I will return to my original request: If someone with knowledge of the mail system, and root access to box2, could please allow mail to be delivered from box3, that would be helpful. Whenever box2 moves to box4 I will update squeaksource.com accordingly. This is not urgent (after all, most people will be glad to be rid of spam from squeaksource), but if we could get it done in the next week or two that would be great.

I would not mind helping to set up the replacement mail service on box4 at some point in the future, but I have neither the time nor the ability to take it on right now.

Dave

Ken Causey

4:03 p.m.

New subject: Mail delivery for box N

I have configured box2 to allow box3 to relay email. So you should be able to use box2.squeak.org as an SMTP host from box3.

Ken

Cees de Groot

1:12 a.m.

On 9 Oct, 2013, at 18:14 , Levente Uzonyi leves@elte.hu wrote:

...

P.S.: IMHO properly installing and configuring a mailing system is rather hard. A small mistake can result in sending lots of spam in a very short time.

Which is exactly why I advice against that these days. Unless you can muster the manpower to deal with the headaches of running an SMTP server in the current world, outsource your mail (plenty of places that will host mail for open source projects ;-))

Chris Muller

14 Oct 14 Oct

11:32 p.m.

Cees!! ' been a long time, nice to see your name again!

On Wed, Oct 9, 2013 at 6:12 PM, Cees de Groot cg@cdegroot.com wrote:

...

On 9 Oct, 2013, at 18:14 , Levente Uzonyi leves@elte.hu wrote:

...
P.S.: IMHO properly installing and configuring a mailing system is rather hard. A small mistake can result in sending lots of spam in a very short time.

Which is exactly why I advice against that these days. Unless you can muster the manpower to deal with the headaches of running an SMTP server in the current world, outsource your mail (plenty of places that will host mail for open source projects ;-))

David T. Lewis

15 Oct 15 Oct

1:51 a.m.

On Mon, Oct 14, 2013 at 04:32:48PM -0500, Chris Muller wrote:

...

Cees!! ' been a long time, nice to see your name again!

:-)

...

On Wed, Oct 9, 2013 at 6:12 PM, Cees de Groot cg@cdegroot.com wrote:

...
On 9 Oct, 2013, at 18:14 , Levente Uzonyi leves@elte.hu wrote:

...
P.S.: IMHO properly installing and configuring a mailing system is rather hard. A small mistake can result in sending lots of spam in a very short time.

Which is exactly why I advice against that these days. Unless you can muster the manpower to deal with the headaches of running an SMTP server in the current world, outsource your mail (plenty of places that will host mail for open source projects ;-))

David T. Lewis

10 Oct 10 Oct

2:54 p.m.

New subject: squeaksource.com mail disabled (was: box3.squeak.org off line)

On Wed, Oct 09, 2013 at 04:07:47PM -0400, David T. Lewis wrote:

...

On Wed, Oct 09, 2013 at 02:06:09PM -0500, Chris Muller wrote:

...
On Wed, Oct 9, 2013 at 11:42 AM, David T. Lewis lewis@mail.msen.com wrote:

...
Outbound mail notification from squeaksource.com is disabled, so that should not have been a concern.

Which switch are you referring to that is "disabled"? Did you check SSProject>>#versionAdded:? For the path of someone saving a version, it _unconditionally_ sends out mail to all subscribers, there is no single switch to disable that..

I found was some attribute that I set false, which seems to control it globally.

For the record - the mail delivery in the squeaksource.com image is controlled by SSEMailSubscription class>>enabled:

I have it set false now. If true or nil, the system will attempt to send mail.

Another global setting of interest is allowRegisterProject. This is an attribute in the SSRepository, and is currently set to false, as has been the case since earlier this year. If we want to permit new project creation, this setting can be changed. But I think that should be a board decision, for consideration only after we are confident that the system is stable and maintainable.

Dave

Chris Muller

5:48 p.m.

New subject: squeaksource.com mail disabled (was: box3.squeak.org off line)

On Thu, Oct 10, 2013 at 7:54 AM, David T. Lewis lewis@mail.msen.com wrote:

...

On Wed, Oct 09, 2013 at 04:07:47PM -0400, David T. Lewis wrote:

...
On Wed, Oct 09, 2013 at 02:06:09PM -0500, Chris Muller wrote:

...
On Wed, Oct 9, 2013 at 11:42 AM, David T. Lewis lewis@mail.msen.com wrote:

...
Outbound mail notification from squeaksource.com is disabled, so that should not have been a concern.

Which switch are you referring to that is "disabled"? Did you check SSProject>>#versionAdded:? For the path of someone saving a version, it _unconditionally_ sends out mail to all subscribers, there is no single switch to disable that..

I found was some attribute that I set false, which seems to control it globally.

For the record - the mail delivery in the squeaksource.com image is controlled by SSEMailSubscription class>>enabled:

Ok, thanks for the detail. The image which runs source.squeak.org does not have that flag at all.

So, it must be some other forked Process that accumulated..

...

I have it set false now. If true or nil, the system will attempt to send mail.

Another global setting of interest is allowRegisterProject. This is an attribute in the SSRepository, and is currently set to false, as has been the case since earlier this year. If we want to permit new project creation, this setting can be changed. But I think that should be a board decision, for consideration only after we are confident that the system is stable and maintainable.

The board has discussed it and the consensus was that not only should registerNewProject be disabled, but the whole thing should be made read-only -- not even new versions of existing projects. This allows the original SS to become be an "archive" and forcing living projects to migrate to SqueakSource3 or SmalltalkHub.

3855

Age (days ago)

3863

Last active (days ago)

box-admins@lists.squeakfoundation.org

27 comments

8 participants

tags (0)

participants (8)

Cees de Groot
Chris Muller
Chris Muller
David T. Lewis
Frank Shearar
Ken Causey
Levente Uzonyi
Tobias Pape