[squeak-dev] trunk process resilience

Chris Muller asqueaker at gmail.com
Mon Nov 11 16:38:42 UTC 2013


Thanks for the great discussion, Dave.

> I like the idea of building some resilience into the SqueakSource servers.
> I also like the idea of using Magma to support this, because I know that
> Magma has been used to address similar issues on much larger scale systems.

We don't need to use Magma at all to accomplish the redundancy I'm proposing.

The work I did to use a Magma backend for SqueakSource on box4 is
solely for reliable persistence and to support the history function in
the IDE.  Nothing more.  Its HA (high-availability) function is not
being used at all; in fact, Magma is being used by the webserver in
"local" (direct-connect, single-user) mode.

> I do have some concerns of a non-technical nature:
>
> 1) From an operational point of view, we need to keep our systems as simple
> as possible. There are very few people supporting the servers, and their
> availability comes and goes over time, so we need to keep things simple
> enough that any box-admins person can always figure out how to get things
> running even if the expert is not available.

Agreed.

> 2) We need to be careful not to add more failure modes than we remove. This
> is a painfully common mistake, in which people add high availability features
> to an existing system with the result that new failure modes are introduced
> that turn out to be worse than the failure modes that they were attempting
> to mitigate.

Agreed.

> As an example, I would point to the recent downtime on SmalltalkHub
> (see the excellent recap provided by Philippe Marschall at
> https://github.com/blog/1346-network-problems-last-friday). The system
> had availability problems for an extended period of time, and the cause
> was a (human error induced) failure in some redundant networking gear.
> The high availability networking introduced additional failure modes, and
> the combination of human error and system complexity reduced the resilience
> of the system as a whole.

You said SmalltalkHub, but the link was about GitHub (an interesting
story, nonetheless).  Is Philippe Marschall working at GitHub?

> This is meant only as a cautionary note. I really *do* like the idea of
> building in some redundancy, and I think that the work you (Chris) have
> done with box4.squeak.org might be a good way to do it.
>
>>
>> So, I guess I'm proposing that we have some elements in the image "aware"
>> of a second trunk.  But before wrangling out exactly what form that
>> awareness would take, what do you think so far?
>>
>
> We should keep any changes in the image to a minimum, but the general idea
> sounds good to me.

I'll submit a proposal to the Inbox which will clarify exactly what
I'm proposing.

Thanks.

