[squeak-dev] trunk process resilience

Fri Nov 8 02:38:22 UTC 2013

It's nice to read someone thinking about the issues faced by networked
programs.  In fact, the little experiment he does (under the heading
"A simple distributed system") is exactly what one of Magma's HA
test cases performs: multiple clients commit as fast as they can,
counting upward, while the servers in the HA cluster undergo various
role changes due to HA events, such as arbitrarily killing one of the
servers with quitPrimitive.  It's quite a piece [1].

Thankfully, none of that applies to what's being proposed here.  The
operations needed to achieve a mutual backup are idempotent: each is
simply a package copy from remote to local using the existing
MCRepository>>#copyAllFrom:.  So it reuses the existing error handling
too.  What could go wrong?

Under normal usage, the same person would not commit two versions of a
package with different UUIDs but exactly the same name, one to each
repository.  But even if they did, it's no different from when that
happens today between projects, which are themselves simply different
repositories.

I've seen how fragile and unsustainable our source.squeak.org server
is.  I want to inform the community of what I've done and solicit
pragmatic discussion on how we can get more out of it.

Thanks.

[1] -- http://wiki.squeak.org/squeak/6101

On Thu, Nov 7, 2013 at 3:33 PM, Frank Shearar <frank.shearar at gmail.com> wrote:
> On 7 November 2013 21:07, Chris Muller <ma.chris.m at gmail.com> wrote:
>> Lately we've had some problems with the SqueakSource server that supports
>> our vital trunk process.  Ken and I burned several hours on it this week.
>> The experience has caused me to consider an idea for improved continuity of
>> our trunk repository.
>>
>> Very simply, it's a second running copy of trunk (and inbox, et al.).  Each
>> instance keeps itself up to date from the other.  If one goes down, the
>> other can be pointed to for updates AND commits to minimize disruption.
>>
>> Right now, we actually already have two trunks.  I'm pleased to
>> announce that new-trunk, running on box4.squeak.org, is now a *full copy* of
>> old-trunk on box2.  (Before, it was only trunk; now it includes Inbox, Etoys,
>> etc.)  Using newer and better code, a newer VM, and Magma, this copy of
>> trunk was originally brought up simply to provide MC method history directly
>> in the IDE, but now I can see its role being to improve trunk process
>> stability so that community development can be continuous until it
>> eventually becomes the de facto trunk (e.g., running source.squeak.org).
>>
>> There are other side benefits too, like the ability to move or upgrade the
>> trunk without a service interruption.  We can be assured of being ready to
>> move to a different server at a moment's notice, e.g., to break the link
>> with Hetzner.
>>
>> So, I guess I'm proposing that we have some elements in the image "aware" of
>> a second trunk.  But before working out exactly what form that awareness
>> would take, what do you think so far?
>
> I think before any person pitches in with any suggestion, that person
> should go read up on handling state in a distributed system. (Because
> having a second copy in a kind of active-active replication thing is
> exactly a distributed system.) It is _not easy_. (And "Never go to sea
> with two chronometers; take one or three".) Here's a good starting
> point: http://aphyr.com/posts/281-call-me-maybe-carly-rae-jepsen-and-the-perils-of-network-partitions
>
> frank
>