Magma High Availability Shutdown a Node gets a timeout
Chris Muller
asqueaker at gmail.com
Wed Nov 25 18:39:33 UTC 2009
Hi,
> The answer to the other question I've asked:
> <snippet>
> Furthermore: why does Node2 have to beWarmupBackupFor: aPrimaryLocation if
> it is already a warmup for that primary location? Is it normal that it tries
> to do that again?
Note that the guard in that method checks whether the node is already a
warm backup and, if so, skips #catchUp:to:, which is where the bulk of
the work happens and which runs only when it is actually needed.
> Furthermore: if there are more than 3 nodes (say, for
> instance, 10 or more), each of them again does beWarmBackupFor the primary.
> </snippet>
> is still not clear to me. Is there a specific reason that this node2 again
> tries to beWarmupBackupFor: aPrimaryLocation even if it is already one? I've
The main reason is uniformity of the implementation. See the
comment in #ensureCorrectNodeConfiguration. HA must handle a variety
of scenarios * a variety of pre-conditions * a variety of timings of
possible events. The idempotent property permits a relatively
uniform recovery process for all situations:
1) assess and verify a client complaint
2) adjust the Node object accordingly
3) call #ensureCorrectNodeConfiguration - the entire Node is righted
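The three steps above can be sketched roughly as follows. This is only an illustration in Python, not Magma's Smalltalk code: the Server and Node classes, their fields, and every function name here are hypothetical, chosen to show why a guarded, idempotent "ensure" step lets one recovery path serve every situation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Server:
    """Hypothetical stand-in for one server taking part in HA."""
    location: str
    warm_backup_for: Optional[str] = None
    catch_ups: int = 0  # counts how often the expensive work actually ran

    def be_warm_backup_for(self, primary: str) -> None:
        # Guard: already a warm backup for this primary, so skip the work.
        if self.warm_backup_for == primary:
            return
        self.catch_ups += 1  # stands in for the expensive catch-up
        self.warm_backup_for = primary


@dataclass
class Node:
    """Hypothetical description of the node: one primary plus secondaries."""
    primary: str
    secondaries: List[str] = field(default_factory=list)


def ensure_correct_configuration(node: Node, servers: Dict[str, Server]) -> None:
    # Idempotent: repeated calls leave the servers in the same end state.
    for location in node.secondaries:
        servers[location].be_warm_backup_for(node.primary)


servers = {name: Server(name) for name in ("s1", "s2", "s3")}
node = Node(primary="s1", secondaries=["s2", "s3"])

ensure_correct_configuration(node, servers)
ensure_correct_configuration(node, servers)  # second call does no extra catch-up
```

Because the guard makes the second call cheap, the caller never needs to know which servers were already configured correctly.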
> noticed that it takes quite a lot of cpu time to establish that connection.
But yes, I do take your point: spending 2-3 CPU seconds per server on
creating the adminSession, connecting, and assessing, only to conclude
that everything is a-ok, seems a bit expensive when all you want to do
is shut down one secondary.
Therefore, I've posted new packages with a special check: if the only
change is the removal of a secondary, step 3 of the recovery process
above is skipped.
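A shortcut of this kind might look like the sketch below. It is written in Python with hypothetical names and data shapes, and it is not the code in the posted packages; it only illustrates skipping the expensive "ensure" step when the primary is unchanged and no secondary was added.

```python
def needs_full_recovery(old_primary, old_secondaries, new_primary, new_secondaries):
    """Return False when the primary is unchanged and no secondary was added."""
    if old_primary != new_primary:
        return True
    # Any secondary that was not present before means something was added.
    added = set(new_secondaries) - set(old_secondaries)
    return bool(added)


def recover(old, new, ensure_configuration):
    """old and new are (primary, secondaries) pairs; a hypothetical shape."""
    if needs_full_recovery(old[0], old[1], new[0], new[1]):
        ensure_configuration()  # the expensive step: right the entire node
        return "full"
    return "skipped"  # only removals: the remaining servers are already correct


calls = []
result = recover(("s1", ["s2", "s3"]), ("s1", ["s2"]), lambda: calls.append("ensure"))
```

Here removing secondary s3 leaves the primary and the remaining secondary untouched, so the callback is never invoked.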
> Trying to break it in another way, I found another (possible) issue.
> I can still trigger a timeout during shutdown in the following way: if I,
> for instance, have a primary and 3 secondary servers, and I shut down
> secondaries 2 and 3 immediately after each other, then an async request is
> issued from the primary to all the secondary servers to do
> MagmaEnsureCorrectNodeConfiguration. So secondaries 2 and 3 are shutting
> down and receiving a warmup request at the same time.
The new versions of Magma client and Magma server, which I've posted
to the "Magma tester" project of squeaksource, should address this
issue as well.
I welcome your attempt to break it with these new packages loaded.
Regards,
Chris