Magma High Availability Shutdown a Node gets a timeout
bart.gauquie at gmail.com
Tue Nov 24 07:53:30 UTC 2009
Thanks for your quick reply.
I've tested the packages and now it works as expected. If I shut down one of
the secondary servers of a Magma node, the method call returns a few seconds
later and the server is down. No timeout noticed.
However, the answer to the other question I asked:
Furthermore: why does Node2 have to beWarmBackupFor: aPrimaryLocation if
it is already a warm backup for that primary location? Is it normal that it
tries to do that again? Furthermore: if there are more than 3 nodes (say, for
instance, 10 or more), each of them again does beWarmBackupFor: the primary.
is still not clear to me. Is there a specific reason that Node2 again
tries to beWarmBackupFor: aPrimaryLocation even if it already is one? I've
noticed that it takes quite a lot of CPU time to establish that connection.
And are the clients connected to Node2 still being served in the
meanwhile? There must be a reason for this that I'm missing.
Trying to break it in another way, I found another (possible) issue.
I can still trigger a timeout during shutdown in the following way: if I
have, for instance, a primary and 3 secondary servers, and I shut down
secondaries 2 and 3 immediately after each other, then an async request is
issued from the primary to all the secondary servers to do
MagmaEnsureCorrectNodeConfiguration. So secondaries 2 and 3 are shutting down
and receiving a warm-up request at the same time. One of the secondaries then
times out on the synchronous 'MaRemoveSecondaryLocationRequest' to the
primary. Probably because secondary 2 is doing beWarmBackupFor:
aPrimaryLocation (issued from secondary 3) and then sending
a 'MaRemoveSecondaryLocationRequest', and there is some lock on that; so in
effect secondary 2 is waiting on itself? This might be related to my lack
of understanding of the above question.
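To make the "waiting on itself" hypothesis concrete, here is a minimal Python sketch of a self-deadlock on a non-reentrant lock. It is a toy model only: the function name, the single lock, and the timeout value are assumptions made for illustration, and nothing in this thread confirms that Magma's request handling is actually locked this way.

```python
import threading


def remove_while_warming(reentrant=False, timeout=0.1):
    """Toy model: one process holds a guard and then waits on it again.

    Hypothetical stand-in for a secondary that is still handling a warm-up
    request when its own shutdown tries to send the remove request.
    """
    guard = threading.RLock() if reentrant else threading.Lock()
    with guard:  # held while the (hypothetical) warm-up request is handled
        if guard.acquire(timeout=timeout):  # the shutdown path needs it too
            guard.release()
            return "removed"
        return "timeout"
```

With the default non-reentrant lock the second acquire can never succeed, so the call gives up after the timeout; with reentrant=True the same process may take the guard again and the removal goes through.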
Thanks again for any help.
On Mon, Nov 23, 2009 at 10:04 PM, Chris Muller <asqueaker at gmail.com> wrote:
> Thanks for the great note Bart. An impressive analysis, it appears
> you have indeed uncovered a bug. I do have a fix, but first, please
> let me clarify the term "Node" as it relates to Magma. A MagmaNode
> represents a collection of servers all supporting _one_ repository.
> Each server maintains its own copy of that one repository. The goal
> of a "Node" is to provide connecting MagmaSessions the illusion of one
> single repository that never goes down. Each member of the Node is
> simply referred to as a "server", either "the primary" or "a secondary".
> Incidentally, multiple Nodes are introduced by applications
> specifically written to connect objects _between_ repositories via
> MagmaForwardingProxy's. It's an advanced feature permitting Magma
> applications to scale along a dimension beyond that provided by
> multi-server MagmaNodes: by creating "bookmarks" to objects in other
> physical repositories, applications let those objects be handled by
> separate CPUs. But that is a separate subject and something I doubt
> you are yet using.
> So, your assessment of the problem is spot-on. However, the correct
> solution is to implement the missing method:
>   wantsResponse
>     ^ false
> The group of servers that make up a MagmaNode communicate with each
> other for administrative tasks via a client/server model just like
> those used between a MagmaSession and a Magma server. In this c/s
> model, the primary is the "server," and the secondaries are the
> "clients". Secondaries may make synchronous requests to the primary
> (i.e., wait for a response), but the primary must only send async
> requests to the secondaries; otherwise a deadlock could potentially
> occur.
> The "Ma client server" framework allows any request to be processed
> asynchronously by answering false to #wantsResponse.
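The dispatch rule described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Magma's API: the class names, `submit`, and `drain` are hypothetical, and only the idea of a request answering false to #wantsResponse comes from the message.

```python
from collections import deque


class Request:
    """Base request: by default the sender waits for an answer."""

    def wants_response(self):
        return True

    def process(self, server):
        raise NotImplementedError


class RemoveSecondary(Request):
    """Synchronous: the shutting-down secondary waits for confirmation."""

    def __init__(self, location):
        self.location = location

    def process(self, server):
        server.secondaries.discard(self.location)
        return sorted(server.secondaries)


class EnsureCorrectConfiguration(Request):
    """Asynchronous: the sender does not wait for the work to finish."""

    def wants_response(self):
        return False

    def process(self, server):
        server.reconfigured = True


class Server:
    """Hypothetical receiver that honours wants_response()."""

    def __init__(self, secondaries=()):
        self.secondaries = set(secondaries)
        self.reconfigured = False
        self.pending = deque()

    def submit(self, request):
        if request.wants_response():
            return request.process(self)  # caller gets the answer
        self.pending.append(request)  # caller returns immediately
        return None

    def drain(self):
        # Process queued async requests later, off the caller's path.
        while self.pending:
            self.pending.popleft().process(self)
```

In this model a secondary submitting `RemoveSecondary` gets the updated list back, while the primary submitting `EnsureCorrectConfiguration` to a secondary gets nothing back and never waits on it.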
> Ok, I have posted new packages to MagmaTester with the above-mentioned
> fix. Please load the (3) updated packages and let me know if you have
> further problems. I think I smell an r44 around the corner..
> - Chris
> On Sun, Nov 22, 2009 at 7:55 AM, Bart Gauquie <bart.gauquie at gmail.com> wrote:
> > Dear all,
> > I'm using Pharo1.0rc1 Latest update: #10493, with Magma r43final.
> > I've been experimenting with Magma High Availability. It's working for me,
> > except that shutting down a node always throws a timeout exception.
> > If I have 1 root server & 1 node, everything works.
> > If I have 1 root server & 2 attached nodes and shut down one of them, a
> > timeout is thrown.
> > I've been looking into it and I have some questions about how things work
> > in Magma.
> > Let me explain the flow I've seen and where it fails.
> > I have a node with the following configuration: 'a MagmaNode
> > magma at craptop:51001, magma at craptop:51003, magma at craptop:51004' ;
> > in which
> > magma at craptop:51001 is the primary,
> > magma at craptop:51003 is Node 2,
> > magma at craptop:51004 is Node 3
> > If I shut down Node 3 by calling shutdown on the server console, a
> > 'MaRemoveSecondaryLocationRequest' is sent to the primary. On the primary, a
> > MagmaNodeUpdate is initialized with Node 3 as its remove field. This is
> > applied to the MagmaNode of the primary, and also committed to each node
> > (MagmaNodeUpdate processUsing: aMagmaServerConsole). I can check this
> > because on the primary, Node 2, and Node 3 a new commitxxx.log appears with a
> > timestamp.
> > Then MagmaServerConsole>>ensureCorrectNodeConfiguration is executed on the
> > primary. Since it is the primary, it also executes
> > 'self sessionsForOtherLocationsDo: [ : each | each
> > ensureCorrectNodeConfiguration ]', which happens only on Node 2 (Node 3
> > was successfully removed from the MagmaNode).
> > If I then debug in Node 2, it again executes
> > MagmaServerConsole>>ensureCorrectNodeConfiguration, but since this is not the
> > primary, it executes
> > beWarmBackupFor: primaryLocation. This sets up an admin session to the
> > primary and registers itself as a warm backup for it. However, this takes a
> > lot of time, and in the meantime Node 3, which was still waiting on a reply to
> > the original 'MaRemoveSecondaryLocationRequest' request, times out.
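The timing problem in this flow can be reproduced with a small Python simulation. It is only a sketch under assumptions: the function names and the durations (a 0.2 s client timeout, a 0.6 s re-registration) are invented for illustration and are not taken from Magma.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError


def handle_remove(wait_for_secondaries, rewarm_seconds, pool):
    """Toy primary: handle the remove request, then notify the others."""

    def rewarm():
        # Stand-in for a secondary re-registering as a warm backup.
        time.sleep(rewarm_seconds)

    job = pool.submit(rewarm)
    if wait_for_secondaries:
        job.result()  # the reply is held back until the secondary is done
    return "removed"


def shutdown_secondary(wait_for_secondaries, timeout=0.2, rewarm_seconds=0.6):
    """Toy shutting-down secondary: wait for the primary's reply."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        reply = pool.submit(handle_remove, wait_for_secondaries,
                            rewarm_seconds, pool)
        try:
            return reply.result(timeout=timeout)
        except TimeoutError:
            return "timeout"
```

When the toy primary waits for the slow re-registration, the requester gives up before the reply arrives; when the primary only fires the notification and replies at once, the same request succeeds.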
> > Furthermore: why does Node 2 have to beWarmBackupFor: aPrimaryLocation if it
> > is already a warm backup for that primary location? Is it normal that it
> > tries to do that again? Furthermore: if there are more than 3 nodes (say, for
> > instance, 10 or more), each of them again does beWarmBackupFor: the primary.
> > The way I fixed it:
> > I added the following:
> > MagmaServerConsole>>isWarmBackupFor: primaryLocation
> >     "Answer whether this server already backs up primaryLocation."
> >     ^primaryLocation = self node primaryLocation
> > which returns whether this server console already is a warm backup for
> > that location.
> > And added the following:
> > MagmaServerConsole>>beWarmBackupFor: primaryLocation
> >     "Guard clause: do nothing if already a warm backup for primaryLocation."
> >     (self isWarmBackupFor: primaryLocation)
> >         ifTrue: [^nil].
> > which is a guard clause that checks whether the node is already a warm backup
> > for the given primary location; if so, it just bails out early and does nothing.
> > With this fix, the shutdown of Node 3 works.
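The effect of such a guard clause can be shown with a self-contained Python toy. The class, its attributes, and the registration counter are hypothetical stand-ins, not Magma's MagmaServerConsole; the sketch only illustrates that a repeated call becomes a no-op and says nothing about whether the primary should wait for its secondaries.

```python
class ServerConsole:
    """Toy stand-in for a server console; not Magma's actual class."""

    def __init__(self, primary_location=None):
        self.primary_location = primary_location
        self.registrations = 0  # counts the expensive registrations performed

    def is_warm_backup_for(self, primary_location):
        return self.primary_location == primary_location

    def be_warm_backup_for(self, primary_location):
        # Guard clause: nothing to do if already backing up this primary.
        if self.is_warm_backup_for(primary_location):
            return None
        self.registrations += 1  # stand-in for opening the admin session
        self.primary_location = primary_location
        return primary_location
```

Calling `be_warm_backup_for` twice with the same location performs the costly registration once; the second call returns early.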
> > Is this a known issue? Is my solution correct? I do not know enough about
> > the internals of Magma to judge it correctly.
> > Thanks in advance for any help.
> > I've attached a change set for both changed methods. I did not write any
> > tests for it :-(, and did not run the other Magma tests.
> > Kind regards,
> > Bart
> > --
> > imagination is more important than knowledge - Albert Einstein
> > Logic will get you from A to B. Imagination will take you everywhere -
> > Albert Einstein
> > Learn from yesterday, live for today, hope for tomorrow. The important
> > thing is not to stop questioning. - Albert Einstein
> > The true sign of intelligence is not knowledge but imagination. - Albert
> > Einstein
> > Gravitation is not responsible for people falling in love. - Albert
> > Einstein
> > _______________________________________________
> > Magma mailing list
> > Magma at lists.squeakfoundation.org
> > http://lists.squeakfoundation.org/mailman/listinfo/magma