Magma High Availability: Shutting down a Node gets a timeout

Bart Gauquie bart.gauquie at gmail.com
Tue Nov 24 07:53:30 UTC 2009


Hi Chris,

Thanks for your quick reply.

I've tested the packages and now it works as expected. If I shut down one of
the secondary servers of a Magma node, the method call returns a few seconds
later and the server is down. No timeout occurs.

The answer to the other question I asked:
<snippet>
 Furthermore: why does Node2 have to do beWarmBackupFor: aPrimaryLocation if
it is already a warm backup for that primary location? Is it normal that it
tries to do that again? Furthermore: if there are more than 3 nodes (say
10 or more), each of them again does beWarmBackupFor: the primary.
</snippet>
is still not clear to me. Is there a specific reason that Node2 again tries
to do beWarmBackupFor: aPrimaryLocation even though it already is one? I've
noticed that it takes quite a lot of CPU time to establish that connection.
And are the clients connected to Node2 still being served in the meantime?
There must be a reason for this that I'm missing.
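To make the cost question concrete, here is a toy workspace sketch of the early-exit guard from the patch quoted further down in this thread. It is not Magma code: secondaries, currentPrimary, setupCount and configure are hypothetical names, and the block merely stands in for beWarmBackupFor:. Each secondary remembers which primary it backs up, and the expensive setup only runs when that changes, so in this model ten secondaries receiving three configuration broadcasts from the same primary pay for the setup ten times rather than thirty.

```smalltalk
| secondaries currentPrimary setupCount configure |
"Toy model, not Magma code: every name here is hypothetical."
secondaries := (1 to: 10) collect: [:i | 'secondary', i printString].
"Which primary each secondary is currently a warm backup for."
currentPrimary := Dictionary new.
"How many times the expensive warm-backup setup has run."
setupCount := 0.

"Stand-in for beWarmBackupFor: with the early-exit guard: the setup
 only runs when this secondary is not already backing up primaryLocation."
configure := [:secondary :primaryLocation |
    (currentPrimary at: secondary ifAbsent: [nil]) = primaryLocation
        ifFalse: [
            setupCount := setupCount + 1.
            currentPrimary at: secondary put: primaryLocation]].

"Three configuration broadcasts from the same primary reach all ten secondaries."
3 timesRepeat: [
    secondaries do: [:each | configure value: each value: #primary]].

setupCount.  "10 with the guard; an unguarded setup would run 30 times"
```

Chris's fix (#wantsResponse answering false) targets the timeout, not this repeated setup, so whether Magma should skip the redundant work is exactly the open question above.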

Trying to break it another way, I found another (possible) issue. I can
still trigger a timeout during shutdown as follows: say I have a primary and
3 secondary servers, and I shut down secondaries 2 and 3 immediately one
after the other. The primary then issues an async
MagmaEnsureCorrectNodeConfiguration request to all the secondary servers. So
secondaries 2 and 3 are shutting down and receiving a warm-backup request at
the same time. One of the secondaries then times out on its synchronous
'MaRemoveSecondaryLocationRequest' to the primary. Probably this is because
secondary 2 is doing beWarmBackupFor: aPrimaryLocation (triggered by
secondary 3) and then sending a 'MaRemoveSecondaryLocationRequest', and there
is some lock involved; so in effect secondary 2 is waiting on itself? This
might be related to my lack of understanding of the question above.
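One way to reason about the "waiting on itself" question is a wait-for graph, a standard deadlock-detection technique: each blocked server points at the server it is waiting for, and a cycle means none of them can make progress until a timeout fires. The workspace sketch below is a toy model, not Magma code; the server names are symbols chosen for illustration, and it assumes that beWarmBackupFor: blocks on the primary while it sets up its admin session. It encodes the first report in this thread, shows that making the primary's request asynchronous (Chris's #wantsResponse fix quoted below) removes the cycle, and shows that a server waiting on itself is a one-edge cycle.

```smalltalk
| waitsFor cycleFrom |
"Toy wait-for graph, not Magma code. Key: a server blocked on a synchronous
 request. Value: the server it is waiting for."
waitsFor := Dictionary new.
waitsFor at: #node3 put: #primary.   "synchronous MaRemoveSecondaryLocationRequest"
waitsFor at: #primary put: #node2.   "ensureCorrectNodeConfiguration, while it still waited for a reply"
waitsFor at: #node2 put: #primary.   "assumed: beWarmBackupFor: blocks on the primary"

"Follow the edges from start; answer true if they revisit a server (a cycle)."
cycleFrom := [:start | | current seen |
    seen := Set new.
    current := start.
    [current notNil and: [(seen includes: current) not]]
        whileTrue: [
            seen add: current.
            current := waitsFor at: current ifAbsent: [nil]].
    current notNil].

cycleFrom value: #node3.   "true: node3 is stuck behind the primary/node2 cycle"

"With #wantsResponse answering false, the primary no longer blocks on node2."
waitsFor removeKey: #primary.
cycleFrom value: #node3.   "false"

"A server waiting on itself is a cycle with a single edge."
waitsFor at: #node2 put: #node2.
cycleFrom value: #node2.   "true"
```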

Thanks again for any help.

Kind Regards,

Bart

On Mon, Nov 23, 2009 at 10:04 PM, Chris Muller <asqueaker at gmail.com> wrote:

> Thanks for the great note, Bart.  An impressive analysis; it appears
> you have indeed uncovered a bug.  I do have a fix, but first, please
> let me clarify the term "Node" as it relates to Magma.  A MagmaNode
> represents a collection of servers all supporting _one_ repository.
> Each server maintains its own copy of that one repository.  The goal
> of a "Node" is to give connecting MagmaSessions the illusion of one
> single repository that never goes down.  Each member of the Node is
> simply referred to as a "server", either "the primary" or "a
> secondary".
>
> Incidentally, multiple Nodes are introduced by applications
> specifically written to connect objects _between_ repositories via
> MagmaForwardingProxy objects.  It's an advanced feature that lets Magma
> applications scale along a dimension beyond the one provided by
> multi-server MagmaNodes: by creating "bookmarks" to objects in other
> physical repositories, applications can have those repositories handled
> by separate CPUs.  But that is a separate subject, and something I doubt
> you are using yet.
>
> So, your assessment of the problem is spot-on.  However, the correct
> solution is to implement the missing method:
>
>  MagmaEnsureCorrectNodeConfiguration>>#wantsResponse
>        ^ false
>
> The group of servers that make up a MagmaNode communicate with each
> other for administrative tasks via a client/server model just like
> the one used between a MagmaSession and a Magma server.  In this c/s
> model, the primary is the "server," and the secondaries are the
> "clients".  Secondaries may make synchronous requests to the primary
> (i.e., wait for a response), but the primary must only send async
> requests to the secondaries; otherwise a deadlock could potentially
> occur.
>
> The "Ma client server" framework allows any request to be processed
> asynchronously by answering false to #wantsResponse.
>
> =====
>
> Ok, I have posted new packages to MagmaTester with the above-mentioned
> fix.  Please load the (3) updated packages and let me know if you have
> further problems.  I think I smell an r44 around the corner..
>
>  - Chris
>
>
> On Sun, Nov 22, 2009 at 7:55 AM, Bart Gauquie <bart.gauquie at gmail.com> wrote:
> > Dear all,
> >
> > I'm using Pharo1.0rc1 (latest update: #10493) with Magma r43final.
> >
> > I've been experimenting with Magma High Availability. It's working for me,
> > except that shutting down a node always throws a timeout exception.
> > If I have 1 root server and 1 node, everything works.
> > If I have 1 root server and 2 attached nodes and shut down one of them, a
> > timeout is thrown.
> > I've been looking into it, and I have some questions about how things work
> > in Magma.
> > Let me explain the flow I've seen and where it fails.
> > I have a node with the following configuration: 'a MagmaNode
> > magma at craptop:51001, magma at craptop:51003, magma at craptop:51004',
> > in which
> >
> > magma at craptop:51001 is the primary,
> > magma at craptop:51003 is Node 2,
> > magma at craptop:51004 is Node 3.
> >
> > If I shut down Node 3 by calling shutdown on the server console, a
> > 'MaRemoveSecondaryLocationRequest' is sent to the primary. On the primary, a
> > MagmaNodeUpdate is initialized with Node 3 as its remove field. This is
> > applied to the primary's MagmaNode and also committed to each node
> > (MagmaNodeUpdate processUsing: aMagmaServerConsole). I can verify this
> > because a new commitxxx.log with a new timestamp appears on the primary,
> > Node 2 and Node 3.
> >
> > Then MagmaServerConsole>>ensureCorrectNodeConfiguration is executed on the
> > primary. Since it is the primary, it also executes
> > 'self sessionsForOtherLocationsDo: [:each | each
> > ensureCorrectNodeConfiguration]', which reaches only Node 2 (Node 3 was
> > successfully removed from the MagmaNode).
> > If I then debug in Node 2, it again executes
> > MagmaServerConsole>>ensureCorrectNodeConfiguration, but since it is not the
> > primary, it executes beWarmBackupFor: primaryLocation. This sets up an
> > admin session to the primary and registers Node 2 as a warm backup for it.
> > However, this takes a lot of time, and in the meantime Node 3, which is
> > still waiting for a reply to the original 'MaRemoveSecondaryLocationRequest',
> > times out.
> >
> > Furthermore: why does Node2 have to do beWarmBackupFor: aPrimaryLocation
> > if it is already a warm backup for that primary location? Is it normal that
> > it tries to do that again? Furthermore: if there are more than 3 nodes (say
> > 10 or more), each of them again does beWarmBackupFor: the primary.
> >
> > The way I fixed it: I added the following method
> >
> >   MagmaServerConsole>>isWarmBackupFor: primaryLocation
> >       ^ primaryLocation = self node primaryLocation
> >
> > which answers whether this server console is already a warm backup for the
> > given primary location.
> > And I added the following guard clause at the start of the existing method:
> >
> >   MagmaServerConsole>>beWarmBackupFor: primaryLocation
> >       (self isWarmBackupFor: primaryLocation)
> >           ifTrue: [^ nil].
> >       "... rest of the original method unchanged ..."
> >
> > It checks whether the node is already a warm backup for the given primary
> > location and, if so, bails out early and does nothing.
> > With this fix, shutting down Node 3 works.
> >
> > Is this a known issue? Is my solution correct? I do not know enough about
> > the internals of Magma to judge that correctly.
> > Thanks in advance for any help.
> > I've attached a change set with both changed methods. I did not write any
> > tests for it :-(, and I did not run Magma's other tests.
> > Kind regards,
> > Bart
> > --
> > imagination is more important than knowledge - Albert Einstein
> > Logic will get you from A to B. Imagination will take you everywhere -
> > Albert Einstein
> > Learn from yesterday, live for today, hope for tomorrow. The important thing
> > is not to stop questioning. - Albert Einstein
> > The true sign of intelligence is not knowledge but imagination. - Albert
> > Einstein
> > Gravitation is not responsible for people falling in love. - Albert Einstein
> >
> > _______________________________________________
> > Magma mailing list
> > Magma at lists.squeakfoundation.org
> > http://lists.squeakfoundation.org/mailman/listinfo/magma
> >
> >
>




