<div>Hi Chris,<br></div><div><br></div><div>Thanks for your quick reply.</div><div><br></div><div>I've tested the packages and it now works as expected. If I shut down one of the secondary servers of a Magma node, the method call returns a few seconds later and the server is down. I no longer see a timeout.</div>
<div><br></div><div>The answer to the other question I asked:</div><div><snippet></div><div> Furthermore: why does Node2 have to beWarmBackupFor: aPrimaryLocation if it is already a warm backup for that primary location? Is it normal that it tries to do that again? Furthermore: if there are more than 3 nodes (say, for instance, 10 or more), each of them again does beWarmBackupFor: the primary. </div>
<div></snippet></div><div>is still not clear to me. Is there a specific reason that node2 again tries to beWarmBackupFor: aPrimaryLocation even if it already is one? I've noticed that establishing that connection takes quite a lot of CPU time. Are the clients connected to node2 still being served in the meantime? There must be a reason for this that I'm missing.</div>
<div><br></div><div>Trying to break it in another way, I found another (possible) issue. I can still trigger a timeout during shutdown as follows: with a primary and 3 secondary servers, I shut down secondaries 2 and 3 immediately after each other. The primary then issues an async MagmaEnsureCorrectNodeConfiguration request to all the secondary servers, so secondaries 2 and 3 are shutting down and receiving a warm-up request at the same time. One of the secondaries then times out on the synchronous 'MaRemoveSecondaryLocationRequest' to the primary. This is probably because secondary 2 is doing beWarmBackupFor: aPrimaryLocation (issued from secondary 3) and then sends a 'MaRemoveSecondaryLocationRequest', and there is some lock on that; so in effect secondary 2 is waiting on itself? This might be related to my lack of understanding of the question above.</div>
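<div><br></div><div>To make the "waiting on itself" idea concrete, here is a minimal sketch in Python (used only as a neutral language, this is not Magma code). It assumes, purely as a guess on my part, that the link to the primary is guarded by one non-reentrant lock; the names inner_request_times_out, make_lock and link_lock are made up for the illustration.</div>
<pre>
```python
import threading


def inner_request_times_out(make_lock, timeout_seconds=0.1):
    """Return True if a nested request gives up waiting for a lock
    that the same thread already holds."""
    link_lock = make_lock()  # hypothetical lock guarding the link to the primary
    with link_lock:  # outer step, standing in for the warm-up handshake
        # Inner step, standing in for the removal request on the same link.
        acquired = link_lock.acquire(timeout=timeout_seconds)
        if acquired:
            link_lock.release()
    return not acquired
```
</pre>
<div>With threading.Lock the inner step times out and the function returns True; with the reentrant threading.RLock it returns False. Whether Magma really holds such a lock at this point is exactly what I do not know.</div>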
<div><br></div><div>Thanks again for any help.</div><div><br></div><div>Kind Regards,</div><div><br></div><div>Bart</div><br><div class="gmail_quote">On Mon, Nov 23, 2009 at 10:04 PM, Chris Muller <span dir="ltr"><<a href="mailto:asqueaker@gmail.com">asqueaker@gmail.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">Thanks for the great note Bart. An impressive analysis, it appears<br>
you have indeed uncovered a bug. I do have a fix, but first, please<br>
let me clarify the term "Node" as it relates to Magma. A MagmaNode<br>
represents a collection of servers all supporting _one_ repository.<br>
Each server maintains its own copy of that one repository. The goal<br>
of a "Node" is to provide connecting MagmaSessions the illusion of one<br>
single repository that never goes down. Each member of the Node is<br>
simply referred to as a "server", either "the primary" or "a<br>
secondary".<br>
<br>
Incidentally, multiple Nodes are introduced by applications<br>
specifically written to connect objects _between_ repositories via<br>
MagmaForwardingProxy's. It's an advanced feature that lets Magma<br>
applications scale along a dimension beyond that provided by<br>
multi-server MagmaNodes: by creating "bookmarks" to objects in other<br>
physical repositories, applications let those objects be handled by<br>
separate CPUs. But that is a separate subject and something I doubt<br>
you are yet using.<br>
<br>
So, your assessment of the problem is spot-on. However, the correct<br>
solution is to implement the missing method:<br>
<br>
MagmaEnsureCorrectNodeConfiguration>>#wantsResponse<br>
^ false<br>
<br>
The group of servers that make up a MagmaNode communicate with each<br>
other for administrative tasks via a client/server model just like<br>
those used between a MagmaSession and a Magma server. In this c/s<br>
model, the primary is the "server," and the secondaries are the<br>
"clients". Secondaries may make synchronous requests to the primary<br>
(i.e., wait for a response), but the primary must only send async<br>
requests to the secondaries; otherwise a deadlock could potentially<br>
occur.<br>
<br>
The "Ma client server" framework allows any request to be processed<br>
asynchronously by answering false to #wantsResponse.<br>
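<br>
A rough way to picture this rule is the Python model below. It is only a<br>
sketch under stated assumptions, not Magma code: each toy server has a<br>
single worker thread, and the names Server, send, wants_response and<br>
shutdown_secondary are hypothetical. The wants_response flag plays the<br>
role of #wantsResponse.<br>
<pre>
```python
import queue
import threading


class Server:
    """Toy server with one worker thread draining an inbox of requests."""

    def __init__(self, name):
        self.name = name
        self.inbox = queue.Queue()
        self.log = []
        threading.Thread(target=self._serve, daemon=True).start()

    def _serve(self):
        while True:
            handler, reply_box = self.inbox.get()
            result = handler()
            if reply_box is not None:  # only synchronous callers get a reply
                reply_box.put(result)

    def send(self, handler, wants_response, timeout=0.3):
        """Queue handler on this server; block for the result only if asked."""
        if not wants_response:
            self.inbox.put((handler, None))
            return None
        reply_box = queue.Queue(maxsize=1)
        self.inbox.put((handler, reply_box))
        try:
            return reply_box.get(timeout=timeout)
        except queue.Empty:
            return "timeout"


def shutdown_secondary(primary_notifies_synchronously):
    primary, secondary = Server("primary"), Server("secondary")

    def ensure_config():  # runs on the secondary's worker
        secondary.log.append("config checked")
        return "ok"

    def remove_location():  # runs on the primary's worker
        secondary.send(ensure_config,
                       wants_response=primary_notifies_synchronously,
                       timeout=1.0)
        return "removed"

    def shutdown():  # runs on the secondary's worker, waits on the primary
        return primary.send(remove_location, wants_response=True, timeout=0.3)

    return secondary.send(shutdown, wants_response=True, timeout=2.0)
```
</pre>
In this model shutdown_secondary(False) answers "removed", while<br>
shutdown_secondary(True) answers "timeout": the secondary's only worker<br>
is busy waiting on the primary, so it cannot serve the primary's<br>
synchronous request until its own wait has expired.<br>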
<br>
=====<br>
<br>
Ok, I have posted new packages to MagmaTester with the above-mentioned<br>
fix. Please load the (3) updated packages and let me know if you have<br>
further problems. I think I smell an r44 around the corner..<br>
<br>
- Chris<br>
<div><div class="h5"><br>
<br>
On Sun, Nov 22, 2009 at 7:55 AM, Bart Gauquie <<a href="mailto:bart.gauquie@gmail.com">bart.gauquie@gmail.com</a>> wrote:<br>
> Dear all,<br>
><br>
> I'm using Pharo1.0rc1 Latest update: #10493, with Magma r43final.<br>
><br>
> I've been experimenting with Magma high availability. It's working for me,<br>
> except that shutting down a node always throws a timeout exception.<br>
> If I have 1 root server & 1 node, everything works.<br>
> If I have 1 root server & 2 attached nodes and shut down one of them, a<br>
> timeout is thrown.<br>
> I've been looking into it and I have some questions about how things work in<br>
> Magma.<br>
> Let me explain the flow I've seen and where it fails.<br>
> I have a node with following configuration: 'a MagmaNode<br>
> magma@craptop:51001, magma@craptop:51003, magma@craptop:51004' ;<br>
> in which<br>
><br>
> magma@craptop:51001 is the primary,<br>
> magma@craptop:51003 is Node 2,<br>
> magma@craptop:51004 is Node 3<br>
><br>
> If I shut down Node 3 by calling shutdown on the server console, a<br>
> 'MaRemoveSecondaryLocationRequest' is sent to the primary. On the primary, a<br>
> MagmaNodeUpdate is initialized with Node 3 as its remove field. This is<br>
> applied to the MagmaNode of the primary, and also committed to each Node<br>
> (MagmaNodeUpdate processUsing: aMagmaServerConsole). I can check this<br>
> because on the primary, Node 2 and Node 3 a new commitxxx.log appears with a new<br>
> timestamp.<br>
><br>
> Then MagmaServerConsole>>ensureCorrectNodeConfiguration is executed on the<br>
> primary. Since it is the primary, it also executes<br>
> 'self sessionsForOtherLocationsDo: [ : each | each<br>
> ensureCorrectNodeConfiguration ] ', which happens only on Node 2 (Node 3<br>
> was successfully removed from the MagmaNode).<br>
> If I then debug in Node 2, it again executes<br>
> MagmaServerConsole>>ensureCorrectNodeConfiguration, but since this is not a<br>
> primary, it executes<br>
> beWarmBackupFor: primaryLocation. This sets up an admin session to the<br>
> primary and registers itself as a warm backup for it. However, this takes a lot<br>
> of time, and in the meantime Node 3, which was still waiting on a reply to<br>
> the original 'MaRemoveSecondaryLocationRequest' request, times out.<br>
> Furthermore: why does Node2 have to beWarmBackupFor: aPrimaryLocation if it<br>
> is already a warm backup for that primary location? Is it normal that it tries to<br>
> do that again? Furthermore: if there are more than 3 nodes (say, for instance,<br>
> 10 or more), each of them again does beWarmBackupFor: the primary.<br>
> The way I fixed it:<br>
> I added the following:<br>
> MagmaServerConsole>>isWarmBackupFor: primaryLocation<br>
> ^primaryLocation = self node primaryLocation<br>
><br>
> which answers whether this server console is already a warm backup for the<br>
> given primary location.<br>
> And I added the following:<br>
> MagmaServerConsole>>beWarmBackupFor: primaryLocation<br>
> (self isWarmBackupFor: primaryLocation)<br>
> ifTrue: [^nil].<br>
><br>
> which is a guard clause that checks whether the node is already a warm backup for<br>
> the given primary location; if so, it bails out early and does nothing.<br>
> With this fix, the shutdown of Node 3 works.<br>
> Is this a known issue? Is my solution correct? I do not know enough about<br>
> the internals of Magma to judge it correctly.<br>
> Thanks in advance for any help.<br>
> I've attached a change set for both changed methods. I did not write any tests<br>
> for it :-(, and did not run the other Magma tests.<br>
> Kind regards,<br>
> Bart<br>
> --<br>
> imagination is more important than knowledge - Albert Einstein<br>
> Logic will get you from A to B. Imagination will take you everywhere -<br>
> Albert Einstein<br>
> Learn from yesterday, live for today, hope for tomorrow. The important thing<br>
> is not to stop questioning. - Albert Einstein<br>
> The true sign of intelligence is not knowledge but imagination. - Albert<br>
> Einstein<br>
> Gravitation is not responsible for people falling in love. - Albert Einstein<br>
><br>
</div></div>> _______________________________________________<br>
> Magma mailing list<br>
> <a href="mailto:Magma@lists.squeakfoundation.org">Magma@lists.squeakfoundation.org</a><br>
> <a href="http://lists.squeakfoundation.org/mailman/listinfo/magma" target="_blank">http://lists.squeakfoundation.org/mailman/listinfo/magma</a><br>
><br>
><br>
</blockquote></div><br><br clear="all"><br>-- <br>imagination is more important than knowledge - Albert Einstein<br>Logic will get you from A to B. Imagination will take you everywhere - Albert Einstein<br>Learn from yesterday, live for today, hope for tomorrow. The important thing is not to stop questioning. - Albert Einstein<br>
The true sign of intelligence is not knowledge but imagination. - Albert Einstein<br>Gravitation is not responsible for people falling in love. - Albert Einstein<br>