<div>Hi Chris,<br></div><div><br></div><div>Thanks for your quick reply.</div><div><br></div><div>I&#39;ve tested the packages and it now works as expected. If I shut down one of the secondary servers of a Magma node, the method call returns a few seconds later and the server is down. No timeout occurs.</div>

<div><br></div><div>The answer to the other question I&#39;ve asked:</div><div>&lt;snippet&gt;</div><div> Furthermore: why has Node2 have to beWarmupBackupFor: aPrimaryLocation if it is already a warmup for that primary location. Is it normal that he tries to do that again? Furthermore: if there is more than 3 nodes (say for instance 10 or more) each of them is again beWarmBackupFor the primary. </div>

<div>&lt;/snippet&gt;</div><div>is still not clear to me. Is there a specific reason that Node 2 again tries to beWarmBackupFor: aPrimaryLocation even if it already is one? I&#39;ve noticed that it takes quite a lot of CPU time to establish that connection. And are the clients connected to Node 2 still being served in the meantime? There must be a reason for this that I&#39;m missing.</div>

<div><br></div><div>Trying to break it in another way, I found another (possible) issue. I can still trigger a timeout during shutdown in the following way: with a primary and 3 secondary servers, I shut down secondaries 2 and 3 immediately after each other. An async request is then issued from the primary to all the secondary servers to do MagmaEnsureCorrectNodeConfiguration, so secondaries 2 and 3 are shutting down and receiving a warm-backup request at the same time. One of the secondaries then times out on the synchronous &#39;MaRemoveSecondaryLocationRequest&#39; to the primary. This is probably because secondary 2 is doing beWarmBackupFor: aPrimaryLocation (issued from secondary 3) and then sending a &#39;MaRemoveSecondaryLocationRequest&#39;, and there is some lock on that, so in effect secondary 2 is waiting on itself? This might be related to my incomplete understanding of the question above.</div>

<div><br></div><div>Thanks again for any help.</div><div><br></div><div>Kind Regards,</div><div><br></div><div>Bart</div><br><div class="gmail_quote">On Mon, Nov 23, 2009 at 10:04 PM, Chris Muller <span dir="ltr">&lt;<a href="mailto:asqueaker@gmail.com">asqueaker@gmail.com</a>&gt;</span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">Thanks for the great note Bart.  An impressive analysis, it appears<br>

you have indeed uncovered a bug.  I do have a fix, but first, please<br>

let me clarify the term &quot;Node&quot; as it relates to Magma.  A MagmaNode<br>

represents a collection of servers all supporting _one_ repository.<br>

Each server maintains its own copy of that one repository.  The goal<br>

of a &quot;Node&quot; is to provide connecting MagmaSessions the illusion of one<br>

single repository that never goes down.  Each member of the Node is<br>

simply referred to as a &quot;server&quot;, either &quot;the primary&quot; or &quot;a<br>

secondary&quot;.<br>

<br>

Incidentally, multiple Nodes are introduced by applications<br>

specifically written to connect objects _between_ repositories via<br>

MagmaForwardingProxy&#39;s.  It&#39;s an advanced feature permitting Magma<br>

applications to scale along a dimension beyond that provided<br>

by multi-server MagmaNodes: applications create &quot;bookmarks&quot;<br>

to objects in other physical repositories so that they can be handled by<br>

separate CPUs.  But that is a separate subject and something I doubt<br>

you are yet using.<br>

<br>

So, your assessment of the problem is spot-on.  However, the correct<br>

solution is to implement the missing method:<br>

<br>

  MagmaEnsureCorrectNodeConfiguration&gt;&gt;#wantsResponse<br>

        ^ false<br>

<br>

The group of servers that make up a MagmaNode communicate with each<br>

other for administrative tasks via a client/server model just like<br>

those used between a MagmaSession and a Magma server.  In this c/s<br>

model, the primary is the &quot;server,&quot; and the secondaries are the<br>

&quot;clients&quot;.  Secondaries may make synchronous requests to the primary<br>

(i.e., wait for a response), but the primary must only send async<br>

requests to the secondaries, otherwise a deadlock could potentially<br>

occur.<br>

<br>

The &quot;Ma client server&quot; framework allows any request to be processed<br>

asynchronously by answering false to #wantsResponse.<br>
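<br>
As a rough illustration of why this matters, here is a toy Python model (not Magma code).<br>
The names ToyServer, Request, submit and shutdown_secondary are made up for this sketch, as are<br>
the assumptions that each server handles one request at a time and the 0.2 s / 0.5 s timings;<br>
wants_response only mimics the role of #wantsResponse.<br>

```python
import queue
import threading
import time


class Request:
    """A toy request; wants_response mimics the role of #wantsResponse."""

    def __init__(self, name, work_seconds=0.0, wants_response=True):
        self.name = name
        self.work_seconds = work_seconds
        self.wants_response = wants_response
        self.done = threading.Event()


class ToyServer:
    """Handles one request at a time on its own thread (an assumption)."""

    def __init__(self, name):
        self.name = name
        self.inbox = queue.Queue()
        self.on_request = None  # optional hook run while handling a request
        threading.Thread(target=self._serve, daemon=True).start()

    def _serve(self):
        while True:
            request = self.inbox.get()
            if self.on_request is not None:
                self.on_request(request)
            time.sleep(request.work_seconds)  # simulated work
            request.done.set()

    def submit(self, request, timeout):
        """Queue the request; block for the reply only if it wants one.

        Returns True if the sender did not have to give up waiting.
        """
        self.inbox.put(request)
        if not request.wants_response:
            return True
        return request.done.wait(timeout)


def shutdown_secondary(notify_wants_response):
    """Simulate one secondary asking the primary to remove it."""
    primary = ToyServer("primary")
    other_secondary = ToyServer("secondary 2")

    def primary_hook(request):
        if request.name == "remove-secondary":
            # The primary asks the remaining secondary to re-check its
            # configuration, which is slow (0.5 s in this sketch).
            notify = Request("ensure-config", work_seconds=0.5,
                             wants_response=notify_wants_response)
            other_secondary.submit(notify, timeout=5.0)

    primary.on_request = primary_hook
    # The secondary shutting down waits at most 0.2 s for the primary's reply.
    return primary.submit(Request("remove-secondary"), timeout=0.2)
```

In this sketch, shutdown_secondary(True) answers False (the secondary that is shutting down<br>
gives up waiting on the primary), while shutdown_secondary(False) answers True.<br>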

<br>

=====<br>

<br>

Ok, I have posted new packages to MagmaTester with the above-mentioned<br>

fix.  Please load the (3) updated packages and let me know if you have<br>

further problems.  I think I smell an r44 around the corner..<br>

<br>

 - Chris<br>

<div><div class="h5"><br>

<br>

On Sun, Nov 22, 2009 at 7:55 AM, Bart Gauquie &lt;<a href="mailto:bart.gauquie@gmail.com">bart.gauquie@gmail.com</a>&gt; wrote:<br>

&gt; Dear all,<br>

&gt;<br>

&gt; I&#39;m using Pharo1.0rc1 Latest update: #10493, with Magma r43final.<br>

&gt;<br>

&gt; I&#39;ve been experimenting with Magma High availability. It&#39;s working for me<br>

&gt; except that shutting down a node always throws a timeout exception.<br>

&gt; If I have 1 root server &amp; 1 node, everything works.<br>

&gt; If I have 1 root server &amp; 2 attached nodes, and shut down one of them, a<br>

&gt; timeout is thrown.<br>

&gt; I&#39;ve been looking into it and i have some questions about how things work in<br>

&gt; magma.<br>

&gt; Let me explain the flow I&#39;ve seen and where it fails.<br>

&gt; I have a node with following configuration: &#39;a MagmaNode<br>

&gt; magma@craptop:51001, magma@craptop:51003, magma@craptop:51004&#39; ;<br>

&gt; in which<br>

&gt;<br>

&gt; magma@craptop:51001 is the primary,<br>

&gt; magma@craptop:51003 is Node 2,<br>

&gt; magma@craptop:51004 is Node 3<br>

&gt;<br>

&gt; If i shutdown Node 3 by calling shutdown on the serverconsole a<br>

&gt; &#39;MaRemoveSecondaryLocationRequest&#39; is sent to the primary. On the primary a<br>

&gt; MagmaNodeUpdate is initialized with as remove field the Node 3. This is<br>

&gt; applied to the Magma node of the primary, and committed to each Node also<br>

&gt; (MagmaNodeUpdate processUsing: aMagmaServerConsole). I can check this<br>

&gt; because on primary, Node 2 and Node3 a new commitxxx.log appears with a new<br>

&gt; timestamp.<br>

&gt;<br>

&gt; Then MagmaServerConsole&gt;&gt;ensureCorrectNodeConfiguration is executed on the<br>

&gt; primary.  Since it is the primary it also executes:<br>

&gt; &#39;self sessionsForOtherLocationsDo: [ : each | each<br>

&gt; ensureCorrectNodeConfiguration ] &#39;, which happens only on the Node 2 (Node 3<br>

&gt; was successfully removed from the Magma Node).<br>

&gt; If i then debug in the Node 2, it again executes<br>

&gt; MagmaServerConsole&gt;&gt;ensureCorrectNodeConfiguration, but since this is not a<br>

&gt; primary, it executes:<br>

&gt; beWarmBackupFor: primaryLocation . This sets up an admin session to the<br>

&gt; primary and registers itself as a warm backup for it. However this takes a lot<br>

&gt; of time, and in the meantime, Node 3, which was still waiting on a reply for<br>

&gt; the original &#39;MaRemoveSecondaryLocationRequest&#39; request, times out.<br>

&gt; Furthermore: why has Node2 have to beWarmupBackupFor: aPrimaryLocation if it<br>

&gt; is already a warmup for that primary location. Is it normal that he tries to<br>

&gt; do that again? Furthermore: if there is more than 3 nodes (say for instance<br>

&gt; 10 or more) each of them is again beWarmBackupFor the primary.<br>

&gt; The way i fixed it is:<br>

&gt; i added following:<br>

&gt; MagmaServerConsole&gt;&gt;isWarmBackupFor: primaryLocation<br>

&gt; ^primaryLocation = self node primaryLocation<br>

&gt;<br>

&gt; which returns if this serverconsole already is a warmbackup for some primary<br>

&gt; location.<br>

&gt; And added following:<br>

&gt; MagmaServerConsole&gt;&gt;beWarmBackupFor: primaryLocation<br>

&gt;   (self isWarmBackupFor: primaryLocation)<br>

&gt;     ifTrue: [^nil].<br>

&gt;<br>

&gt; which is a guard clause which checks if the node is already a warmbackup for<br>

&gt; the given primarylocation, if so, just bail out early and do nothing.<br>

&gt; With this fix, the shutdown of a Node3 works.<br>

&gt; Is this a known issue? Is my solution correct? I do not know enough about<br>

&gt; the internals of Magma to correctly judge about it.<br>

&gt; Thanks in advance for any help.<br>

&gt; I&#39;ve attached a change set for both changes methods. Did not write any test<br>

&gt; for it :-(, and did not run other tests of magma.<br>

&gt; Kind regards,<br>

&gt; Bart<br>

&gt; --<br>

&gt; imagination is more important than knowledge - Albert Einstein<br>

&gt; Logic will get you from A to B. Imagination will take you everywhere -<br>

&gt; Albert Einstein<br>

&gt; Learn from yesterday, live for today, hope for tomorrow. The important thing<br>

&gt; is not to stop questioning. - Albert Einstein<br>

&gt; The true sign of intelligence is not knowledge but imagination. - Albert<br>

&gt; Einstein<br>

&gt; Gravitation is not responsible for people falling in love. - Albert Einstein<br>

&gt;<br>

</div></div>

</blockquote></div><br>