Magma High Availability Shutdown a Node gets a timeout

Bart Gauquie bart.gauquie at gmail.com
Sun Nov 22 15:19:21 UTC 2009


I did some more debugging. In
MagmaServerConsole>>beWarmBackupFor: primaryLocation
the method starts with

[ | primarySession |

[
primarySession := primaryLocation newAdminSession.
primarySession connectAs: '_beWarmBackupFor'.

and it is the connectAs: call that takes 15 to 20 seconds to complete.
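
A minimal workspace sketch of how the duration of that call can be measured, assuming Magma is loaded and primaryLocation is already bound to the primary's location. The variable binding, the Transcript output and the final disconnect are illustrative assumptions, not part of the Magma method quoted above:

```smalltalk
"Sketch only: assumes primaryLocation is already bound to the primary's location."
| primarySession elapsed |
primarySession := primaryLocation newAdminSession.
"Time millisecondsToRun: answers the wall-clock milliseconds spent in the block."
elapsed := Time millisecondsToRun: [
	primarySession connectAs: '_beWarmBackupFor' ].
Transcript show: 'connectAs: took ', elapsed printString, ' ms'; cr.
"Assumed cleanup: close the session opened only for this measurement."
primarySession disconnect.
```

Timing only the connectAs: send keeps the cost of creating the session out of the measurement.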


On Sun, Nov 22, 2009 at 3:15 PM, Bart Gauquie <bart.gauquie at gmail.com> wrote:

> I did some further testing. With the patch I suggested, creating a node
> no longer works at all once you add a second one. I suppose my
> restriction is too strong.
>
>
> On Sun, Nov 22, 2009 at 2:55 PM, Bart Gauquie <bart.gauquie at gmail.com> wrote:
>
>> Dear all,
>>
>> I'm using Pharo1.0rc1 Latest update: #10493, with Magma r43final.
>>
>> I've been experimenting with Magma High Availability. It's working for me,
>> except that shutting down a node always throws a timeout exception.
>> If I have 1 root server & 1 node, everything works.
>> If I have 1 root server & 2 attached nodes and shut down one of them, a
>> timeout is thrown.
>> I've been looking into it and I have some questions about how things work
>> in Magma.
>> Let me explain the flow I've seen and where it fails.
>>
>> I have a node with the following configuration: 'a MagmaNode magma at craptop:51001,
>> magma at craptop:51003, magma at craptop:51004',
>> in which
>>
>>    - magma at craptop:51001 is the primary,
>>    - magma at craptop:51003 is Node 2,
>>    - magma at craptop:51004 is Node 3
>>
>> If I shut down Node 3 by calling shutdown on the server console, a
>> 'MaRemoveSecondaryLocationRequest' is sent to the primary. On the primary, a
>> MagmaNodeUpdate is initialized with Node 3 as its remove field. This is
>> applied to the Magma node of the primary and also committed to each node
>> (MagmaNodeUpdate processUsing: aMagmaServerConsole). I can check this
>> because on the primary, Node 2 and Node 3 a new commitxxx.log appears with a new
>> timestamp.
>> Then MagmaServerConsole>>ensureCorrectNodeConfiguration is executed on the
>> primary. Since it is the primary, it also executes
>> 'self sessionsForOtherLocationsDo: [ : each | each
>> ensureCorrectNodeConfiguration ]', which happens only on Node 2 (Node 3
>> was successfully removed from the Magma node).
>>
>> If I then debug in Node 2, it again executes
>> MagmaServerConsole>>ensureCorrectNodeConfiguration, but since this is not a
>> primary, it executes
>> beWarmBackupFor: primaryLocation. This sets up an admin session to the
>> primary and registers itself as a warm backup for it. However, this takes a lot
>> of time, and in the meantime Node 3, which was still waiting for a reply to
>> the original 'MaRemoveSecondaryLocationRequest', times out.
>> Furthermore: why does Node 2 have to beWarmBackupFor: aPrimaryLocation if it
>> is already a warm backup for that primary location? Is it normal that it tries to
>> do that again? And if there are more than 3 nodes (say 10
>> or more), each of them again does beWarmBackupFor: the primary.
>>
>> The way I fixed it:
>> I added the following:
>>
>> MagmaServerConsole>>isWarmBackupFor: primaryLocation
>> ^primaryLocation = self node primaryLocation
>>
>> which answers whether this server console is already a warm backup for the given
>> primary location.
>>
>> And I added the following to:
>> MagmaServerConsole>>beWarmBackupFor: primaryLocation
>>   (self isWarmBackupFor: primaryLocation)
>>     ifTrue: [^nil].
>>
>> which is a guard clause that checks whether the node is already a warm backup
>> for the given primary location; if so, it just bails out early and does nothing.
>>
>> With this fix, shutting down Node 3 works.
>>
>> Is this a known issue? Is my solution correct? I do not know enough about
>> the internals of Magma to judge it correctly.
>>
>> Thanks in advance for any help.
>>
>> I've attached a change set for both changed methods. I did not write any
>> tests for it :-(, and did not run the other Magma tests.
>>
>> Kind regards,
>>
>> Bart
>>
>> --
>> imagination is more important than knowledge - Albert Einstein
>> Logic will get you from A to B. Imagination will take you everywhere -
>> Albert Einstein
>> Learn from yesterday, live for today, hope for tomorrow. The important
>> thing is not to stop questioning. - Albert Einstein
>> The true sign of intelligence is not knowledge but imagination. - Albert
>> Einstein
>> Gravitation is not responsible for people falling in love. - Albert
>> Einstein
>>
>
>
>


