Dear all,
I'm using Pharo1.0rc1 Latest update: #10493, with Magma r43final.
I've been experimenting with Magma high availability. It's working for me,
except that shutting down a node always throws a timeout exception.
With 1 root server and 1 node, everything works.
With 1 root server and 2 attached nodes, shutting down one of the nodes
throws a timeout.
I've been looking into it and I have some questions about how things work in
Magma.
Let me explain the flow I've seen and where it fails.
I have a node with the following configuration: 'a MagmaNode magma@craptop:51001,
magma@craptop:51003, magma@craptop:51004',
in which
- magma@craptop:51001 is the primary,
- magma@craptop:51003 is Node 2,
- magma@craptop:51004 is Node 3
If I shut down Node 3 by calling shutdown on the server console, a
'MaRemoveSecondaryLocationRequest' is sent to the primary. On the primary, a
MagmaNodeUpdate is initialized with Node 3 in its remove field. This is
applied to the MagmaNode of the primary and also committed to each node
(MagmaNodeUpdate processUsing: aMagmaServerConsole). I can verify this
because a new commitxxx.log with a new timestamp appears on the primary,
Node 2 and Node 3.
Then MagmaServerConsole>>ensureCorrectNodeConfiguration is executed on the
primary. Since it is the primary, it also executes:
'self sessionsForOtherLocationsDo: [ :each | each ensureCorrectNodeConfiguration ]',
which happens only for Node 2 (Node 3 was successfully removed from the
MagmaNode).
If I then debug Node 2, it again executes
MagmaServerConsole>>ensureCorrectNodeConfiguration, but since it is not the
primary, it executes:
beWarmBackupFor: primaryLocation. This sets up an admin session to the
primary and registers the node as a warm backup for it. However, this takes
a lot of time, and in the meantime Node 3, which is still waiting for a reply
to the original 'MaRemoveSecondaryLocationRequest', times out.
Furthermore: why does Node 2 have to call beWarmBackupFor: aPrimaryLocation if
it is already a warm backup for that primary location? Is it normal that it
tries to do that again? Also, if there are more than 3 nodes (say 10 or
more), each of them again executes beWarmBackupFor: for the primary.
The way I fixed it:
I added the following:
MagmaServerConsole>>isWarmBackupFor: primaryLocation
    ^ primaryLocation = self node primaryLocation
which answers whether this server console is already a warm backup for the
given primary location.
And I added the following:
MagmaServerConsole>>beWarmBackupFor: primaryLocation
    (self isWarmBackupFor: primaryLocation)
        ifTrue: [ ^ nil ].
which is a guard clause that checks whether the node is already a warm backup
for the given primary location; if so, it bails out early and does nothing.
With this fix, the shutdown of Node 3 works.
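The effect of the guard can be illustrated with a minimal workspace sketch.
It is a toy model, not Magma code: the names currentPrimary, registrations and
the block beWarmBackupFor are hypothetical stand-ins, and the counter only
stands in for the expensive admin-session setup described above.

```smalltalk
"Toy model of an idempotent 'become warm backup' operation.
 Nothing here calls Magma; all names are hypothetical."
| currentPrimary registrations beWarmBackupFor |
currentPrimary := nil.
registrations := 0.
beWarmBackupFor := [ :primaryLocation |
    "Guard: skip the work when already registered for this primary."
    currentPrimary = primaryLocation
        ifFalse: [
            "Stand-in for the slow admin-session setup and registration."
            registrations := registrations + 1.
            currentPrimary := primaryLocation ] ].

"Calling it twice for the same primary performs the setup only once."
beWarmBackupFor value: 'magma@craptop:51001'.
beWarmBackupFor value: 'magma@craptop:51001'.
registrations
```

Evaluating the last line with "print it" should show 1. Whether skipping the
re-registration is always safe in the real MagmaServerConsole (for example
after the primary has restarted at the same location) is exactly the open
question here.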
Is this a known issue? Is my solution correct? I do not know enough about
the internals of Magma to judge that properly.
Thanks in advance for any help.
I've attached a change set with both changed methods. I did not write any
tests for it :-(, and did not run the other Magma tests.
Kind regards,
Bart