tx-logging vs. redundancy for databases

Colin Putney cputney at wiresong.ca
Thu May 13 18:50:06 UTC 2004


On May 13, 2004, at 1:06 PM, Chris Muller wrote:

> I'm finally starting to think about building some fault tolerance into
> the Magma server.  My understanding of the traditional approach is to
> perform "transaction logging" to a log file that can, in the event of
> a power failure in mid-commit, be used as input to a "recovery
> utility" to allow proper restoration of that transaction and overwrite
> any potential corruption in the main db file.

IIRC, part of this strategy is to write to the journal before 
committing to the database. The journal entry would be formatted such 
that there's a special marker for "end of entry". Once that marker is 
written, the entry is complete, and will be restored by the recovery 
utility. If the failure happens while the entry is being written, the 
marker won't be present, and the recovery utility can ignore the entry 
- the power failure is treated as having happened before the 
transaction.
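
Something like this Python sketch illustrates the idea (not Magma's 
format or API - the length prefix, marker bytes, and file layout here 
are just assumptions for illustration):

    import os

    END_MARKER = b"\xff\xfeEND\xfe\xff"  # assumed sentinel marking "end of entry"

    def append_journal_entry(journal_path, payload):
        """Append one commit record; the end marker is written last."""
        with open(journal_path, "ab") as f:
            f.write(len(payload).to_bytes(4, "big"))
            f.write(payload)
            f.write(END_MARKER)       # the entry only counts once this is on disk
            f.flush()
            os.fsync(f.fileno())      # persist before acknowledging the commit

    def recover(journal_path):
        """Yield complete entries; a torn final entry is silently dropped."""
        with open(journal_path, "rb") as f:
            data = f.read()
        pos = 0
        while pos + 4 <= len(data):
            size = int.from_bytes(data[pos:pos + 4], "big")
            end = pos + 4 + size
            if data[end:end + len(END_MARKER)] != END_MARKER:
                break                 # incomplete entry: failure happened "before" it
            yield data[pos + 4:end]
            pos = end + len(END_MARKER)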

Depending on your implementation, you might have several journal 
entries batched up and commit them to the main database asynchronously. 
This could help with performance, and it isn't a consistency issue so 
long as the clients' view of the database takes into account the 
batched transactions.
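
A rough sketch of that batching idea (again Python, and again the 
class, its parameters, and the batch size are illustrative assumptions, 
not Magma's API):

    import threading

    class BatchingCommitter:
        """Acknowledge a commit as soon as it is journaled; write to the
        main database in batches. Reads must consult the pending batch so
        a client sees its own committed transactions."""

        def __init__(self, journal_write, apply_to_main_db, batch_size=32):
            self.journal_write = journal_write        # e.g. append_journal_entry above
            self.apply_to_main_db = apply_to_main_db  # callable taking a list of entries
            self.batch_size = batch_size
            self.pending = []                         # journaled, not yet in the main db file
            self.lock = threading.Lock()

        def commit(self, entry):
            self.journal_write(entry)                 # durable once this returns
            with self.lock:
                self.pending.append(entry)
                if len(self.pending) < self.batch_size:
                    return
                batch, self.pending = self.pending, []
            self.apply_to_main_db(batch)              # slow write happens off the commit path

        def pending_entries(self):
            """Entries a client's view of the database must still include."""
            with self.lock:
                return list(self.pending)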

> But why slow down every commit with a write to a log file if that
> *only* buys me a guarantee against corruption of the main db file in
> the event of a power-failure?  Instead, what if I "log" the commit
> records directly to another Magma database (on a secondary computer),
> thus keeping an identical mirror of the main database.  In the event
> of a failure of the primary computer, clients could just reconnect to
> the mirrored database on the secondary computer.
>
> So I get redundancy and "backup" for essentially the same cost.

I think you're conflating two separate issues here. One is data 
integrity: in the event of a hardware failure we want our data to be 
both recoverable (we can repair a corrupt db file) and consistent (all 
transactions are atomic). The other issue is data availability: if our 
database server goes down, we can switch to a backup server without 
interrupting service to our users.

These two aspects of fault tolerance should be independent. Adding a 
chain of n backup servers doesn't prevent data corruption, it just 
makes it less likely - we now need n hardware failures before our data 
is hosed, not just one. At the same time, linking the two safety 
mechanisms means you can't have both integrity and availability of your 
data. If the main server fails, you have two options. If you switch to 
the backup server, you maintain the availability of the data, but you 
risk corruption if there's a second failure. If you go offline and 
restore from the backup, you ensure integrity but interrupt 
availability.
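
Just to put a rough number on "less likely" (a back-of-the-envelope 
Python sketch that assumes independent failures and an arbitrary 
per-machine failure probability, which real deployments won't match):

    p = 0.01                 # assumed chance one machine fails in the window of interest
    for backups in range(4):
        # data is lost only if the primary and every backup fail
        print(backups, "backups -> loss probability", p ** (backups + 1))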

There's one final consideration, I think, which is scalability. It's 
important to be able to scale up, but also to be able to scale down. 
I'd like to see Magma support a full range of usage configurations:

- one client session and one server, both in the same image.
- several client sessions and one server, all in the same image.
- several client sessions and several servers, all in the same image.
- several clients and one server, all on different machines.
- several clients and several servers all on different machines.

I think it's too much to expect that all users of Magma will be able to 
run two separate database servers, so it's important to be able to 
handle the simple (and probably common) case of single-machine 
installations.

Hope these thoughts are helpful to you, and congratulations on your 
work so far. OODB support has been one of the big holes in Squeak's 
capabilities - I hope to see Magma fill it.

Colin



