tx-logging vs. redundancy for databases

Martin Drautzburg martin.drautzburg at web.de
Thu May 13 20:18:04 UTC 2004


Chris Muller <afunkyobject at yahoo.com> writes:

> I'm finally starting to think about building some fault tolerance into the
> Magma server.  My understanding of the traditional approach is to perform
> "transaction logging" to a log file that can, in the event of a power failure
> in mid-commit, be used as input to a "recovery utility" to allow proper
> restoration of that transaction and overwrite any potential corruption in the
> main db file.

I am happy to see you are thinking about these things.

It may help to see the transaction log as the "real thing" and the
database file only as a redundant cache to speed things up. I actually
have little hope that you can achieve instance recovery (the memory
content was lost) or database recovery (a database file was lost)
without such a logging mechanism.

> But why slow down every commit with a write to a log file if that *only* buys
> me a guarantee against corruption of the main db file in the event of a
> power-failure?  Instead, what if I "log" the commit records directly to another
> Magma database (on a secondary computer), thus keeping an identical mirror of
> the main database.  

How would that help you? Your backup system will be just as
inconsistent as the database file. Note that the primary purpose of a
transaction log is not to "save the data", but to be able to bring the
database back into a consistent state.

IIRC Oracle does it more or less as follows:

Changes to the database are first logged in a REDO LOG BUFFER. Then
the changes are applied to the in-memory buffer cache.

Both the redo log buffer and the buffer cache are occasionally synced
to disk.

When data is read, it is always read from the buffer cache.
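
Here is a toy sketch of that write path in Python (all names, the
files and the JSON record format are mine, purely for illustration;
neither Oracle nor Magma works on key/value pairs like this):

import json

REDO_LOG  = "redo.log"   # append-only redo log file
DATA_FILE = "data.db"    # the "database file", really a redundant copy

class ToyServer:
    def __init__(self):
        self.redo_buffer = []   # in-memory redo log buffer
        self.cache = {}         # in-memory buffer cache
        self.scn = 0            # system change number

    def change(self, key, value):
        self.scn += 1
        # the change is logged first ...
        self.redo_buffer.append({"scn": self.scn, "key": key,
                                 "value": value})
        # ... and then applied to the buffer cache
        self.cache[key] = value

    def read(self, key):
        # reads are always served from the buffer cache
        return self.cache[key]

    def sync(self):
        # occasionally both are written to disk; the redo log is
        # appended, the data file records the SCN it is current up to
        with open(REDO_LOG, "a") as f:
            for rec in self.redo_buffer:
                f.write(json.dumps(rec) + "\n")
        self.redo_buffer = []
        with open(DATA_FILE, "w") as f:
            json.dump({"scn": self.scn, "data": self.cache}, f)

(A real server would of course force the redo records to disk when a
transaction commits, not just "occasionally", if committed data is to
survive a crash.)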

When the memory contents have been lost, Oracle replays the redo log
entries that have a larger system change number (SCN) than the database
files. This is done in two phases: first ALL changes are applied (roll
forward) and then transactions that lack a commit are undone (roll
back) in the usual way using rollback segments. Note that the database
files also contain the "rollback segments" which hold "before-images"
of rows for MVCC.

This will NOT restore the original data 100%, because some of the
contents of the redo log buffer may not have been written to disk. In
addition, all uncommitted transactions are lost.
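
Continuing the toy Python sketch from above, instance recovery would
look roughly like this (again only an illustration; the roll-back
phase exists only as a comment, since the toy model has no rollback
segments):

import json, os

REDO_LOG, DATA_FILE = "redo.log", "data.db"   # same files as above

def recover():
    cache, file_scn = {}, 0
    # start from the last database file that made it to disk, if any
    if os.path.exists(DATA_FILE):
        with open(DATA_FILE) as f:
            snapshot = json.load(f)
        cache, file_scn = snapshot["data"], snapshot["scn"]
    # roll forward: re-apply every redo record that is newer than
    # the database file
    if os.path.exists(REDO_LOG):
        with open(REDO_LOG) as f:
            for line in f:
                rec = json.loads(line)
                if rec["scn"] > file_scn:
                    cache[rec["key"]] = rec["value"]
    # Oracle would now roll back the transactions that never
    # committed, using the before-images in the rollback segments.
    # Whatever was still sitting only in the lost redo log *buffer*
    # cannot be recovered at all.
    return cache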

I believe things are easier in the relational world, because you only
have to deal with a few simple structures, such as "blocks" and
"rows".


