relational for what? [was: Design Principles Behind Smalltalk, Revisited]

Wed Jan 3 11:23:58 UTC 2007

I agree that RDBMSs tend to be knee-jerk reactions that produce as
many problems as they solve.  My favorite alternative is not a real
OODBMS, but instead a pattern that is best exemplified by Prevayler, a
Java framework.  The main idea is to represent your data as objects,
and to ensure that every change to the data is represented by a
Command.  Executing a Command causes it to write itself out to a log.
You get persistence by periodically (once a day, perhaps) writing all
your objects out to disk as a checkpoint, and you recover from crashes
by restarting from the last checkpoint and then replaying the log of
Commands.  You get multiuser access by implementing the transactions
inside the system, making them fast (no disk access) and just having a
single lock for the whole system.
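
Here is a minimal sketch of the pattern in Python.  It is not
Prevayler's API; the names PrevalentSystem, Deposit and checkpoint and
the JSON file layout are all made up for illustration.  A real system
would also flush the log to disk and make the checkpoint atomic, which
this sketch does not attempt.

```python
import json
import threading
from dataclasses import dataclass, asdict


@dataclass
class Deposit:
    """A Command: the only way the data is allowed to change."""
    account: str
    amount: int

    def execute(self, accounts):
        accounts[self.account] = accounts.get(self.account, 0) + self.amount


class PrevalentSystem:
    """Hypothetical in-memory store with a command log and checkpoints."""

    def __init__(self, log_path, snapshot_path):
        self.log_path = log_path
        self.snapshot_path = snapshot_path
        self.lock = threading.Lock()  # a single lock for the whole system
        self.accounts = {}
        self._recover()

    def execute(self, command):
        with self.lock:
            # Write the command to the log, then apply it in memory.
            with open(self.log_path, "a") as log:
                log.write(json.dumps(asdict(command)) + "\n")
            command.execute(self.accounts)

    def checkpoint(self):
        with self.lock:
            # Write all objects out, then empty the now-redundant log.
            with open(self.snapshot_path, "w") as f:
                json.dump(self.accounts, f)
            open(self.log_path, "w").close()

    def _recover(self):
        # Restart from the last checkpoint, if there is one...
        try:
            with open(self.snapshot_path) as f:
                self.accounts = json.load(f)
        except FileNotFoundError:
            self.accounts = {}
        # ...then replay the log of Commands written since then.
        try:
            with open(self.log_path) as log:
                for line in log:
                    Deposit(**json.loads(line)).execute(self.accounts)
        except FileNotFoundError:
            pass
```

Constructing a second PrevalentSystem on the same two files plays the
role of a restart after a crash: it loads the checkpoint and replays
whatever Commands were logged after it.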

There are lots of things this doesn't give you.  You don't get a query
language.  This is a big deal in Java, not so big a deal in Smalltalk,
because Smalltalk makes a pretty good ad-hoc query language (for
Smalltalk programmers).  You don't get multilanguage access.  The data
must all fit in memory, or suddenly your assumptions of instantaneous
transactions break down.  You have to be a decent programmer, though
it really isn't very hard, and if you let your just barely decent
programmers build a toy system to learn the pattern then they should
do fine.  Lots of people learn the pattern by working on a production
system, but that is probably a bad idea for all patterns, not just
this one.
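
To make the query language point concrete: when the data is just
objects in memory, an ad-hoc query is ordinary code in the host
language.  The sketch below is a rough Python analogue of what a
Smalltalk programmer would do with select: and collect:.  The Account
class and the sample data are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Account:
    owner: str
    city: str
    balance: int


accounts = [
    Account("alice", "Oslo", 1200),
    Account("bob", "Paris", 300),
    Account("carol", "Oslo", 50),
]

# Roughly: SELECT owner FROM accounts WHERE city = 'Oslo' AND balance > 100
well_off_in_oslo = [a.owner for a in accounts
                    if a.city == "Oslo" and a.balance > 100]

# Roughly: SELECT city, SUM(balance) FROM accounts GROUP BY city
total_by_city = {}
for a in accounts:
    total_by_city[a.city] = total_by_city.get(a.city, 0) + a.balance
```

This is fine for a programmer, but it is exactly the part that someone
who only knows SQL or a report tool cannot do.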

I did this in Smalltalk long before Prevayler was invented.  In fact,
Smalltalk-80 has always used this pattern.  Smalltalk programs are
stored this way.  Smalltalk programs are classes and methods, not the
ASCII stored on disk.  The ASCII stored on disk is several things,
including a printable representation with things like comments that
programmers need but the computer doesn't.  But the changes file, in
particular, is a log and when your image crashes, you often will
replay the log to get back to the version of the image at the time
your system crashed.  The real data is in the image, the log is just
to make sure your changes are persistent.

But this message stream is about what RDBMSs are good for, and I'd
like to address that.  First, even though SQL is rightly criticised,
it is a standard query language that enables people who are not
intimately familiar with the data to access it to make reports, browse
the data, or write simple applications.  Most groups I've seen have
only programmers using SQL and so don't take advantage of this, but
I've seen shops where secretaries used SQL or query-by-example tools
to make reports for their bosses, so it can be done.  I suppose an OO
database or a Prevayler-like system could provide a query-by-example
tool, too, but I have never seen one.

Second, even though the use of an RDBMS as the glue for a system is
rightly criticised, this is common practice.  It tends to produce a
big ball of mud, but for many organizations, this seems to be the best
they can do (see http://www.laputan.org/mud/).  One advantage of using
the RDBMS as the glue is that it is supported by nearly every language
and programming environment.  I think that the growing use of SOA will
make this less important, because people will use XML and web services
as the glue rather than a database.

Third, data in an RDBMS is a lot like plain text.  It is more or less
human readable.  It stands up to abuse pretty well, tolerating null
fields, non-normalized data, and use of special characters to store
several values in one field.  For the past few years, I have had
undergraduate teams migrating databases for a city government.  The
students are always amazed at how bad the data is.  I laugh at them.
All databases contain bad data, and it is important for the system to
tolerate it.

An RDBMS works best with relatively simple data models.  One of its
weaknesses is trees, since you have to make a separate query for each
level of descent.  It also has problems with versioned data, i.e.
data with a date or date range as part of the key.  But it can deal
pretty well with the usual set of objects that represent the state of
the business, and another set of objects that represent important
events.
For example, a bank has deposit accounts and loans to customers, and
it records deposits, cash withdrawals, computation of interest,
payments, and checks written to other organizations.  Huge amounts of
data are OK for an RDBMS, but complex data models tend to cause
troubles.
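
The trouble with trees can be sketched with Python's sqlite3 module.
The node table and the descendants function are hypothetical, and the
tree is stored the usual way, with each row pointing at its parent.
Walking down it with plain SELECTs takes one query per level.  Engines
that support recursive common table expressions can do this in a
single statement; the sketch shows the approach without them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE node (id INTEGER PRIMARY KEY, parent_id INTEGER, name TEXT)"
)
conn.executemany(
    "INSERT INTO node VALUES (?, ?, ?)",
    [(1, None, "root"), (2, 1, "a"), (3, 1, "b"), (4, 2, "a1"), (5, 4, "a1x")],
)


def descendants(conn, root_id):
    """Collect names of all descendants, issuing one query per tree level."""
    found, frontier, queries = [], [root_id], 0
    while frontier:
        marks = ",".join("?" * len(frontier))
        rows = conn.execute(
            f"SELECT id, name FROM node WHERE parent_id IN ({marks}) ORDER BY id",
            frontier,
        ).fetchall()
        queries += 1
        found.extend(name for _, name in rows)
        frontier = [node_id for node_id, _ in rows]
    return found, queries


names, queries = descendants(conn, 1)
```

The number of round trips grows with the depth of the tree, including
a final query that comes back empty at the leaves.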

It is wrong to think that persistence = RDBMS.  Architects should also
consider XML, a Prevayler-like system, binary files, or an OODBMS.  Each has
advantages and disadvantages.  An architect needs to have experience
with all these technologies to make a good decision.  Of course, which
one is best often depends on what is going to happen ten years in the
future, which is impossible to predict.  It is good to encapsulate
this decision so that it can be changed.  This is another advantage of
a SOA; your clients don't care how you store the data.
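
One way to encapsulate the decision, sketched in Python.  The
AccountStore interface, its two implementations and the deposit
function are hypothetical names for illustration; the point is only
that the calling code never mentions how the data is stored.

```python
import sqlite3
from abc import ABC, abstractmethod


class AccountStore(ABC):
    """The only view of persistence that client code is allowed to see."""

    @abstractmethod
    def save(self, owner, balance): ...

    @abstractmethod
    def load(self, owner): ...


class MemoryStore(AccountStore):
    def __init__(self):
        self._data = {}

    def save(self, owner, balance):
        self._data[owner] = balance

    def load(self, owner):
        return self._data.get(owner)


class SqliteStore(AccountStore):
    def __init__(self, path=":memory:"):
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS account "
            "(owner TEXT PRIMARY KEY, balance INTEGER)"
        )

    def save(self, owner, balance):
        self._conn.execute(
            "INSERT OR REPLACE INTO account VALUES (?, ?)", (owner, balance)
        )
        self._conn.commit()

    def load(self, owner):
        row = self._conn.execute(
            "SELECT balance FROM account WHERE owner = ?", (owner,)
        ).fetchone()
        return row[0] if row else None


def deposit(store, owner, amount):
    # Client code: works unchanged with either store.
    store.save(owner, (store.load(owner) or 0) + amount)
```

Swapping MemoryStore for SqliteStore (or for something else ten years
from now) changes one constructor call and nothing in deposit.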

In the end, technology decisions on large projects depend as much on
politics as on technical reasons.  RDBMSs are the standard, the safe
course.  "Nobody ever got fired for buying Oracle".  They are usually
not chosen for technical reasons.  There are times when they really
are the best technical choice, but they are used a lot more often than
that.

-Ralph