relational for what? [was: Design Principles Behind Smalltalk, Revisited]

Peter Crowther Peter at ozzard.org
Tue Jan 2 22:57:24 UTC 2007


> From: Howard Stearns
> I'm asking what kinds of problems RDBMS are
> uniquely best at solving (or at least no worse).

If you could go from a clean slate for each unique problem, probably
none.  Same for almost any other widely-deployed technology - almost by
definition, if it has been deployed outside its niche then it has been
deployed in sub-optimal ways.

> I'm not asking whether
> they CAN be used for this problem or that.  I'm asking this from an
> engineering/mathematics perspective, not a business ("we've always
> done things this way" or "we like this vendor") perspective.

Ah.  Theory :-).  In theory, I agree with you.  In reality, I agree with
Andreas - RDBMSs are stable and widely understood, and they aren't
*that* bad for quite a wide class of problems.

> [Naively, it seems like the obvious solution for this
> (mathematically)
> is a hashing operation to keep the data evenly distributed over
> in-memory systems on a LAN, plus an in-memory cache of recently used
> chunks. But let's assume I'm missing something. The task here is to
> figure out what I'm not seeing.]

Stability and incremental development.  How long would it take to
develop your system and get the showstopper defect rate down low enough
for the system to be in line-of-business use?  How would you extend your
system when the next application area came along?  How would you
convince your funder (who wants some part of this system live *now*) to
wait long enough to get the defects out?
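For what it's worth, the scheme Howard sketches above (hash each key to pick an in-memory node on the LAN, with a small cache of recently used chunks in front) might look something like this minimal Python sketch.  The node names, cache size, and key scheme are all invented for illustration:

```python
# Hypothetical sketch of hash-distributed in-memory storage with an
# LRU cache of recently used chunks.  Node names are made up.
import hashlib
from collections import OrderedDict

NODES = ["node-a", "node-b", "node-c"]  # in-memory stores on the LAN

def node_for(key: str) -> str:
    """Map a key to a node via a stable hash, spreading keys evenly."""
    digest = hashlib.sha1(key.encode()).hexdigest()
    return NODES[int(digest, 16) % len(NODES)]

class LRUCache:
    """Tiny local cache of recently used chunks."""
    def __init__(self, capacity: int = 128):
        self.capacity = capacity
        self.items: "OrderedDict[str, bytes]" = OrderedDict()

    def get(self, key: str):
        if key in self.items:
            self.items.move_to_end(key)   # mark as recently used
            return self.items[key]
        return None                       # miss: fetch from node_for(key)

    def put(self, key: str, value: bytes) -> None:
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict least recently used
```

Note this says nothing about the hard parts - replication, node failure, or rebalancing when NODES changes - which is exactly where the development time goes.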

> Maybe this isn't typical

Alarmingly, it's not atypical.  My day job involves a *lot* of plumbing
- connecting up previously-incompatible data sources.  This is because
most organisations grow organically, and their IT systems grow
organically with them.  The systems are patch upon patch, and it's never
possible to rip them out and start again.

> Anyway, either the data AS USED fits into memory or doesn't.

I think that's naive.  Could I instead propose "the data AS USED fits
into memory plus what can reasonably be transferred via the mass storage
subsystem"?  For many of the apps I use, 98+% of the data accessed comes
from RAM - but it's nice for the remaining 2% to be able to be 10x or
100x the size of RAM without major ill effects.  However, are you
looking at the correct boundary?  Consider tape vs disk, L2 cache versus
main memory, registers and L1 cache versus L2, etc.  I would presume you
could get even faster performance reading all this data into a mass of
Athlon or Core L2 caches and using the HyperTransports to hook 'em
together - why should we use this slow RAM stuff when we have this much
faster on-chip capability?  In other words, what's your rationale for
picking RAM and disk as the boundary?
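To make the "98+% from RAM" point concrete, here is a back-of-the-envelope calculation of effective access time across a two-tier boundary.  The latency figures are rough illustrative guesses, not measurements:

```python
# Effective access time across a fast/slow tier: even a 2% miss rate
# is dominated by the slow tier.  Latencies are illustrative only.
RAM_NS = 100           # ~100 ns for a main-memory access
DISK_NS = 5_000_000    # ~5 ms for a disk access

def effective_ns(hit_rate: float) -> float:
    """Average access cost given the fraction served from RAM."""
    return hit_rate * RAM_NS + (1 - hit_rate) * DISK_NS

# At a 98% RAM hit rate the average is ~100,000 ns - about 1000x a
# pure RAM access - so where you draw the boundary matters enormously.
```

The same arithmetic applies at every boundary in the hierarchy (L2 versus RAM, RAM versus disk, disk versus tape); only the constants change.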

> Is this still the fastest way? (Answer is no.)

No.  Neither's your proposed approach of using main memory, I suspect.
It may, however, be the fastest per dollar of expenditure on the end
system.

> Is there some
> circumstance in which it is the fastest? Or the safest? Or allow us to
> do something that we could not do otherwise?

The latter, yes: develop a sufficiently robust and functional
application in a sufficiently short time with a sufficiently cheap set
of developers.

> Having tools to allow a cult of specialists to break your own
> computing model (the relational calculus) is not a feature, but a
> signal that something is wrong.

Agree entirely :-).

> Maybe if we define the
> problem as "and you only have one commodity box to do it on." That's
> fair. Maybe that's it?  (Then we need to find an "enterprise" with
> only one box...)

Or /n/ commodity boxes, where n is the capital the organisation can
reasonably deploy in that area.  I suspect you're coming from a
background of solving "hard" problems, where throwing tin at the job is
acceptable, to a world where return on investment determines whether a
project can be justified or not.  If it's not justifiable, it shouldn't
get done - and there are plenty of quotes we've put in where we've been
the cheapest, but the company's decided not to proceed because,
actually, the cost of the system is more than they would ever save from
using it.  That's a pretty sharp razor for business applications, but
ultimately it's the appropriate one to use - it avoids wasting capital
and human effort to produce a shining solution when, ultimately, it
would have been cheaper to use lots of monkeys with typewriters.

		- Peter
