From: Howard Stearns

> I'm asking what kinds of problems RDBMSs are uniquely best at solving (or at least are no worse at).
If you could go from a clean slate for each unique problem, probably none. Same for almost any other widely-deployed technology - almost by definition, if it has been deployed outside its niche then it has been deployed in sub-optimal ways.
> I'm not asking whether they CAN be used for this problem or that. I'm asking this from an engineering/mathematics perspective, not a business ("we've always done things this way" or "we like this vendor") perspective.
Ah. Theory :-). In theory, I agree with you. In reality, I agree with Andreas - RDBMSs are stable and widely understood, and they aren't *that* bad for quite a wide class of problems.
> [Naively, it seems like the obvious solution for this (mathematically) is a hashing operation to keep the data evenly distributed over in-memory systems on a LAN, plus an in-memory cache of recently used chunks. But let's assume I'm missing something. The task here is to figure out what I'm not seeing.]
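As I read it, the scheme is roughly the following (a minimal sketch in Python; the node names and key format are made up for illustration). Note one thing the naive version misses: with plain modulo hashing, adding or removing a box remaps nearly every key to a new owner, which is exactly the kind of operational problem consistent hashing was invented to address.

```python
import hashlib

class HashPartitionedStore:
    """Toy sketch: spread keys evenly over a fixed set of in-memory nodes.

    Each "node" here is just a dict; in the real proposal it would be a
    box on the LAN. Node names and keys below are illustrative only.
    """

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.data = {n: {} for n in self.nodes}

    def _owner(self, key):
        # A stable hash maps each key to one node; SHA-1 gives an
        # effectively uniform spread, so load stays roughly even.
        h = int(hashlib.sha1(key.encode()).hexdigest(), 16)
        return self.nodes[h % len(self.nodes)]

    def put(self, key, value):
        self.data[self._owner(key)][key] = value

    def get(self, key):
        return self.data[self._owner(key)].get(key)

store = HashPartitionedStore(["box-a", "box-b", "box-c"])
store.put("customer:42", {"name": "Ada"})
assert store.get("customer:42") == {"name": "Ada"}
```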
Stability and incremental development. How long would it take to develop your system and get the showstopper defect rate down low enough for the system to be in line-of-business use? How would you extend your system when the next application area came along? How would you convince your funder (who wants some part of this system live *now*) to wait long enough to get the defects out?
> Maybe this isn't typical.
Alarmingly, it's not atypical. My day job involves a *lot* of plumbing - connecting up previously-incompatible data sources. This is because most organisations grow organically, and their IT systems grow organically with them. The systems are patch upon patch, and it's never possible to rip them out and start again.
> Anyway, either the data AS USED fits into memory or doesn't.
I think that's naive. Could I instead propose "the data AS USED fits into memory plus what can reasonably be transferred via the mass storage subsystem"? For many of the apps I use, 98+% of the data accessed comes from RAM - but it's nice for the remaining 2% to be able to be 10x or 100x the size of RAM without major ill effects.

However, are you looking at the correct boundary? Consider tape versus disk, L2 cache versus main memory, registers and L1 cache versus L2, etc. I would presume you could get even faster performance reading all this data into a mass of Athlon or Core L2 caches and using the HyperTransports to hook 'em together - why should we use this slow RAM stuff when we have this much faster on-chip capability? In other words, what's your rationale for picking RAM and disk as the boundary?
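The same hit-rate arithmetic applies at every level of the hierarchy, which is what makes the "which boundary?" question bite. A sketch with illustrative, order-of-magnitude latencies (assumed, not measured):

```python
# Average access time for a two-level hierarchy, given a hit rate at
# the fast level. Latencies are illustrative assumptions, not benchmarks.
RAM_NS = 100            # ~100 ns main-memory access
DISK_NS = 5_000_000     # ~5 ms disk seek, expressed in nanoseconds

def avg_access_ns(hit_rate):
    return hit_rate * RAM_NS + (1 - hit_rate) * DISK_NS

print(avg_access_ns(1.00))   # all-RAM baseline: 100 ns
print(avg_access_ns(0.98))   # 2% disk misses: ~100,098 ns
```

Even at a 98% RAM hit rate, the rare disk misses dominate the average latency - yet the system is still usable, because what matters is whether those misses are rare enough in wall-clock terms. Swap in L2-versus-RAM numbers and the identical formula makes the case for the on-chip boundary instead.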
> Is this still the fastest way? (Answer is no.)
No. Neither is your proposed approach of using main memory, I suspect. It may, however, be the fastest per dollar of expenditure on the end system.
> Is there some circumstance in which it is the fastest? Or the safest? Or allows us to do something that we could not do otherwise?
The latter, yes: develop a sufficiently robust and functional application in a sufficiently short time with a sufficiently cheap set of developers.
> Having tools to allow a cult of specialists to break your own computing model (the relational calculus) is not a feature, but a signal that something is wrong.
Agree entirely :-).
> Maybe if we define the problem as "and you only have one commodity box to do it on." That's fair. Maybe that's it? (Then we need to find an "enterprise" with only one box...)
Or /n/ commodity boxes, where n is the capital the organisation can reasonably deploy in that area. I suspect you're coming from a background of solving "hard" problems, where throwing tin at the job is acceptable, into a world where return on investment determines whether a project can be justified or not.

If it's not justifiable, it shouldn't get done - and there are plenty of quotes we've put in where we've been the cheapest, but the company's decided not to proceed because, actually, the cost of the system is more than they would ever save from using it. That's a pretty sharp razor for business applications, but ultimately it's the appropriate one to use - it avoids wasting capital and human effort on producing a shining solution when, ultimately, it would have been cheaper to use lots of monkeys with typewriters.
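That razor reduces to a break-even comparison. A toy sketch (every figure below is an illustrative assumption, not a real quote):

```python
# Toy ROI razor: a project is justifiable only if projected savings over
# its useful life exceed what it costs to build and run over that life.
# All figures are made-up illustrations, not real quotes.
def justifiable(build_cost, annual_running_cost, annual_saving, years):
    total_cost = build_cost + annual_running_cost * years
    total_saving = annual_saving * years
    return total_saving > total_cost

# Cheapest bid on the table, yet still not worth doing:
# saves 300k over 5 years, costs 350k over the same period.
print(justifiable(250_000, 20_000, 60_000, 5))  # False
```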
- Peter