(I see that the conversation has moved along since the time that I started to draft this, but here goes...)
On Jan 2, 2007, at 11:10 AM, J J wrote:
Sanity check: Google is trying to keep a current snapshot of all websites and run it on commodity hardware. You could do exactly the same thing with a lot less CPUs using a highly tuned, distributed RDBMS. They chose to hand tune code instead of an RDBMS.
What, really? There are many possible reasons that Google don't use an RDBMS to index the web: stupidity, arrogance, excessive cost of an RDBMS, sound engineering decisions, or a combination of these.
According to the computer systems research community, Google has sound engineering reasons for its architecture; they have published papers at top conferences such as OSDI and SOSP. See http://labs.google.com/papers ("The Google File System" and "BigTable..." might be the most relevant to this conversation).
That doesn't rule out the possibility of stupidity, arrogance, excessive cost, etc. But it does cast doubt on the unsubstantiated claim that Google could "do exactly the same thing with a lot less CPUs".
Finally, in a world with great distributed computing power, is centralized transaction processing really a superior model?
Some people seem to think so: http://lambda-the-ultimate.org/node/463
And there is more than that. I believe in that paper (don't have time to verify) they mention that hardware manufacturers are starting to take this approach as well because fine-grained locking is so bad.
As you mentioned in a follow-up email, this wasn't the paper you meant. Although it has nothing whatsoever to do with RDBMSes, I would recommend that paper to anyone with enough free time to learn enough Haskell to read it.
Did you happen to find the intended link?
- Working with other applications that are designed to use RDBs?
Maybe, but that's a tautology, no?
Again, one has to work in a large company to appreciate the nature of enterprise application development.
I have no doubt that you're right, but it doesn't answer the question: what is it that RDBs *fundamentally* get correct? It's quite like the easy but unsatisfying answer to "why is Smalltalk so great?"... "well, you can't appreciate it unless you've grokked Smalltalk".
Certainly RDBs are essential to the operations of the modern enterprise, but how much of this is because RDBs are really the best imaginable approach to this sort of thing, and how much is due to a complicated process of co-evolution that has resulted in the current enterprise software ecosystem?
Josh