relational for what? [was: Design Principles Behind Smalltalk, Revisited]

Tue Jan 2 23:21:07 UTC 2007

Peter Crowther wrote:
>> From: J J
>>> From: Howard Stearns <hstearns at wisc.edu>
>>> That's something I've never really understood: what is the domain in
>>> which Relational Databases excel?
>> Handling large amounts of enterprise data.
> 
> Handling and dynamically querying large amounts of data where the data
> format is not necessarily completely stable and ad-hoc query performance
> is important.  "Large" here is "much larger than main memory of the
> machine(s) concerned".  I routinely handle data sets of tens of gigs on
> current commodity hardware - storing the data in RAM would be somewhat
> faster, but too expensive for the available capital.
> 
> The strength of relational over other forms is in being able to form
> arbitrary joins *relatively* efficiently, and hence in being able to
> query across data many times larger than main memory without excessive
> disk traffic.
> 
> Google isn't a good counter-example, as the ad-hoc querying is missing.
> The types of queries done on the Google database are very limited and
> are well known in advance.

My apologies for an ignorant and naive reply. Forgive me if I am way 
off base.

But it seems to me that being able to perform arbitrary joins relatively 
efficiently is a requisite of an RDBMS because an RDBMS requires you to 
arbitrarily partition your data in such a way as to require such joins.

Whenever I've read a book on SQL that talks about "normalizing" my 
data, I've never liked what I read.

1st Normal Form contains:
Atomicity: Each attribute must contain a single value, not a set of values.

A list is a natural and common way of grouping things, so it is (IMO) 
unnatural to decompose the list just so that I have the express ability 
to recompose it.
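
To make concrete what I mean, here is a minimal sketch in Python using 
the standard sqlite3 module. The person/phone tables, the column names 
and the sample data are all made up for illustration; this is just one 
way the decomposition might look, not a claim about how any particular 
schema should be designed. The list has to be split into one row per 
element, and a join is needed to get it back:

```python
import sqlite3

# Hypothetical data: one person who naturally "has a list" of phones.
person = {"name": "Ann", "phones": ["555-0100", "555-0199"]}


def store_and_recompose(person):
    """Decompose the list into 1NF rows, then rebuild it with a join."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)")
    # 1NF: each phone number becomes its own row; `pos` keeps list order.
    db.execute(
        "CREATE TABLE phone ("
        " person_id INTEGER REFERENCES person(id),"
        " pos INTEGER,"
        " number TEXT)"
    )
    cur = db.execute(
        "INSERT INTO person (name) VALUES (?)", (person["name"],)
    )
    pid = cur.lastrowid
    db.executemany(
        "INSERT INTO phone (person_id, pos, number) VALUES (?, ?, ?)",
        [(pid, i, n) for i, n in enumerate(person["phones"])],
    )
    # Recomposing the original list requires joining the two tables.
    rows = db.execute(
        "SELECT p.name, ph.number FROM person p"
        " JOIN phone ph ON ph.person_id = p.id"
        " WHERE p.id = ? ORDER BY ph.pos",
        (pid,),
    ).fetchall()
    db.close()
    return {"name": rows[0][0], "phones": [number for _, number in rows]}


rebuilt = store_and_recompose(person)
print(rebuilt == person)
```

An object store or a plain file could, in principle, write the whole 
dictionary out in one piece and read it back without the join.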

I don't believe things like that are common to other methods of 
persistence. I may be wrong.

So I don't believe that comparing a requisite of an RDBMS (efficient 
joins) against other persistence methods (OODBs, filesystem, etc.) which 
don't require such joins is a valid comparison, or at least not one in 
which the RDBMS wins.

Of course this is a simple argument and could be debated and go off into 
the reasons of RDB theory. But there's not enough room for that.

I also don't understand what queries could be performed with an RDBMS 
that can't be with Google, or that couldn't be if Google partitioned its 
data for such queries. After all, the data set would have to be 
partitioned correctly for an RDB to perform said queries also.

Personally I've almost always been pleased with the performance of my 
Google queries. I've also been to many, many, many sites backed by an 
RDB in which the queries were horribly slow.

So I personally would be reluctant to say Google made a wrong decision.
(and yes, I know you didn't say so either. :)

Jimmie