relational for what? [was: Design Principles Behind Smalltalk, Revisited]

Wed Jan 3 21:34:46 UTC 2007

>From: Howard Stearns <hstearns at wisc.edu>
>Reply-To: The general-purpose Squeak developers 
>list<squeak-dev at lists.squeakfoundation.org>
>To: The general-purpose Squeak developers 
>list<squeak-dev at lists.squeakfoundation.org>
>Subject: Re: relational for what? [was: Design Principles Behind Smalltalk, 
>  Revisited]
>Date: Tue, 02 Jan 2007 14:36:22 -0600
>
>Yes, I'm quite serious. I'm asking what kinds of problems RDBMS are 
>uniquely best at solving (or at least no worse). I'm not asking whether 
>they CAN be used for this problem or that.  I'm asking this from an 
>engineering/mathematics perspective, not a business ("we've always done 
>things this way" or "we like this vendor") perspective.
>

<horror story omitted>

Honestly, it just seems to me like someone architected an awful system.  I 
know, for example, that some databases (Oracle, I thought) can span a given 
DB across boxes with different methods of partitioning (e.g. some tables 
here, some tables there, foreign keys between them, etc.).

You certainly shouldn't have to be copying data between tables.  If nothing 
better, you could install MySQL everywhere and turn on replication.

>Maybe this isn't typical, but it is the architecture that Oracle and its 
>PeopleSoft division pushes on us in their extensive training classes. And 
>it appears to be the architecture discussed in the higher education IT 
>conferences and Web sites in the U.S.

Well, the big companies tend to push the most expensive option, not the best 
one for the data model.  In my experience so far, I can think of no case 
where we accepted what the vendors proposed without first making some 
serious threats.

>Anyway, either the data AS USED fits into memory or doesn't. If it does, 
>then what benefit is the relational math providing? If it doesn't, then we 
>have to ask whether the math techniques that were developed to provide 
>efficient random access over disks 20 years ago are still valid. Is this 
>still the fastest way? (Answer is no.) Is there some circumstance in which 
>it is the fastest? Or the safest? Or allow us to do something that we could 
>not do otherwise?

I still don't think the question has anything to do with "in memory" vs. 
"not in memory" or "quickest way to access the disk".  You can tune your 
RDBMS to cache as much as possible in memory, and then it becomes a 
trade-off: do I write all the code to do the joins, etc. myself, or do I 
take what they already have and possibly accept a run-time speed hit?
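
To make that trade-off concrete, here is a minimal sketch in Python.  The 
tables and names (student, enrollment, hand_join, sql_join) are made up for 
illustration and have nothing to do with the system described above; 
SQLite's in-memory mode is only a stand-in for "an RDBMS caching everything 
in memory".

```python
import sqlite3

# Hypothetical toy data, invented for illustration only.
students = [(1, "Ann"), (2, "Bob"), (3, "Cy")]
enrollments = [(1, "Math"), (1, "Physics"), (3, "Art")]

def hand_join(students, enrollments):
    """Hash join written by hand: index one side, probe with the other."""
    by_id = {sid: name for sid, name in students}
    return sorted(
        (by_id[sid], course) for sid, course in enrollments if sid in by_id
    )

def sql_join(students, enrollments):
    """The same join, delegated to an RDBMS held entirely in memory."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT)")
    con.execute("CREATE TABLE enrollment (student_id INTEGER, course TEXT)")
    con.executemany("INSERT INTO student VALUES (?, ?)", students)
    con.executemany("INSERT INTO enrollment VALUES (?, ?)", enrollments)
    rows = con.execute(
        "SELECT s.name, e.course FROM student s "
        "JOIN enrollment e ON e.student_id = s.id "
        "ORDER BY s.name, e.course"
    ).fetchall()
    con.close()
    return rows
```

Both functions return the same rows.  The hand-written one is short here 
only because there is a single equi-join; every new query shape (outer 
joins, grouping, a different join order) means more code to write and 
maintain, which is the cost I mean.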

Or maybe a speed gain since the RDBMS can break up the table into different 
"spaces" and run the query simultaneously in different threads.  Of course 
you can do that by hand, but then you are getting further behind what they 
already have.
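
A rough sketch of what "doing that by hand" looks like, again in Python. 
The function names and the round-robin partitioning are my own assumptions, 
not how any particular RDBMS does it.  Note also that in standard CPython 
the threads will not actually speed up a pure-Python loop like this one; 
the point is only the shape of the code you end up owning.

```python
from concurrent.futures import ThreadPoolExecutor

def partition(rows, n):
    """Split rows into n slices (stand-ins for the table's "spaces")."""
    return [rows[i::n] for i in range(n)]

def scan(rows, predicate):
    """Scan one partition; return the partial sum of matching amounts."""
    return sum(amount for key, amount in rows if predicate(key))

def parallel_sum(rows, predicate, workers=4):
    """Run the same scan over every partition, then merge the partials."""
    parts = partition(rows, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(scan, parts, [predicate] * workers)
        return sum(partials)
```

For example, parallel_sum([(i, i) for i in range(100)], lambda k: k % 2 == 0) 
sums the even keys across four partitions.  An RDBMS gives you the 
equivalent for any query, without you writing the split and merge steps.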

>I tried briefly to combine JJ's answer with Peter's to find an appropriate 
>niche. (Again, I'm trying to look at the math, not fit and finish, 
>availability of experienced programmers, color of brochure...) For example, 
>there could be a class of problems for which the data set is a few 10's of 
>gigs and needs to be operated on as a whole. And that queries are fairly 
>arbitrary and exploratory, not production-oriented. Etc. But I haven't been 
>able to come up with one that doesn't have better characteristics as a 
>distributed system.  Maybe if we define the problem as "and you only have 
>one commodity box to do it on." That's fair. Maybe that's it?  (Then we 
>need to find an "enterprise" with only one box...)

Well, it's not going to be that ("you only have one commodity box").  When I 
said I think you could do what Google is doing with an RDBMS if you really 
wanted to, I wasn't thinking of a few commodity boxes.  I was thinking of 
4-10 really enormous boxes (but my understanding was that Google uses *lots* 
of computers to do their work, no?).

In other words, the RDBMS solution will be much more expensive, in both 
hardware and software, than what Google did.
