From: Howard Stearns <hstearns@wisc.edu>
Reply-To: The general-purpose Squeak developers list <squeak-dev@lists.squeakfoundation.org>
To: The general-purpose Squeak developers list <squeak-dev@lists.squeakfoundation.org>
Subject: Re: relational for what? [was: Design Principles Behind Smalltalk, Revisited]
Date: Tue, 02 Jan 2007 14:36:22 -0600
Yes, I'm quite serious. I'm asking what kinds of problems RDBMS are uniquely best at solving (or at least no worse). I'm not asking whether they CAN be used for this problem or that. I'm asking this from an engineering/mathematics perspective, not a business ("we've always done things this way" or "we like this vendor") perspective.
<horror story omitted>
Honestly, it just seems to me like someone architected an awful system. I know, for example, that some databases (Oracle, I think) can span a single database across boxes with different methods of partitioning (e.g., some tables here, some tables there, foreign keys between them, etc.).
You certainly shouldn't have to be copying data between tables. If nothing better is available, you could install MySQL everywhere and turn on replication.
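For what it's worth, turning on MySQL replication (in the 5.x era being discussed) is roughly a matter of enabling the binary log on the master and pointing each replica at it. A sketch, with host names, credentials, and file paths as placeholders, not values from any real deployment:

```sql
-- On the master (my.cnf fragment, shown here as comments):
--   [mysqld]
--   log-bin   = mysql-bin   -- enable the binary log replicas read from
--   server-id = 1           -- each server in the topology needs a unique id

-- On each replica, point at the master and start replicating
-- (MySQL 5.x "CHANGE MASTER TO" syntax; placeholders throughout):
CHANGE MASTER TO
  MASTER_HOST     = 'master.example.edu',
  MASTER_USER     = 'repl',
  MASTER_PASSWORD = 'secret';
START SLAVE;
```

The point being: read-only copies of the data on other boxes come essentially for free, without hand-copying between tables.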
Maybe this isn't typical, but it is the architecture that Oracle and its PeopleSoft division push on us in their extensive training classes. And it appears to be the architecture discussed at higher-education IT conferences and Web sites in the U.S.
Well, the big companies tend to push the most expensive option, not the one best suited to the data model. In my experience so far, I can think of no case where we accepted what the vendors proposed without some serious threats first.
Anyway, either the data AS USED fits into memory or it doesn't. If it does, then what benefit is the relational math providing? If it doesn't, then we have to ask whether the math techniques that were developed twenty years ago to provide efficient random access over disks are still valid. Is this still the fastest way? (The answer is no.) Is there some circumstance in which it is the fastest? Or the safest? Or in which it allows us to do something that we could not do otherwise?
I still don't think the question has anything to do with "in memory" vs. "not in memory", or with the quickest way to access the disk. You can tune your RDBMS to cache as much as possible in memory, and then it becomes a contest: is it faster for me to write all the code to do the joins, etc. myself, or to take what they already have and possibly pay a run-time speed hit?
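To make "write all the code to do the joins" concrete, here is a minimal hash-join sketch in Python; this is roughly the build/probe algorithm an RDBMS executes for an equi-join, and the table and column names are invented for illustration:

```python
# Minimal hash-join sketch: what "writing the joins yourself" amounts to.
# Tables are lists of dicts; names and columns are invented for illustration.

def hash_join(left, right, key):
    """Equi-join two tables on a shared key, like SQL INNER JOIN."""
    # Build phase: index one table by the join key.
    index = {}
    for row in left:
        index.setdefault(row[key], []).append(row)
    # Probe phase: stream the other table, emitting matching pairs.
    joined = []
    for row in right:
        for match in index.get(row[key], []):
            joined.append({**match, **row})
    return joined

students = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bo"}]
grades = [{"id": 1, "grade": "A"}, {"id": 2, "grade": "B"}, {"id": 3, "grade": "C"}]
result = hash_join(students, grades, "id")
```

Ten lines for one join strategy; the RDBMS already ships this plus sort-merge and nested-loop variants, and picks among them per query, which is the "what they already have" in the trade-off above.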
Or maybe a speed gain, since the RDBMS can break the table up into different "spaces" and run the query simultaneously on different threads. Of course you can do that by hand, but then you are falling even further behind what they already have.
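Doing that by hand looks something like the following sketch: split a table into partitions and scan them concurrently, then merge the results. The data, the 4-way round-robin split, and the predicate are all invented for illustration, not a description of any particular RDBMS's internals:

```python
# Sketch of partition-parallel scanning: the kind of work an RDBMS does
# internally when it splits a table across "spaces" and queries them at once.
from concurrent.futures import ThreadPoolExecutor

def scan_partition(rows, predicate):
    """Filter one partition; each call can run on its own thread."""
    return [r for r in rows if predicate(r)]

table = list(range(1000))                    # stand-in for a table of rows
partitions = [table[i::4] for i in range(4)]  # 4-way round-robin split

with ThreadPoolExecutor(max_workers=4) as pool:
    parts = list(pool.map(lambda p: scan_partition(p, lambda r: r % 7 == 0),
                          partitions))

# Merge the per-partition results back into one answer.
hits = sorted(x for part in parts for x in part)
```

And this is only a parallel scan; keeping such hand-rolled machinery current with a query planner that also parallelizes joins and aggregates is exactly the "getting further behind" problem.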
I tried briefly to combine JJ's answer with Peter's to find an appropriate niche. (Again, I'm trying to look at the math, not fit and finish, availability of experienced programmers, color of brochure...) For example, there could be a class of problems for which the data set is a few tens of gigabytes and needs to be operated on as a whole, and for which queries are fairly arbitrary and exploratory, not production-oriented. Etc. But I haven't been able to come up with one that doesn't have better characteristics as a distributed system. Maybe if we define the problem as "and you only have one commodity box to do it on." That's fair. Maybe that's it? (Then we need to find an "enterprise" with only one box...)
Well, it's not going to be that ("you only have one commodity box"). When I said I thought you could do what Google is doing with an RDBMS if you really wanted to, I wasn't thinking of a few commodity boxes. I was thinking of 4-10 really enormous boxes (but my understanding is that Google uses *lots* of computers to do their work, no?).
In other words, the RDBMS solution would be much more expensive, in both hardware and software, than what Google did.