Database options was (Re: My first Magma experience ...)

Cees de Groot cg at cdegroot.com
Mon Apr 4 12:19:35 UTC 2005


On Sun, 03 Apr 2005 15:37:29 -0500, Jimmie Houchin <jhouchin at cableone.net>  
wrote:
> Okay. Let me see if I understand you correctly.
> For a simple model, no joins or complex relational modeling, use of a  
> RDBMS is fine, maybe preferable.
>
No joins? Use DBM or some ISAM storage.

Seriously, that 'R' in RDBMS is there for something :)

Joins will pop up all the time, even in the simplest schemas. The only  
pattern I know where a table stands apart is when that table has been  
carefully cut loose from the schema by replicating data into it. That  
happens often with log tables (CDR logging in telco applications) and  
invoicing. See the reference to Time Travel below for a solution. I never  
have such tables/objects.
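To make the idea concrete, here's a minimal Python sketch of such a  
cut-loose record: the log entry replicates the customer fields instead of  
referencing them by key, so it never needs a join back into the live  
schema (the order/customer field names are hypothetical):

```python
import datetime

def log_invoice(order, customer, log):
    """Append a self-contained log record.  Customer fields are
    replicated into the record rather than referenced by key, so the
    record stands apart from the schema and never needs a join.
    (Field names here are hypothetical.)"""
    log.append({
        "logged_at": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "order_id": order["id"],
        "amount": order["amount"],
        # replicated, not referenced: later edits to the customer
        # record won't rewrite this history
        "customer_name": customer["name"],
        "customer_vat_id": customer["vat_id"],
    })

log = []
log_invoice({"id": 1, "amount": 99.0},
            {"name": "Acme", "vat_id": "NL001"}, log)
```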

> For something in which in a RDBMS you have multiple joins and complex  
> modeling in which to represent or store your data, go OODBMS.
>
Complex modeling is what you want to look at. Subclassing of important  
domain objects is a key indicator - the usual example is a 'relation',  
which can be a person or a company and can have various roles (customer,  
supplier, employee, employee of customer, maitresse of employee of  
customer, ...). Or take the case where you have a good argument for  
implementing time travel patterns (http://c2.com/cgi/wiki?TimeTravel -  
I'll always try to use that in anything that has to do with  
logging/ordering/invoicing): implementing this in Smalltalk+ODBMS is  
essentially 'implement and forget'. I'm not sure I'd like to do it on top  
of an O/R layer, and probably not without either major surgery to the O/R  
layer or - horrors - adding yet another abstraction layer on top.
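For illustration, a minimal Python sketch of the time travel idea - keep  
every value together with its timestamp and answer 'what was it then?'  
queries. This is not any particular ODBMS API; the class and method names  
are my own invention:

```python
import bisect
import datetime

class Temporal:
    """Minimal time-travel wrapper: every assignment keeps the old
    value, and value_at() answers what the value was at any moment.
    Assumes values are recorded in chronological order."""

    def __init__(self, value, at):
        self._history = [(at, value)]      # sorted by timestamp

    def set(self, value, at):
        self._history.append((at, value))

    def value_at(self, when):
        # find the last entry whose timestamp is <= `when`
        stamps = [t for t, _ in self._history]
        i = bisect.bisect_right(stamps, when)
        if i == 0:
            raise LookupError("no value recorded yet at %s" % when)
        return self._history[i - 1][1]

price = Temporal(10, at=datetime.datetime(2005, 1, 1))
price.set(12, at=datetime.datetime(2005, 2, 1))
```

An invoice built from such values can always be re-derived exactly as it  
looked on its issue date, which is why the pattern fits  
logging/ordering/invoicing so well.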

> The price is fine. Is the commercial version on Linux current, working  
> and complete? Or does it have the file locking problems discussed? I use  
> Linux and like Daniel have nothing to do with Windows.
>
The file locking problem is a problem ONLY if you try to access the  
database concurrently from multiple images. There are arguments to be made  
that this is a good idea, but personally I've had a use case for it only  
once, and even there it wasn't a very good one (we split the image into an  
application server backend for interactive use and a batch processor for  
all the background work, simply because the batch processing ate lots of  
CPU time at times and it was easier to let nice + the Linux kernel deal  
with that than to find out how we could run the batch processing at a  
lower priority inside one image :)).
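That nice-based split can be sketched in a few lines of Python  
(a hypothetical helper, POSIX-only since it relies on fork(2) and  
nice(2); `job` stands in for whatever the batch image would run):

```python
import os

def run_batch_niced(job, niceness=19):
    """Fork a worker process, drop its scheduling priority with
    nice(2), and let the kernel keep it out of the interactive
    process's way.  `job` is any zero-argument callable.
    Returns the worker's exit code."""
    pid = os.fork()
    if pid == 0:                      # child: the batch worker
        os.nice(niceness)             # 19 = lowest priority
        try:
            job()
            os._exit(0)
        except Exception:
            os._exit(1)
    _, status = os.waitpid(pid, 0)    # parent: wait for the batch
    return os.WEXITSTATUS(status)
```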

I'm not sure whether locking has already been implemented under  
Squeak/Linux - but I know what the issues are, how to solve them, and how  
much time they'll take, so if anyone wants this feature for a commercial  
project, I can implement and test it in under a day's work (as can anyone  
else who takes half a look at the code - but I'm just shamelessly  
advertising myself as an OmniBase consultant, ok?). So that shouldn't be a  
showstopper for a commercial project ;)

> I really don't know quite how far SQLite scales. But it seems it would  
> scale anywhere Goods or Magma would. Yes? No? Just a thought.
>
Can't comment on that, not having used SQLite. Goods and Magma have  
separate database servers, which should help scalability. OmniBase can  
have multiple concurrent images accessing the database, and I imagine that  
with a decent network (Gbit ethernet, separate storage area network,  
etcetera), putting the database on a fast fileserver and having a cluster  
of images access it would work quite nicely. If SQLite is indeed just a  
DLL, that limits your scalability to whatever you can build yourself (make  
an SQLite database server image, etcetera). Depending on how much you can  
influence, scalability is limited only by your imagination: with SQLite  
and a typical 10% write / 90% read environment, you could share the  
database filesystem with Linux NBD to other machines and mount it  
read-only there. Write your clients to load-balance read requests, and you  
have suddenly gained a lot along the scalability axis.
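A rough Python sketch of that read/write split - the connection objects  
and the SELECT-based routing rule are hypothetical stand-ins; the point is  
only the load-balancing idea:

```python
import itertools

class SplitConnection:
    """Route writes to the single read-write primary and round-robin
    reads over the read-only replicas (e.g. NBD mounts of the same
    database file on other machines).  A sketch, not a real driver:
    any object with an execute(sql, ...) method will do."""

    def __init__(self, primary, replicas):
        self._primary = primary
        self._readers = itertools.cycle(replicas)

    def execute(self, sql, *args):
        # crude routing rule: SELECTs go to a replica, all else
        # (INSERT/UPDATE/DELETE/DDL) goes to the primary
        is_read = sql.lstrip().upper().startswith("SELECT")
        conn = next(self._readers) if is_read else self._primary
        return conn.execute(sql, *args)
```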

With scalability being limited mostly by your imagination, robustness and  
usability become the most important factors in choosing a persistence  
engine. Personally, I never put scalability on the checklist, because  
there are so many tools to solve these kinds of issues - it smells a lot  
like premature optimization.

My current project just assumes a whole lot of peers, every one of them  
keeping bits of information in plain BerkeleyDB tables. The database is  
the network, in essence. If we don't mess up, it'll scale to awful  
proportions without the need for a single RAID drive :)
