OODB Collections

David Shaffer cdshaffer at acm.org
Wed Apr 13 17:42:50 UTC 2005


Daniel Salama wrote:

> David,
>
> It seems you have some experience with GOODS. Can you shed some
> information regarding your experience with it, primarily in terms of
> performance. All the tests I've run so far indicate to me that
> performance of GOODS leaves a lot to be desired. However, because of
> architectural reasons, I find GOODS to be a valuable OODB.
>
> Thanks,
> Daniel

Your last two sentences sum it up.  I'm unwilling to choose a DB based
on performance alone until I'm sure that performance is a problem.  I
run several Seaside+GOODS-based applications for internal use by a
small-ish company.  I don't have detailed stats but I see peak rates of
roughly 2-3 requests per second and about 30 concurrent users (whose
chores vary widely).  Seaside+GOODS remains responsive during these peak
times.  I had to add a background session reaper to avoid long pauses
when users first accessed the app -- look for my code posted on the
Seaside list.
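
Not the code I posted to the list -- just a minimal sketch of the idea,
assuming a Seaside 2.x-style setup where WADispatcher default knows the
registered applications and each one can be asked to drop its expired
sessions.  The #entryPoints and #unregisterExpiredHandlers selectors are
assumptions here and may be named differently in your Seaside version:

  [[true] whileTrue: [
      (Delay forSeconds: 300) wait.  "sweep every five minutes"
      WADispatcher default entryPoints do: [:app |
          [app unregisterExpiredHandlers]
              on: Error
              do: [:err |
                  Transcript show: 'session reaper: ', err messageText; cr]]]]
      forkAt: Processor userBackgroundPriority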

In the middle of writing this I see you responded to my post on the
other thread.  Rather than continuing these discussions separately I
hope you don't mind if I merge them since your use cases are beginning
to become concrete.

>
> I do need to generate these primary keys (e.g. assign a customer
> number or an invoice number, etc). Any reference to such suggestions
> to more efficiently generate primary keys would be greatly appreciated.

If you need sequential numbers, cache them in a class var in the image.
If you have multiple Squeak images generating customers there are
several strategies you could use:

1) Cache small ranges through a reservation system: each image reserves
a subrange of the primary keys, requiring a DB transaction only to
reserve the next range or to give up unused reserved values.

2) One image serves as a "customer number server" and the other images
communicate with it through a socket (you can avoid the overhead of
GOODS this way).

Oracle has an option to use the first strategy...otherwise sequence
number generation requires a DB round trip.  The first option has the
distinct disadvantage of possible holes (unused #'s) or
non-time-sequential numbering.  That doesn't work for invoicing, for
example, but might be fine for customer numbers.  Finally, if you don't
need numbers (or at least sequential numbers), use UUIDs -- Cees has
already mentioned this.
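
Here's a rough sketch of strategy 1, with 'next' and 'last' as instance
variables of a small singleton held in the image.  The #session, #root
and #commit sends are just placeholders for however you reach your
GOODS connection (a KKDatabase instance or whatever you use) -- adapt
to taste:

  nextCustomerNumber
      "Hand out the next cached number; hit the DB only when the
       reserved block is exhausted."
      | n |
      (next isNil or: [next > last]) ifTrue: [self reserveBlock: 100].
      n := next.
      next := next + 1.
      ^ n

  reserveBlock: blockSize
      "One round trip: advance the persistent high-water mark by
       blockSize and claim the numbers in between for this image."
      | highWater |
      highWater := self session root at: #customerCounter ifAbsent: [0].
      next := highWater + 1.
      last := highWater + blockSize.
      self session root at: #customerCounter put: last.
      self session commit

Whatever is still cached when an image shuts down is simply lost, which
is where the holes mentioned above come from.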

>
> Imagine something like entering a new order. User will need to "find"
> the customer in the database, searching by phone number, name,
> address, zip code, previous order number, or a combination of these.
> If customer is found, then start entering order details and the order
> line items. At the end, post the order. If the customer was not found,
> then before entering the order details, the user will need to create
> the new customer. At any rate, I would imagine a few back and forths.
> First, the searching of the customer. I think in this regards, GOODS
> can be pretty good. I bulk loaded 25,000 customer records into a
> sample GOODS database and the search performance was very good (even
> though it took several hours for the bulk load process). However, when
> "committing" the transaction, the application logic would need to be
> something like: get the next customer number, assign it to the new
> customer, post the new customer, then get the next order number,
> assign it to the order, and finally post the order. Again, this is a
> hypothetical situation. A real life app would be more complex than
> this. I don't know how GOODS will perform in this situation, specially
> when you are talking about hundreds of thousands or over a million
> customers in a database (I have such a need).

Most of this is not a problem with GOODS.  I don't see how your problem
scales with the number of customers.  With careful indexing for
searches, the operations you described can be done reasonably quickly
-- roughly O(log n) per lookup or better -- even for an arbitrarily
large number of customers.  The round trips for sequence number
retrieval/updates don't change whether you have one customer or a
billion.  Based on your benchmarks it seems like 0.2 seconds per round
trip, which is no problem for a web app.  What would become a problem
is, for example, doing bulk updates or reporting on orders or
customers.  Normally only "open orders" or "recent orders" need to be
reported, which suggests keeping indices on these so you don't have to
loop over all of them.  Maintenance of those indices becomes a headache
if you have too many of them, though.  You'll also get in real trouble
if you need ad hoc reports (ones where you didn't anticipate the need
for an index) on such large numbers of objects.  Also, the GOODS client
caches may cause memory issues when handling large numbers of
objects...again, it depends on your needs.
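
To make the indexing point concrete, here's a sketch that assumes the
GOODS root is a Dictionary holding a few lookup collections (a BTree
keyed by date would serve the same purpose for "recent orders" range
queries).  The key names and the #session/#root/#commit sends are
illustrative, not GOODS API:

  postOrder: anOrder for: aCustomer
      | root |
      root := self session root.
      (root at: #customersByPhone) at: aCustomer phone put: aCustomer.
      (root at: #ordersByNumber) at: anOrder number put: anOrder.
      (root at: #openOrders) add: anOrder.  "stays small enough to report on"
      self session commit

  closeOrder: anOrder
      "Removing closed orders here is what keeps reports from ever
       having to scan the full order collection."
      (self session root at: #openOrders) remove: anOrder ifAbsent: [].
      self session commit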

My Seaside+GOODS applications are very focused.  Ad hoc reporting is
simply not something that I need right now.  My data sets are smaller
than yours but my access patterns are similar to what you described
(sans sequence number generation).  GOODS has worked very well for me,
though these aren't critical applications, even if they are widely
used.  Stability has been a non-issue.  These applications are used 5
days a week from 9-5.  I haven't had an image crash or lost data during
operational hours in the past four months.  (A couple of weeks after I
first deployed I did have an image die.  I didn't try any kind of
post-mortem and it hasn't happened since.)  My server peaks at around
20-30% CPU usage (Gentoo Linux with a 2.6.10 kernel).  I run behind an
Apache proxy (Apache also serves absolutely all of my static content,
including images and CSS).

HTH,

David



