Daniel Salama wrote:
David,
It seems you have some experience with GOODS. Can you share some information about your experience with it, primarily in terms of performance? All the tests I've run so far indicate that the performance of GOODS leaves a lot to be desired. However, for architectural reasons, I find GOODS to be a valuable OODB.
Thanks, Daniel
Your last two sentences sum it up. I'm unwilling to choose a DB based on performance alone until I'm sure that performance is a problem. I run several Seaside+GOODS based applications for internal use by a small-ish company. I don't have detailed stats but I see peak rates of roughly 2-3 requests per second and about 30 concurrent users (whose chores vary widely). Seaside+GOODS remains responsive during these peak times. I had to add a background session reaper to avoid long pauses when users first accessed the app -- look for my code posted on the Seaside list.
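The session reaper idea can be illustrated generically. What follows is a minimal Python sketch, not the code posted on the Seaside list. `SessionRegistry`, `touch`, `reap`, and `start_reaper` are hypothetical names, and the timeout and interval values are assumptions. It also assumes the pauses came from expiry work being done on the request path. A background thread sweeps expired sessions on a timer, so no user request has to pay for a large sweep.

```python
import threading
import time


class SessionRegistry:
    """Hypothetical session store; Seaside's real registry works differently."""

    def __init__(self, timeout_seconds):
        self.timeout = timeout_seconds
        self._sessions = {}  # session id -> last access timestamp
        self._lock = threading.Lock()

    def touch(self, session_id, now=None):
        # Record an access; request handlers only do this, never a sweep.
        with self._lock:
            self._sessions[session_id] = time.monotonic() if now is None else now

    def reap(self, now=None):
        # Remove every session idle longer than the timeout; return the count removed.
        now = time.monotonic() if now is None else now
        with self._lock:
            expired = [sid for sid, last in self._sessions.items()
                       if now - last > self.timeout]
            for sid in expired:
                del self._sessions[sid]
        return len(expired)

    def __len__(self):
        with self._lock:
            return len(self._sessions)


def start_reaper(registry, interval_seconds):
    # Daemon thread that reaps periodically, off the request path.
    # Setting the returned Event stops the loop.
    stop = threading.Event()

    def loop():
        # Event.wait returns False on timeout (so reap) and True once stop is set.
        while not stop.wait(interval_seconds):
            registry.reap()

    threading.Thread(target=loop, daemon=True).start()
    return stop
```

With a 600-second timeout, a session last touched at t=0 is removed by a sweep at t=650, while one touched at t=100 survives.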
In the middle of writing this I see you responded to my post on the other thread. Rather than continuing these discussions separately, I hope you don't mind if I merge them, since your use cases are becoming concrete.
I do need to generate these primary keys (e.g. assign a customer number or an invoice number). Any suggestions on how to generate primary keys more efficiently would be greatly appreciated.
If you need sequential numbers, cache the counter in a class variable in the image. If you have multiple Squeak images generating customers, there are several strategies you could use: 1) cache small ranges through a reservation system (each image reserves a subrange of the primary keys, so a DB transaction is needed only to reserve the next range or to give back unused reserved values), or 2) have one image serve as a "customer number server" that the other images talk to through a socket (this avoids the GOODS overhead). Oracle offers the first strategy as an option; otherwise, sequence number generation requires a DB round trip. The first strategy has the distinct disadvantage that it can leave holes (unused numbers) or produce numbers that are not in time order. That doesn't work for invoicing, for example, but might be fine for customer numbers. Finally, if you don't need numbers (or sequential numbers), use UUIDs. Cees has already mentioned this.
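The range-reservation strategy (option 1) can be sketched in a few lines. This is a minimal Python illustration, not GOODS code. `SequenceStore` is a hypothetical stand-in for the shared database object holding the next unreserved key. In a real deployment, `reserve_block` would be a single DB transaction, and the lock here only models transaction isolation. Giving back unused values is left out.

```python
import threading


class SequenceStore:
    """Hypothetical stand-in for the shared DB record holding the next free key."""

    def __init__(self, start=1):
        self._next = start
        self._lock = threading.Lock()  # models the isolation of one DB transaction
        self.round_trips = 0

    def reserve_block(self, size):
        # One round trip: hand out the half-open range [first, first + size).
        with self._lock:
            first = self._next
            self._next += size
            self.round_trips += 1
            return first, first + size


class BlockAllocator:
    """Per-image allocator; it contacts the store only when its block is used up."""

    def __init__(self, store, block_size):
        self.store = store
        self.block_size = block_size
        self._current = 0
        self._limit = 0  # empty range, so the first call reserves a block

    def next_key(self):
        if self._current >= self._limit:
            self._current, self._limit = self.store.reserve_block(self.block_size)
        key = self._current
        self._current += 1
        return key
```

With two images sharing one store and a block size of 100, image A hands out 1, image B hands out 101, and A's next key is 2. That is exactly the non-time-sequential numbering described above, and if A dies, 3 to 100 become holes. In exchange, one image issuing 250 keys needs only 3 round trips. Where no number is needed at all, Python's `uuid.uuid4()` (like UUIDs in general) requires no coordination between images.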
Imagine something like entering a new order. The user will need to "find" the customer in the database, searching by phone number, name, address, zip code, previous order number, or a combination of these. If the customer is found, the user starts entering the order details and the order line items. At the end, the order is posted. If the customer is not found, the user will need to create the new customer before entering the order details. At any rate, I would imagine a few back-and-forths. First, the customer search. I think in this regard GOODS can be pretty good. I bulk loaded 25,000 customer records into a sample GOODS database and the search performance was very good (even though the bulk load took several hours). However, when "committing" the transaction, the application logic would need to be something like: get the next customer number, assign it to the new customer, post the new customer, then get the next order number, assign it to the order, and finally post the order. Again, this is a hypothetical situation; a real-life app would be more complex than this. I don't know how GOODS will perform in this situation, especially with hundreds of thousands or over a million customers in a database (I have such a need).
Most of this is not a problem with GOODS. I don't see how your problem scales with the number of customers. With careful indexing for searches, the operations you described can be done reasonably quickly (O(log n) per indexed lookup) even for an arbitrarily large number of customers. The number of round trips for sequence number retrieval/updates doesn't change whether you have one customer or a billion. Based on your benchmarks, it seems like 0.2 seconds per round trip, which is no problem for a web app. What would become a problem is, for example, bulk updates or reporting on orders or customers. Normally only "open orders" or "recent orders" need to be reported, which suggests keeping indices on these so you don't have to loop over all orders. Maintaining these indices becomes a headache if you have too many of them, though. You'll also get into real trouble if you need ad hoc reports (ones where you didn't anticipate the need for an index) over such large numbers of objects. Also, the GOODS client caches may cause memory issues when handling large numbers of objects... again, it depends on your needs.
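Keeping an index of "open orders" can be shown with a small in-memory model. This is a minimal Python sketch, not GOODS code. `Order`, `OrderBook`, and `open_index` are hypothetical names. The assumption is that a GOODS version would keep the index in its own persistent collection. Every state change must also update the index, and that upkeep is the maintenance cost mentioned above.

```python
from dataclasses import dataclass, field


@dataclass
class Order:
    number: int
    customer_id: int
    is_open: bool = True


@dataclass
class OrderBook:
    """Hypothetical model: all orders plus a secondary index of the open ones."""

    orders: dict = field(default_factory=dict)    # order number -> Order
    open_index: set = field(default_factory=set)  # numbers of orders still open

    def add(self, order):
        self.orders[order.number] = order
        if order.is_open:
            self.open_index.add(order.number)

    def close(self, number):
        # Update the order and keep the index in sync with it.
        self.orders[number].is_open = False
        self.open_index.discard(number)

    def open_orders(self):
        # The report reads only the indexed orders, not the whole collection.
        return [self.orders[n] for n in sorted(self.open_index)]
```

With 1,000 orders of which all but three are closed, `open_orders()` touches just those three. An ad hoc query on a field with no index would still have to scan all 1,000.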
My Seaside+GOODS applications are very focused. Ad hoc reporting is simply not something I need right now. My data sets are smaller than yours, but my access patterns are similar to what you described (sans sequence number generation). GOODS has worked very well for me, though these aren't critical applications, even if they are widely used. Stability has been a non-issue. These applications are used 5 days a week from 9 to 5. I haven't had an image crash or lost data during operational hours in the past four months. (A couple of weeks after I first deployed, I did have an image die. I didn't do any kind of post-mortem and it hasn't happened since.) My server peaks around 20-30% CPU usage (Gentoo Linux with a 2.6.10 kernel). I run behind an Apache proxy server (Apache also serves absolutely all of my static content, including images and CSS).
HTH,
David