I'm wondering if anyone can shed some light as to which of the standard collection classes are "OODB safe"?
By this, I mean that when large collections of objects are stored in an OODB, they can be efficiently accessed from Squeak without necessarily having to load the entire collection into memory.
One example of these is the BTree classes by Avi, which were recommended to me earlier. However, looking at Magma, I see it has some support for large collections. Also, OmniBase has some collection classes of its own.
But questions remain, like: at what point does it become important to consider optimized collections? Is it safe to use Dictionaries, or is there a more efficient collection? I don't know. I'm just looking for general advice on the proper use of collection classes and their persistence (sorry for being so vague).
Thanks, Daniel
Daniel Salama wrote:
I'm wondering if anyone can shed some light as to which of the standard collection classes are "OODB safe"?
By this, I mean that when large collections of objects are stored in an OODB, they can be efficiently accessed from Squeak without necessarily having to load the entire collection into memory.
One example of these is the BTree classes by Avi, which were recommended to me earlier. However, looking at Magma, I see it has some support for large collections. Also, OmniBase has some collection classes of its own.
But questions remain, like: at what point does it become important to consider optimized collections? Is it safe to use Dictionaries, or is there a more efficient collection? I don't know. I'm just looking for general advice on the proper use of collection classes and their persistence (sorry for being so vague).
As soon as performance lags ;-) Personally, I use BTrees and TreeSets with GOODS in many places, generally for any collection that grows beyond a couple hundred elements. They have a huge impact on things like "user lists", which are searched quite often. Using a Dictionary for persistence hasn't worked well for me in these cases.
David
David,
It seems you have some experience with GOODS. Can you share some information about your experience with it, primarily in terms of performance? All the tests I've run so far indicate that the performance of GOODS leaves a lot to be desired. However, for architectural reasons, I find GOODS to be a valuable OODB.
Thanks, Daniel
On Apr 12, 2005, at 6:43 PM, David Shaffer wrote:
As soon as performance lags ;-) Personally, I use BTrees and TreeSets with GOODS in many places, generally for any collection that grows beyond a couple hundred elements. They have a huge impact on things like "user lists", which are searched quite often. Using a Dictionary for persistence hasn't worked well for me in these cases.
David
Daniel Salama wrote:
David,
It seems you have some experience with GOODS. Can you share some information about your experience with it, primarily in terms of performance? All the tests I've run so far indicate that the performance of GOODS leaves a lot to be desired. However, for architectural reasons, I find GOODS to be a valuable OODB.
Thanks, Daniel
Your last two sentences sum it up. I'm unwilling to choose a DB based on performance alone until I'm sure that performance is a problem. I run several Seaside+GOODS-based applications for internal use by a smallish company. I don't have detailed stats, but I see peak rates of roughly 2-3 requests per second and about 30 concurrent users (whose chores vary widely). Seaside+GOODS remains responsive during these peak times. I had to add a background session reaper to avoid long pauses when users first accessed the app; look for my code posted on the Seaside list.
In the middle of writing this, I saw that you responded to my post on the other thread. Rather than continuing these discussions separately, I hope you don't mind if I merge them, since your use cases are beginning to become concrete.
I do need to generate these primary keys (e.g. assign a customer number or an invoice number, etc.). Any pointers to suggestions on how to generate primary keys more efficiently would be greatly appreciated.
If you need sequential numbers, cache them in a class var in the image. If you have multiple Squeak images generating customers, there are several strategies you could use:
1) cache small ranges through a reservation system (each image reserves a subrange of the primary keys, requiring a DB transaction only to reserve the next range or to give up unused reserved values), or
2) have one image serve as a "customer number server" with which the other images communicate through a socket (you can avoid the overhead of GOODS this way).
Oracle has an option to use the first strategy; otherwise, sequence number generation requires a DB round trip. The first option has the distinct disadvantage of allowing holes (unused numbers) or non-time-sequential numbering. That doesn't work for invoicing, for example, but might be fine for customer numbers. Finally, if you don't need numbers (or sequential numbers), use UUIDs. Cees has already mentioned this.
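A rough sketch of strategy (1), the range-reservation approach, might look like this. (Python here purely for illustration; all the names are invented, and `reserve_block` stands in for whatever real DB transaction advances the shared counter.)

```python
# Hypothetical sketch: each image reserves a block of sequence numbers,
# so only one DB round trip is needed per block rather than per number.

class BlockSequence:
    def __init__(self, reserve_block, block_size=100):
        self._reserve = reserve_block  # callable: size -> first number of a fresh block
        self._size = block_size
        self._next = None
        self._limit = None             # exclusive upper bound of the current block

    def next_number(self):
        if self._next is None or self._next >= self._limit:
            self._next = self._reserve(self._size)  # the only "DB" round trip
            self._limit = self._next + self._size
        n = self._next
        self._next += 1
        return n

# Toy stand-in for the database-side counter.
db = {"counter": 1}

def reserve_block(size):
    first = db["counter"]
    db["counter"] += size
    return first

seq = BlockSequence(reserve_block, block_size=3)
print([seq.next_number() for _ in range(5)])  # -> [1, 2, 3, 4, 5], using two reservations
```

Note the trade-off mentioned above: if an image dies holding part of a block, those numbers become permanent holes.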
Imagine something like entering a new order. The user will need to "find" the customer in the database, searching by phone number, name, address, zip code, previous order number, or a combination of these. If the customer is found, the user starts entering the order details and the order line items and, at the end, posts the order. If the customer was not found, then before entering the order details, the user will need to create the new customer. At any rate, I would imagine a few back-and-forths. First, the searching of the customer: I think in this regard GOODS can be pretty good. I bulk-loaded 25,000 customer records into a sample GOODS database, and the search performance was very good (even though the bulk load process took several hours). However, when "committing" the transaction, the application logic would need to be something like: get the next customer number, assign it to the new customer, post the new customer, then get the next order number, assign it to the order, and finally post the order. Again, this is a hypothetical situation; a real-life app would be more complex than this. I don't know how GOODS will perform in this situation, especially when you are talking about hundreds of thousands or over a million customers in a database (I have such a need).
Most of this is not a problem with GOODS. I don't see how your problem scales with the number of customers. The operations you described can be done reasonably quickly -- O(n log n) or better -- even for an arbitrarily large number of customers, with careful indexing for searches. The round trips for sequence number retrieval/updates don't change whether you have one customer or a billion. Based on your benchmarks, it seems like 0.2 seconds per round trip; no problem for a web app. What would become a problem is, for example, doing bulk updates or reporting on orders or customers. Normally only "open orders" or "recent orders" need to be reported, which suggests that you keep indices on these so you don't have to loop over all of them. Maintenance of these indices becomes a headache if you have too many of them, though. You'll also get into real trouble if you need ad hoc reports (ones where you didn't anticipate the need for an index) on such large numbers of objects. Also, the GOODS client caches may cause you memory issues when handling large numbers of objects... again, it depends on your needs.
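The "keep an index on open orders" idea can be illustrated with a toy sketch (hypothetical names; in a GOODS image the index would itself be a persisted collection, e.g. a BTree or TreeSet, rather than a plain set):

```python
# Hypothetical sketch: maintain a small side collection on every status
# change, so reporting never scans the full set of orders.

class OrderBook:
    def __init__(self):
        self.orders = {}          # number -> status; stands in for all persisted orders
        self.open_orders = set()  # the maintained index

    def add(self, number):
        self.orders[number] = "open"
        self.open_orders.add(number)      # index updated at write time

    def close(self, number):
        self.orders[number] = "closed"
        self.open_orders.discard(number)  # keep the index in sync

book = OrderBook()
for n in range(1, 6):
    book.add(n)
book.close(2)
book.close(4)
print(sorted(book.open_orders))  # -> [1, 3, 5], no scan over book.orders
```

The headache David mentions is visible even here: every write path must remember to touch every index, which is why having many such indices gets painful.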
My Seaside+GOODS applications are very focused. Ad hoc reporting is simply not something that I need right now. My data sets are smaller than yours, but my access patterns are similar to what you described (sans sequence number generation). GOODS has worked very well for me, but these aren't critical applications, although they are widely used. Stability has been a non-issue. These applications are used 5 days a week from 9 to 5, and I haven't had an image crash or lost data during operational hours in the past four months. (A couple of weeks after I first deployed, I did have an image die. I didn't try any kind of post mortem, and it hasn't happened since.) My server peaks around 20-30% CPU usage (Gentoo Linux with a 2.6.10 kernel). I run behind an Apache proxy (Apache also serves absolutely all of my static content, including images and CSS).
HTH,
David
If you need sequential numbers, cache them in a class var in the image. If you have multiple Squeak images generating customers, there are several strategies you could use:
1) cache small ranges through a reservation system (each image reserves a subrange of the primary keys, requiring a DB transaction only to reserve the next range or to give up unused reserved values), or
2) have one image serve as a "customer number server" with which the other images communicate through a socket (you can avoid the overhead of GOODS this way).
Oracle has an option to use the first strategy; otherwise, sequence number generation requires a DB round trip. The first option has the distinct disadvantage of allowing holes (unused numbers) or non-time-sequential numbering. That doesn't work for invoicing, for example, but might be fine for customer numbers. Finally, if you don't need numbers (or sequential numbers), use UUIDs. Cees has already mentioned this.
Interesting. Will look further into this.
Most of this is not a problem with GOODS. I don't see how your problem scales with the number of customers. The operations you described can be done reasonably quickly -- O(n log n) or better -- even for an arbitrarily large number of customers, with careful indexing for searches. The round trips for sequence number retrieval/updates don't change whether you have one customer or a billion. Based on your benchmarks, it seems like 0.2 seconds per round trip; no problem for a web app. What would become a problem is, for example, doing bulk updates or reporting on orders or customers. Normally only "open orders" or "recent orders" need to be reported, which suggests that you keep indices on these so you don't have to loop over all of them. Maintenance of these indices becomes a headache if you have too many of them, though. You'll also get into real trouble if you need ad hoc reports (ones where you didn't anticipate the need for an index) on such large numbers of objects. Also, the GOODS client caches may cause you memory issues when handling large numbers of objects... again, it depends on your needs.
This is good news. I'll try to run some simulations on large data sets to see how GOODS behaves.
My server peaks around 20-30% CPU usage (Gentoo Linux with a 2.6.10 kernel). I run behind an Apache proxy (Apache also serves absolutely all of my static content, including images and CSS).
I'd love to learn more about how you (and the other people on the list who recently posted similar comments in the REST and Seaside thread) have configured Apache to do this.
Thanks again, Daniel Salama
Daniel Salama wrote:
My server peaks around 20-30% CPU usage (Gentoo Linux with a 2.6.10 kernel). I run behind an Apache proxy (Apache also serves absolutely all of my static content, including images and CSS).
I'd love to learn more about how you (and the other people on the list who recently posted similar comments in the REST and Seaside thread) have configured Apache to do this.
Activate proxying in your apache ".conf" file (and make sure you have mod_proxy loaded or compiled in):
ProxyTimeout 300
ProxyPreserveHost Off
ProxyBadHeader StartBody
ProxyPass /seaside http://localhost:8888/seaside
ProxyPassReverse /seaside http://localhost:8888/seaside

<Location "/seaside">
    Order allow,deny
    Allow from all
</Location>
This is a _very_ basic config. You can get a tutorial-style introduction to proxying issues from http://www.apacheweek.com/features/reverseproxies. I have had problems with Comanche behind a proxy when uploading large files; I basically had to disable keep-alive: "HttpAdapter keepAlive: false". Everything works great now, but I have no idea what the problem really was.
If you want to do things like load balancing you can use some tricks with mod_proxy and the rewrite engine. Here's a good intro: http://wiki.apache.org/cocoon/LoadBalancingWithModProxy
I use this when I want to bring a new image "live" without killing existing sessions. Basically, I start the new image (listening on a new port) and replace the existing server in my "ALL" list of load-balanced servers. Session affinity keeps existing sessions on the old server but creates all new sessions on the new one. Once the old sessions are gone, I can stop the old image. It works well, although it is pretty manual at this point. I'd like to have a "deploy" script to take care of this but haven't gotten that far yet.
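For the curious, the shape of such a setup (following the mod_rewrite recipe on the wiki page linked above) is roughly the following. The ports, the map file path, and the omission of the session-affinity rules are all simplifications of mine, so treat it as a sketch rather than a working config:

```apache
# /etc/apache2/seaside-servers.txt would contain a line such as:
#   ALL localhost:8888|localhost:8889
# and "rnd:" makes ${servers:ALL} pick one entry at random.
RewriteEngine On
RewriteMap servers rnd:/etc/apache2/seaside-servers.txt

# Rules pinning existing sessions to their original image would go here,
# before this catch-all that proxies new sessions to a random backend.
RewriteRule ^/seaside/(.*)$ http://${servers:ALL}/seaside/$1 [P,L]
```

Swapping an image "live" then amounts to editing the ALL line in the map file, as described above.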
HTH,
David
Hi!
David Shaffer cdshaffer@acm.org wrote:
http://www.apacheweek.com/features/reverseproxies. I have had problems with Comanche behind a proxy when uploading large files; I basically had to disable keep-alive: "HttpAdapter keepAlive: false". Everything works great now, but I have no idea what the problem really was.
Well, uploading and downloading large files is something I plan to fix. I have already fixed some serious shortcomings with uploading that relate to SocketStream; FastSocketStream, available on SM, handles that much better.
But the issue remains that Kom doesn't handle very large files that well. I intend to give KomHttpServer a performance overhaul when 3.8 is finalized; I already have a bunch of improvements lined up. And putting in large-file support (as in "don't suck it all into RAM") seems like a good thing.
So now is probably a good time to send me patches for it and I can integrate them too. :)
regards, Göran
goran.krampe@bluefish.se wrote:
Hi!
David Shaffer cdshaffer@acm.org wrote:
http://www.apacheweek.com/features/reverseproxies. I have had problems with Comanche behind a proxy when uploading large files; I basically had to disable keep-alive: "HttpAdapter keepAlive: false". Everything works great now, but I have no idea what the problem really was.
Well, uploading and downloading large files is something I plan to fix. I have already fixed some serious shortcomings with uploading that relate to SocketStream; FastSocketStream, available on SM, handles that much better.
I've been quietly watching FastSocketStream waiting for some more people to indicate that they are using it in production systems. Sounds like it needs to find its way into my image ;-)
But the issue remains that Kom doesn't handle very large files that well. I intend to give KomHttpServer a performance overhaul when 3.8 is finalized; I already have a bunch of improvements lined up. And putting in large-file support (as in "don't suck it all into RAM") seems like a good thing.
The problem doesn't seem to be performance-related. I believe it is a bug, but I don't have the luxury of trying to track it down on a live system. I'll just run without keep-alive for now.
So now is probably a good time to send me patches for it and I can integrate them too. :)
Sorry... see above. Maybe some day I'll build an Apache proxy on my development machine; then I can actually try to debug the problem.
David
Hi!
"C. David Shaffer" cdshaffer@acm.org wrote:
I've been quietly watching FastSocketStream waiting for some more people to indicate that they are using it in production systems. Sounds like it needs to find its way into my image ;-)
I use it for my HomepageBuilder app, at least. And it seems to work fine. :)
But the issue remains that Kom doesn't handle very large files that well. I intend to give KomHttpServer a performance overhaul when 3.8 is finalized; I already have a bunch of improvements lined up. And putting in large-file support (as in "don't suck it all into RAM") seems like a good thing.
I wrote the above, but in fact KomHttpServer doesn't suck the whole file in when downloading (uploading still does); there are other issues there, though.
regards, Göran