OODB Storage Options and Performance

Chris Muller chris at funkyobjects.org
Thu Apr 14 21:38:03 UTC 2005


> > I want the Mac to be faster..  I'm wondering if there's some software
> > configuration that's responsible for this.  Have you tried Squeak 3.7?
> 
> I am running Squeak 3.7-5989

Very odd that you're running so much slower.

> > This session will have the fastest access because its a local
> > connection,
> > remote peer connections must go through serialization and network
> > transport.
> 
> Is this going to be faster, equal, or slower than simply using
> #openLocal?

Equal.

> I re-evaluated the code I ran on April 2. Back then, the bulk load of
> 8784 records took 4380 seconds. After some modifications to the script,
> using MagmaOrderedCollection, #openLocal, and turning on WriteBarrier,
> the time now was 1580 seconds. That's almost 3.5 times faster, which is
> pretty good, considering it's a bulk load process.

There is no such thing as MagmaOrderedCollection; I assume you mean a
MagmaCollection.

> I then changed MagmaOrderedCollection to MagmaArray and the time
> improved slightly to 1490 seconds.

Hmm, this sort of proves there is something terribly inefficient with either
your code or the way Magma is being used.  MagmaCollection inserts are the
slowest thing Magma does and MagmaArray puts are considerably faster, so the
small difference you are seeing indicates a lot of time being spent elsewhere.

> I then tried to access by index
> element 6666 of the repository and it took 28 seconds. That was
> terribly slow.

You've got to profile that, man.  That was a perfect opportunity!

This may be a good example of ODBMS transparency not being free.  With MySQL
you do just the "SELECTs" that you need, so there are few or no "extra" reads.
In the ODBMS system the "selects" are done for you; that's part of the
transparency.  But "transparent" doesn't mean "completely invisible".  You have
to be aware of the various things that affect performance, such as the
read-depth of your domain model.  It may be bringing back half a million
objects with that MagmaArray access; if so, a ReadStrategy can help you control
that.

I still smell a bloated readSet but I can't tell for sure.  I've mentioned
this at least a couple of times, but I don't see any indication that you've
done any investigation into that.

Take a look at the 'performance' category of MagmaSession.  Try putting some
displays in your load-script; I'm particularly interested in the
#cachedObjectCount.  If that is getting big then try sending #finalizeOids
after every insert, or every 4 or 5 inserts if that's faster.  If it's still
big then use #cachedObjectCountByClass to investigate why.
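
A minimal sketch of what such displays might look like, under some assumptions:
`session` is an already-open MagmaSession, `records` is the collection being
loaded, and `target` is the persistent collection in your repository.  All
three names are placeholders, and wrapping each insert in #commit: is only a
guess at how your script is structured.  #cachedObjectCount and #finalizeOids
are the selectors mentioned above.

```smalltalk
"Sketch only: session, records and target are placeholders for your own objects."
| count |
count := 0.
records do: [:each |
    session commit: [target add: each].
    count := count + 1.
    "Show the cache size periodically so growth is visible during the load."
    count \\ 100 = 0 ifTrue: [
        Transcript
            show: count printString, ' inserted, cached objects: ',
                session cachedObjectCount printString;
            cr].
    "Try other intervals (every insert, every 4 or 5) and compare the timings."
    count \\ 5 = 0 ifTrue: [session finalizeOids]]
```

If the count keeps climbing even with #finalizeOids in place, printing the
result of #cachedObjectCountByClass at the same points should show which
classes are accumulating.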

Note whether it starts out fast and then slows down.

Do some timings and profiles on a smaller version of the file to determine what
settings will offer the best performance.
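
One way to take those timings, sketched with plain Squeak:
`Time millisecondsToRun:` measures each batch separately, so you can see
whether the load starts out fast and then slows down.  `sampleRecords` and the
body of `loadRecord` are hypothetical stand-ins for a small slice of your file
and your per-record insert code.

```smalltalk
"Sketch only: sampleRecords is a hypothetical small subset of the input,
 and loadRecord is a hypothetical block wrapping your per-record insert."
| batchSize loadRecord |
batchSize := 100.
loadRecord := [:rec | "your insert code here" ].
1 to: sampleRecords size by: batchSize do: [:start |
    | stop ms |
    stop := (start + batchSize - 1) min: sampleRecords size.
    ms := Time millisecondsToRun: [
        start to: stop do: [:i | loadRecord value: (sampleRecords at: i)]].
    Transcript
        show: 'records ', start printString, '-', stop printString,
            ': ', ms printString, ' ms';
        cr]
```

Roughly constant batch times suggest a fixed per-insert cost; steadily growing
ones point toward something accumulating as the load proceeds.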

These are the realities of using a highly-transparent ODBMS, GemStone included.
As you've seen, great improvements can sometimes be made by simply tweaking
various things.  You just have to know what to tweak.  To know what to tweak,
you have to profile.

> Finally, I ran the exact same copy of code using MagmaArray, except
> that I changed MagmaArray to BTree. The result was a disappointing
> execution time of 3810 seconds. However, accessing the 6666th element
> now only took 10 seconds.

Interesting, I've actually never used a BTree.

> In my opinion, 10 seconds is still slow for directly accessing an
> object by its index (key in the BTree case). I don't know if the access
> time will be linear. What will happen when I need to access the
> 1,034,768th customer? Today, it only takes me fractions of a second to
> do so in MySQL!

And it takes but a fraction of a second when using either MagmaArrays or
indexed MagmaCollections as well, even with many millions of objects in them.
But your code is very likely faulting down way too many objects or...
something.  I just can't tell from here.

If you send me the text of a MessageTally, I should be able to offer some
more-specific advice.
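
For example, something along these lines would profile the slow indexed
access.  `MessageTally spyOn:` is standard Squeak; `myArray` is a placeholder
for the MagmaArray in your repository, and I'm assuming you read the element
with #at:.

```smalltalk
"Sketch only: myArray is a placeholder for the MagmaArray held in your repository."
MessageTally spyOn: [myArray at: 6666]
```

The tally opens in its own window when the block finishes, and the text there
can be copied straight into an email.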

> Am I using the wrong collections?

With Magma, I recommend using MagmaCollections.  They are flexible, dependable
and quite fast on reads.  They are still somewhat slow on inserts but much
better than they used to be.  MagmaArrays are very fast on both reads and puts.

 - Chris


