GOODS data structures (was Re: [Seaside] [ANN] SeasideTemplate)
Avi Bryant
avi at beta4.com
Wed Feb 25 02:48:31 CET 2004
I spent some of today doing benchmarking of various data structures
with GOODS. The main run off I ended up being interested in was
Dictionary vs. my new BTree implementation. For small collections you
definitely want Dictionary. For large collections, what it comes down
to is how big your transactions are - there's an inherent overhead to
using a Dictionary (because it has one huge array of associations), and
if your transactions are fairly small that can have a huge cost,
particularly when inserting.
Here are some times for transactions involving retrieving or inserting
various numbers of items into a Dictionary or a BTree in a GOODS
database already loaded with 50,000 string keys. They all also include
the overhead of starting a new connection to GOODS.
Dictionary
- retrieve 1: 6s
- insert 1: 16s
- retrieve 100: 9s
- insert 100: 24s
- retrieve 1000: 45s
- insert 1000: 82s
BTree
- retrieve 1: 1.5s
- insert 1: 1.5s
- retrieve 100: 22.5s
- insert 100: 13.5s
- retrieve 1000: 143.5s
- insert 1000: 49s
My guess is that the BTree would show a greater advantage for remote
databases (this was connecting to a DB on localhost), since the
Dictionary has to transfer more bytes. OTOH, it would also be
interesting to try a Dictionary with one of the LargeArray
implementations, which would solve the major problem.
I did do some testing with Scott Crosby's SkipList implementation, but
in general it showed similar characteristics to the BTree with a higher
constant factor. I'll test it more when I have more time.
Incidentally probably the most important thing I found out was that
allocating new objects in GOODS one by one is horrendously slow;
there's an undocumented bulk allocation mechanism that I'll need to add
support for ASAP.
Avi
More information about the Seaside
mailing list