GOODS data structures (was Re: [Seaside] [ANN] SeasideTemplate)

Avi Bryant avi at beta4.com
Wed Feb 25 02:48:31 CET 2004


I spent some of today doing benchmarking of various data structures 
with GOODS.  The main run off I ended up being interested in was 
Dictionary vs. my new BTree implementation.  For small collections you 
definitely want Dictionary.  For large collections, what it comes down 
to is how big your transactions are - there's an inherent overhead to 
using a Dictionary (because it has one huge array of associations), and 
if your transactions are fairly small that can have a huge cost, 
particularly when inserting.

Here are some times for transactions involving retrieving or inserting 
various numbers of items into a Dictionary or a BTree in a GOODS 
database already loaded with 50,000 string keys.  They all also include 
the overhead of starting a new connection to GOODS.

Dictionary
- retrieve 1: 6s
- insert 1: 16s
- retrieve 100: 9s
- insert 100: 24s
- retrieve 1000: 45s
- insert 1000: 82s

BTree
- retrieve 1: 1.5s
- insert 1: 1.5s
- retrieve 100: 22.5s
- insert 100: 13.5s
- retrieve 1000: 143.5s
- insert 1000: 49s

My guess is that the BTree would show a greater advantage for remote 
databases (this was connecting to a DB on localhost), since the 
Dictionary has to transfer more bytes.  OTOH, it would also be 
interesting to try a Dictionary with one of the LargeArray 
implementations, which would solve the major problem.

I did do some testing with Scott Crosby's SkipList implementation, but 
in general it showed similar characteristics to the BTree with a higher 
constant factor.  I'll test it more when I have more time.

Incidentally probably the most important thing I found out was that 
allocating new objects in GOODS one by one is horrendously slow; 
there's an undocumented bulk allocation mechanism that I'll need to add 
support for ASAP.

Avi



More information about the Seaside mailing list