Persistence & DTSTTCPW: ZODB clone?

Cees de Groot cg at cdegroot.com
Thu Jan 31 11:08:01 UTC 2002


I was just explaining the ZODB design to a colleague, and it struck me that 
this might be a very useful persistence engine design for Squeak. It's simple, 
straightforward to implement, and we have sample code in Python to, err, get 
inspiration from ;-).

The design of the ZODB is straightforward: on disk there is just a transaction 
log, and the index of 'current' objects is kept in memory. It is single-user: 
if you want multiple processes to access the same database, write a small 
server (ZEO for Zope). If you want multiple threads to access the same 
database, use a semaphore.

When Zope opens the ZODB, it scans the objects and rebuilds the index (that's 
simple: just run through the transaction log, build a dictionary of (oid -> 
offset), and at the end of the transaction log you have the most recent 
versions in the dictionary). If you cleanly exit Zope, the dictionary is 
dumped to disk so you don't need to scan next time. The in-core index is not 
very large: our ZEO server serves a 500 MB database and uses around 15 MB of 
memory, which includes roughly 5 MB for basic Python.
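
To make the scan-and-rebuild step concrete, here is a minimal sketch in 
Python. The record layout (8-byte oid, 4-byte length, pickled state) and the 
names append, rebuild_index and load are made up for illustration; this is not 
the real ZODB file format, and transaction headers are left out.

```python
import io
import pickle
import struct

# Hypothetical record layout, not the real ZODB format:
# 8-byte oid, 4-byte payload length, then the pickled object state.
HEADER = struct.Struct(">QI")


def append(log, oid, state):
    """Append a new version of `oid` at the end of the log; return its offset."""
    log.seek(0, io.SEEK_END)
    offset = log.tell()
    payload = pickle.dumps(state)
    log.write(HEADER.pack(oid, len(payload)))
    log.write(payload)
    return offset


def rebuild_index(log):
    """Scan the log front to back; later records overwrite earlier entries."""
    index = {}
    log.seek(0)
    while True:
        offset = log.tell()
        header = log.read(HEADER.size)
        if len(header) < HEADER.size:
            break  # end of log
        oid, length = HEADER.unpack(header)
        index[oid] = offset
        log.seek(length, io.SEEK_CUR)  # skip the payload, we only need offsets
    return index


def load(log, index, oid):
    """Read the current version of `oid` using the in-memory index."""
    log.seek(index[oid])
    _, length = HEADER.unpack(log.read(HEADER.size))
    return pickle.loads(log.read(length))


log = io.BytesIO()  # stands in for the database file
append(log, 1, {"name": "first"})
append(log, 2, {"name": "other"})
append(log, 1, {"name": "second"})  # newer version of oid 1

index = rebuild_index(log)
current = load(log, index, 1)
```

Nothing is ever overwritten: a new version is just another record at the end, 
and the index entry for that oid moves to the new offset.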

The structure (now I'm treading on dangerous territory here - this is off the 
top of my head, I really should refresh my knowledge from the docs) is 
basically: magic, transaction header, object, object, object, ..., transaction 
header, object, object, object, ... etcetera. Objects have backpointers to 
parent versions, and so do transactions - this means you can 'timetravel' to 
older transactions or older object versions (guess how trivial it is to build 
a Wiki on top of that...).
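
A rough sketch of the backpointer idea, with a Python list standing in for the 
log file and list positions standing in for file offsets. The names store and 
history are hypothetical, and the real record format surely differs.

```python
# Hypothetical in-memory stand-in for the log: each record is
# (oid, state, prev), where prev is the position of the previous
# version of the same oid, or None for the first version.
log = []
index = {}  # oid -> position of the current version


def store(oid, state):
    """Append a new version that points back at the previous one."""
    log.append((oid, state, index.get(oid)))
    index[oid] = len(log) - 1


def history(oid):
    """Yield the states of `oid` from newest to oldest via backpointers."""
    pos = index.get(oid)
    while pos is not None:
        _, state, prev = log[pos]
        yield state
        pos = prev


store("FrontPage", "v1")
store("Sandbox", "hello")
store("FrontPage", "v2")
store("FrontPage", "v3")

versions = list(history("FrontPage"))
```

For a Wiki, the page history is then simply history(page_oid), with no extra 
bookkeeping.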

If the accumulated cruft becomes too large, you can close the database, move 
it to database.old, and copy the freshest objects back to a new file (or the 
freshest objects plus a week's worth of history, etcetera).
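
The simplest form of that copy step could look like the sketch below. The log 
is again modelled as a list of (oid, state) records in write order, which is 
an assumption for illustration; pack is a hypothetical name, and keeping a 
window of history is left out.

```python
def pack(old_log):
    """Return a new log holding only the latest record for each oid.

    `old_log` is a list of (oid, state) records in write order, a
    hypothetical stand-in for the on-disk file.
    """
    latest = {}
    for position, (oid, _state) in enumerate(old_log):
        latest[oid] = position  # later records win
    # Copy the survivors in their original write order.
    return [old_log[pos] for pos in sorted(latest.values())]


old_log = [(1, "a1"), (2, "b1"), (1, "a2"), (3, "c1"), (2, "b2")]
new_log = pack(old_log)
```

The old file is never modified, so if anything goes wrong halfway you still 
have database.old.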

The fact that you don't need to overwrite existing parts of the db file 
(tricky), that you don't need on-disk indexing (even trickier), and that it 
all performs reasonably well makes it an ideal candidate for a built-in 
persistence mechanism for Squeak, methinks.

The ZODB interface is basically a dictionary of oid->object mappings. IIRC, it 
employs a root object, so compacting the database will not only get rid of old 
versions, but also of old cruft that is no longer reachable from the root. For 
most data structures, you simply persist Python collections (lists, 
dictionaries), but they have also layered a B*tree index thingy on top of Zope 
so you have better scalability for large collections.
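
Why a root object helps when compacting, as a small sketch: anything you 
cannot reach from the root can be dropped. The objects mapping below (oid -> 
list of referenced oids) is a simplified, hypothetical view of the current 
object versions, and reachable is a made-up name.

```python
def reachable(objects, root_oid):
    """Return the set of oids reachable from the root object.

    `objects` maps oid -> list of oids that object references.
    """
    seen = set()
    stack = [root_oid]
    while stack:
        oid = stack.pop()
        if oid in seen:
            continue
        seen.add(oid)
        stack.extend(objects.get(oid, []))
    return seen


# oid 0 is the root; oid 3 points at oid 1, but nothing points at oid 3.
objects = {0: [1, 2], 1: [2], 2: [], 3: [1]}
live = reachable(objects, 0)
```

A compaction pass would then copy only the current versions of the oids in 
live to the new file.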

Inside Python, Zope employs some metaobject hackery to make persistence 
transparent: every thread has a current transaction, and through some sneaky 
tricks wrapped in a Python C module called 'ExtensionClass', every object that 
inherits/mixes in Persistent will register itself with the transaction when it 
is called. It's up to the code to commit the transaction; Zope does this 
naturally when the HTTP request has been dealt with.
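
To give a feel for the registration trick, here is a pure-Python sketch. It is 
not how ExtensionClass does it: hooking __setattr__ is my assumption for 
illustration, and Transaction, current_transaction and Page are hypothetical 
names. The store is a plain dictionary standing in for the database.

```python
import threading

_local = threading.local()  # one current transaction per thread


class Transaction:
    def __init__(self):
        self.dirty = []

    def register(self, obj):
        # Compare by identity so each object is registered once.
        if not any(obj is d for d in self.dirty):
            self.dirty.append(obj)

    def commit(self, store):
        """Write a snapshot of every registered object into the store."""
        for obj in self.dirty:
            store[obj.oid] = dict(vars(obj))
        self.dirty.clear()


def current_transaction():
    if not hasattr(_local, "txn"):
        _local.txn = Transaction()
    return _local.txn


class Persistent:
    """Mixin: any attribute assignment registers the object as changed."""

    def __setattr__(self, name, value):
        object.__setattr__(self, name, value)
        current_transaction().register(self)


class Page(Persistent):
    def __init__(self, oid, text):
        self.oid = oid
        self.text = text


store = {}  # stands in for the database
page = Page(1, "hello")
page.text = "hello, world"  # no explicit "save" call needed
pending = len(current_transaction().dirty)
current_transaction().commit(store)
```

The application code only ever touches page; committing is a single call at 
the end of the request.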
 
Would that sound like something useful for Squeak? It has its limitations, for 
sure, but it seems like the simplest way to get a solid persistence engine 
into Squeak, which is a Good Thing I think. It's also a project I'd like to 
tackle...

Regards,

Cees
-- 
Cees de Groot               http://www.cdegroot.com     <cg at cdegroot.com>
GnuPG 1024D/E0989E8B 0016 F679 F38D 5946 4ECD  1986 F303 937F E098 9E8B





More information about the Squeak-dev mailing list