Persistence & DTSTTCPW: ZODB clone?
Cees de Groot
cg at cdegroot.com
Thu Jan 31 11:08:01 UTC 2002
I was just explaining the ZODB design to a colleague, and it struck me that
this might be a very useful persistence engine design for Squeak. It's simple,
straightforward to implement, and we have sample code in Python to, err, get
inspiration from ;-).
The design of the ZODB is straightforward: on disk there is just a transaction
log; the index of 'current' objects is kept in memory. It is single-user: if
you want multiple processes to access the same database, write a small server
(ZEO for Zope). If you want multiple threads to access the same database, use
a semaphore.
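To illustrate the "use a semaphore" part, here is a minimal Python sketch. The
SerializedLog class and its store method are made-up names for illustration,
not ZODB API; the point is only that every write to the shared log and index
goes through one guard.

```python
import threading

class SerializedLog:
    """Hypothetical in-memory stand-in for the log plus its index."""

    def __init__(self):
        self._guard = threading.Semaphore(1)  # one writer at a time
        self.records = []                     # stands in for the append-only file
        self.index = {}                       # oid -> position of current version

    def store(self, oid, data):
        # Index update and append must happen together, so guard both.
        with self._guard:
            self.index[oid] = len(self.records)
            self.records.append((oid, data))

def worker(log, worker_id):
    for n in range(100):
        log.store((worker_id, n), f"payload {n}")

log = SerializedLog()
threads = [threading.Thread(target=worker, args=(log, i)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

After all threads have finished, every index entry still points at the record
that was written for it.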
When Zope opens the ZODB, it scans the objects and rebuilds the index (that's
simple: just run through the transaction log, build a dictionary of (oid ->
offset), and at the end of the transaction log you have the most recent
versions in the dictionary). If you cleanly exit Zope, the dictionary is
dumped to disk so you don't need to scan next time. The in-core index is not
very large: our ZEO server serves a 500MB database and uses around 15MB of
memory, which includes roughly 5MB for basic Python.
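The scan described above could be sketched in Python roughly as follows. The
record layout (8-byte oid, 4-byte length, payload) and the function names are
assumptions made up for illustration; this is not the real ZODB file format.

```python
import io
import struct

# Hypothetical record layout (NOT the real ZODB format):
# 8-byte oid, 4-byte payload length, then the payload bytes.
HEADER = struct.Struct(">QI")

def append_record(log, oid, payload):
    """Append one object version at the end of the log; return its offset."""
    log.seek(0, io.SEEK_END)
    offset = log.tell()
    log.write(HEADER.pack(oid, len(payload)))
    log.write(payload)
    return offset

def rebuild_index(log):
    """Scan the whole log; later records overwrite earlier index entries."""
    index = {}
    log.seek(0)
    while True:
        offset = log.tell()
        header = log.read(HEADER.size)
        if len(header) < HEADER.size:
            break  # end of log (or a truncated trailing header)
        oid, length = HEADER.unpack(header)
        log.seek(length, io.SEEK_CUR)  # skip the payload, we only need offsets
        index[oid] = offset
    return index

def load(log, index, oid):
    """Fetch the most recent version of an object via the in-memory index."""
    log.seek(index[oid])
    _, length = HEADER.unpack(log.read(HEADER.size))
    return log.read(length)

log = io.BytesIO()  # stands in for the database file
append_record(log, 1, b"first version")
append_record(log, 2, b"another object")
append_record(log, 1, b"second version")
index = rebuild_index(log)
```

Here load(log, index, 1) returns the second version of object 1, because its
index entry was overwritten when the scan reached the later record.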
The structure (now I'm going to tread on dangerous territory here - this is
off the top of my head, I really should refresh my knowledge from the docs) is
basically: magic, transaction header, object, object, object, ..., transaction
header, object, object, object, ... etcetera. Objects have backpointers to
their previous versions, and so do transactions - this means you can
'timetravel' to older transactions or older object versions (guess how trivial
it is to build a Wiki on top of that...).
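A small sketch of the backpointer idea, in Python. The Record and Log classes
are hypothetical and keep everything in a list instead of a file; the only
point is that each new version remembers where the previous one lives, so
history is a linked list you can walk.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Record:
    oid: int
    data: str
    prev: Optional[int]  # log position of the previous version, or None

class Log:
    """Hypothetical append-only log with an in-memory index."""

    def __init__(self):
        self.records = []  # stands in for the append-only file
        self.index = {}    # oid -> position of the current version

    def store(self, oid, data):
        # The backpointer is simply the position the index held before.
        self.records.append(Record(oid, data, self.index.get(oid)))
        self.index[oid] = len(self.records) - 1

    def history(self, oid):
        """Yield versions from newest to oldest by following backpointers."""
        pos = self.index.get(oid)
        while pos is not None:
            record = self.records[pos]
            yield record.data
            pos = record.prev

wiki = Log()
wiki.store(1, "FrontPage v1")
wiki.store(2, "SandBox v1")
wiki.store(1, "FrontPage v2")
versions = list(wiki.history(1))
```

Walking history(1) gives the page revisions newest first, which is more or
less the revision list a Wiki needs.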
If the accumulated cruft becomes too large, you can close the database, move
it to database.old, and copy the freshest objects back to a new file (or the
freshest objects plus a week's worth of history, etcetera).
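The copy step could look something like this in Python. This is a sketch under
the assumption that the log is just a list of (oid, data) tuples; the pack
function is a made-up name and keeps no history at all.

```python
def pack(old_log):
    """Return a new log holding only the newest version of each oid.

    old_log is a list of (oid, data) tuples in the order they were appended.
    """
    newest = {}
    for position, (oid, _data) in enumerate(old_log):
        newest[oid] = position  # later positions win
    # Copy survivors in their original order so the new log stays ordered.
    return [old_log[pos] for pos in sorted(newest.values())]

old_log = [(1, "a1"), (2, "b1"), (1, "a2"), (3, "c1"), (2, "b2")]
new_log = pack(old_log)
```

Keeping a week's worth of history would only change which positions survive
the filter; the copy itself stays the same.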
The fact that you don't need to overwrite existing parts of the db file
(tricky), that you don't need on-disk indexing (even trickier), and that it
all performs reasonably well makes it an ideal candidate for a built-in
persistence mechanism for Squeak, methinks.
The ZODB interface is basically a dictionary of oid->object mappings. IIRC, it
employs a root object, so compacting the database will get rid not only of old
versions but also of objects that are no longer reachable from the root. For
most data structures, you simply persist Python collections (lists,
dictionaries), but they have also layered a B*tree index thingy on top of the
ZODB so you have better scalability for large collections.
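Why a root object helps when compacting: anything you cannot reach from the
root is cruft. A minimal Python sketch, assuming each object is reduced to the
list of oids it references (the reachable function and the sample graph are
made up for illustration):

```python
def reachable(objects, root_oid):
    """Return the set of oids reachable from the root.

    objects maps each oid to the list of oids that object references.
    """
    seen = set()
    stack = [root_oid]
    while stack:
        oid = stack.pop()
        if oid in seen:
            continue  # already visited; this also handles reference cycles
        seen.add(oid)
        stack.extend(objects[oid])
    return seen

# oid 0 is the root; oid 4 refers to a live object but nothing refers to it.
objects = {0: [1, 2], 1: [3], 2: [], 3: [1], 4: [2]}
live = reachable(objects, 0)
```

A compaction pass would then copy only the oids in live to the new file and
leave object 4 behind.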
Inside Python, Zope employs some metaobject hackery to make persistence
transparent (every thread has a current transaction, and through some sneaky
tricks wrapped in a Python C module called 'ExtensionClass' every object that
inherits/mixes in Persistent will register itself with the transaction when it
is modified; it's up to the code to commit the transaction, and Zope does this
naturally when the HTTP request has been dealt with).
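The registration trick can be imitated in plain Python without any C module.
The sketch below is not how ExtensionClass does it; it swaps in an ordinary
__setattr__ override, and the Transaction class, current_transaction function
and Account example are hypothetical names. Only the _p_oid attribute name is
borrowed from the ZODB.

```python
import threading

_local = threading.local()  # holds one current transaction per thread

class Transaction:
    """Hypothetical transaction that collects modified objects."""

    def __init__(self):
        self.dirty = []

    def register(self, obj):
        if not any(obj is d for d in self.dirty):
            self.dirty.append(obj)

    def commit(self, store):
        # Write a snapshot of each modified object's state, keyed by oid.
        for obj in self.dirty:
            store[obj._p_oid] = {
                k: v for k, v in vars(obj).items() if not k.startswith("_p_")
            }
        self.dirty.clear()

def current_transaction():
    if not hasattr(_local, "txn"):
        _local.txn = Transaction()
    return _local.txn

class Persistent:
    """Mixin: any attribute write registers the object as modified."""

    def __setattr__(self, name, value):
        super().__setattr__(name, value)
        if not name.startswith("_p_"):  # bookkeeping attributes don't count
            current_transaction().register(self)

class Account(Persistent):
    def __init__(self, oid, balance):
        self._p_oid = oid
        self.balance = balance

store = {}  # stands in for the database
acct = Account(7, 100)
acct.balance = 150
pending = len(current_transaction().dirty)
current_transaction().commit(store)
```

The application code never mentions the transaction when it changes the
balance; only the commit at the end is explicit, which is the part Zope does
for you at the end of a request.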
Would that sound like something useful for Squeak? It has its limitations, for
sure, but it seems like the simplest way to get a solid persistence engine
into Squeak, which is a Good Thing I think. It's also a project I'd like to
tackle...
Regards,
Cees
--
Cees de Groot http://www.cdegroot.com <cg at cdegroot.com>
GnuPG 1024D/E0989E8B 0016 F679 F38D 5946 4ECD 1986 F303 937F E098 9E8B
More information about the Squeak-dev mailing list