Persistence VM?

Stephen Pair spair at acm.org
Mon Aug 19 18:12:09 UTC 2002


I'm attempting to build a BerkeleyDB persistence solution for Squeak.  I
had hoped that I could do this without modifying the VM...however, I'm
coming to the conclusion that a few key modifications to the VM would
make life much simpler.  

This is as much a brain dump for me to keep track of everything I need
to think about as it is a solicitation for input.  I'm most interested
in hearing about the implementation details from people familiar with
Squeak's interpreter, and about the capabilities this VM will provide
from people that have implemented persistence frameworks.

So, here are my thoughts:

First, I would like to call this VM the "Persistence VM" or something
similar.  The idea is that it is for anyone needing incremental object
persistence in Squeak, and the hope is that a lot of persistence
implementors could take advantage of what this VM offers.  It will have
performance and space implications, so it should remain separate from
the main VM.  If I follow through on this VM, I plan to maintain it and
have it track the official VM.  I also plan to provide an image/change
set that has the Smalltalk implementation and which tracks the official
Squeak image.  And, I plan to provide a tool that will convert an
existing image to this object memory layout.

Design goal:  To create an alternate Squeak VM for people needing to
incrementally store objects outside of Squeak while minimizing changes
to the current Squeak VM architecture.

What this VM will provide:  

1) the ability to set a flag for any object in the system that will
prevent that object's state from being directly accessed (read or write)

2) the ability to associate a state manager object with any object in
the system

3) any attempt to read or write a slot in an object that has the
"managed state" flag will result in a read or write message being sent
to that object's state manager (or to nil if none has been assigned)

4) if a state manager has, as its first slot, a SmallInteger, a 4 bit
portion of that small integer will be reset to 16r0 whenever that state
manager is sent the read or write message (persistence implementors
could use this fact to age cached objects and implement an approximate
LRU cache management scheme)

5) the ability to transform any object in the system into a "forwarder"
which will transparently forward all messages it receives to some other
designated object...garbage collection will automatically change all
references to forwarders into references to their target.  A full GC
will eliminate all references to all forwarder objects in memory.  A
Smalltalk>>allObjectsDo: will not visit "forwarder" objects and once an
object is converted to a forwarder, its state will no longer be
accessible, nor will it be capable of receiving messages (all messages
sent to it will go to its target)...identity comparisons will result in
an identity comparison with the target object...the hash will be the
hash of the target object (thus if the original object is in any hash
sets, you might need to remove and re-insert it, or rehash the set).

6) the ability to transform any object in the system into a "stub"
object.  A stub object does not understand very much and when it
receives a message, will ask another object to load the real object into
memory, transform itself into a "forwarder" object, and then resend the
message to itself (after the stub object has been transformed into a
forwarder)

Required VM changes:

- compact classes will be dropped
- the managed state flag will be one bit taken from the current compact
classes bits 
- the forwarder flag will be one bit taken from the current compact
classes bits
- the object hash will be increased to 15 bits
- the ability to set an object's identity hash from Smalltalk is
needed...this allows hashed sets to be stored externally and then
retrieved without needing to rehash them (a persistence framework would
need to set the identity hash of any object based on what's in
persistent storage)
- object headers will be as follows:
     2-word: base header, class oop
     3-word: base header, class oop, state manager oop
     4-word: base header, class oop, state manager oop, size
- when assigning an object other than nil as the state manager of an
object, if the object has a 2-word header, then a new object of the same
size with a 3-word header is allocated, the contents of the original
object are copied, and the original object is collapsed into a forwarder
(hash bits will be preserved)
- when assigning nil as the state manager of an object, if the object
has a 3-word header, then a new object of the same size with a 2-word
header is allocated, the contents of the original object are copied, and
the original object is collapsed into a forwarder (hash bits will be
preserved)
- when an object is transformed into a forwarder, its class header word
stores the target of the forwarder (thus, any object can be collapsed
into a forwarder)
- a primitive for changing an object's class into a Stub class is
needed, this will take any object and convert its class into a stub
class, and shrink its size to zero (note, stubs will use the state
manager header to store an object for fetching state from an external
source)
- the primitives and bytecodes that access object state will need to be
modified to check the "managed state" bit, and if set, send a message to
the object stored in the new state manager slot...additionally, the age
bits will need to be reset if the first slot of the state manager is a
SmallInteger
- to accommodate forwarders, message sending will need to check for the
presence of a forwarder, identity comparison will need to be aware of
forwarders, and any bytecode optimized primitives will need to check for
a forwarder before invoking the corresponding primitive
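
The header-growing and forwarder rules above can be sketched as a toy
model in Python (again, not VM code).  HeapObject, resolve, identical,
set_state_manager and snap_references are all hypothetical names.  The
sketch assumes that an object with a 4-word header already has room for
a state manager and is updated in place, which the list above does not
spell out:

```python
class HeapObject:
    """Hypothetical heap object; header_words is 2, 3 or 4."""

    def __init__(self, slots, identity_hash, header_words=2):
        self.slots = list(slots)
        self.identity_hash = identity_hash
        self.header_words = header_words
        self.state_manager = None
        self.forward_to = None  # set once the object is collapsed into a forwarder


def resolve(obj):
    # Follow forwarders until a real object is reached.
    while obj.forward_to is not None:
        obj = obj.forward_to
    return obj


def identical(a, b):
    # Identity comparison is performed on the targets, not the forwarders.
    return resolve(a) is resolve(b)


def set_state_manager(obj, manager):
    obj = resolve(obj)
    wanted = 2 if manager is None else 3
    if obj.header_words == 4 or obj.header_words == wanted:
        obj.state_manager = manager  # header already has the right shape
        return obj
    # Allocate a copy with the other header size, preserving the hash bits.
    copy = HeapObject(obj.slots, obj.identity_hash, wanted)
    copy.state_manager = manager
    # Collapse the original into a forwarder to the copy.
    obj.slots = []
    obj.forward_to = copy
    return copy


def snap_references(objects):
    # What garbage collection would do: replace references to forwarders
    # with references to their targets.
    for o in objects:
        o.slots = [resolve(s) if isinstance(s, HeapObject) else s for s in o.slots]
```

Until a collection runs, old references still reach the right object
because every access goes through resolve; after snap_references the
forwarder is no longer referenced from the scanned objects.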

Base Header in the Persistence VM:

The base header would look like the following:
3 bits  reserved for gc (mark, old, dirty)
15 bits object hash (for Hash sets)
1 bit   state managed
1 bit   forwarder
4 bits  object format
6 bits  object size in 32-bit words
2 bits  header type (0: 4-word, 1: 3-word, 2: forbidden, 3: 2-word)
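
The widths above add up to exactly 32 bits.  As a quick check, here is a
small Python sketch that packs and unpacks such a header word.  It
assumes the fields are listed from most significant to least significant
bit (so the header type lands in the low 2 bits); the field names and
the functions pack_header and unpack_header are made up for this
illustration:

```python
FIELDS = [  # (name, width in bits), assumed to run from high bits to low bits
    ("gc", 3),
    ("hash", 15),
    ("state_managed", 1),
    ("forwarder", 1),
    ("format", 4),
    ("size_words", 6),
    ("header_type", 2),
]


def _build_layout():
    shift = 32
    layout = {}
    for name, width in FIELDS:
        shift -= width
        layout[name] = (shift, (1 << width) - 1)
    assert shift == 0, "field widths must total 32 bits"
    return layout


LAYOUT = _build_layout()


def pack_header(**values):
    """Build a 32-bit base header word; unspecified fields default to 0."""
    unknown = set(values) - set(LAYOUT)
    if unknown:
        raise KeyError(f"unknown header fields: {sorted(unknown)}")
    word = 0
    for name, (shift, mask) in LAYOUT.items():
        value = values.get(name, 0)
        if not 0 <= value <= mask:
            raise ValueError(f"{name}={value} does not fit in its field")
        word |= value << shift
    return word


def unpack_header(word):
    """Split a 32-bit base header word back into its fields."""
    return {name: (word >> shift) & mask for name, (shift, mask) in LAYOUT.items()}
```

For example, unpack_header(pack_header(hash=0x7FFF, state_managed=1,
size_words=5, header_type=3)) returns the same field values that went
in, with every other field zero.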

If an object's state manager is nil and the object is at most 255 bytes,
then an object will need 2 header words.  If a state manager is assigned
to an object, the size of the header grows to 3 words (by copying the
object to a new location and converting the old location to a
forwarder).  If the object is larger than 255 bytes, then the header
will have 4 words.
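
Put as a small Python function (a sketch only; header_words and
HEADER_TYPE_CODE are names invented here, and the 255-byte limit is
taken directly from the paragraph above), the rule is:

```python
# Header type codes from the base header layout: 0 = 4-word, 1 = 3-word, 3 = 2-word.
HEADER_TYPE_CODE = {4: 0, 3: 1, 2: 3}


def header_words(size_bytes, has_state_manager):
    """Number of header words an object would need in the Persistence VM."""
    if size_bytes > 255:
        return 4  # base header, class oop, state manager oop, size
    if has_state_manager:
        return 3  # base header, class oop, state manager oop
    return 2      # base header, class oop
```

A large object always gets the 4-word header, so assigning or clearing
its state manager would not change its header size.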

An analysis of my current working image (which has 623,385 objects in
it) indicates that such changes in the object memory layout will result
in about an 11% increase in the amount of memory needed for this set of
objects (about 78% of the objects in my image are small and compact,
another 4.4% are large).
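
For a rough feel of where the extra memory goes, here is a
back-of-the-envelope calculation in Python.  It is a sketch under
assumptions that go beyond the figures above: that "small and compact"
objects have a 1-word header today and would grow to 2 words, that
large objects have a 3-word header today and would grow to 4 words,
that the remaining objects already have 2 header words, and that no
state managers are assigned.  The function name is made up:

```python
WORD_BYTES = 4  # 32-bit words


def extra_header_bytes(object_count, compact_fraction, large_fraction):
    """Bytes added if every compact or large object gains one header word."""
    grown_objects = object_count * (compact_fraction + large_fraction)
    return grown_objects * WORD_BYTES


extra = extra_header_bytes(623_385, 0.78, 0.044)
print(f"about {extra / 1e6:.2f} million extra bytes")
```

Under those assumptions, roughly 82% of the objects gain one word each,
which comes to about 2 million bytes of additional header space.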

- Stephen



