Persistent Segments

Mon Feb 19 19:03:37 UTC 2001

Below is a preliminary design for automatic persistence of Squeak
objects.  You can find the same text on the swiki at
http://minnow.cc.gatech.edu/squeak/1783.  Please feel free to make
comments here or add to the evolving design document on the swiki.  I
hope this turns into a collaborative effort.

Objects are grouped into segments, and segments are grouped into
databases.  A segment can be protected from read or write using security
permissions.  The traditional image is partitioned into segments and
stored in a database.  The new image swaps segments in and out of a
database(s) while running.  A segment consists of an array of root
objects, internal objects (objects only reachable from the roots), and a
set of OutPointer proxies that refer to roots of other segments.  An
OutPointer contains the database address, segment number, and root index
of the object it is standing in for.  An OutPointer exists only when one
or more objects in the segment point to an object outside the segment.
And an object is only in the roots array when one or more objects
outside the segment are pointing in to it.  Every object must belong to
exactly one segment.  A new object is added to the first segment that
points to it (however, objects can move between segments).
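To make the segment layout concrete, here is a minimal Python sketch (not Squeak code) that sorts a toy object graph into roots, internal objects and OutPointers.  It assumes every object is already assigned to exactly one segment.  The names build_segments, Segment and the default database address "localhost:8000" are hypothetical and only for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class OutPointer:
    """Stand-in for a root of another segment (field names are illustrative)."""
    database: str    # database address, e.g. "host:port" (hypothetical format)
    segment: int     # segment number within that database
    root_index: int  # index into that segment's roots array


@dataclass
class Segment:
    number: int
    roots: list = field(default_factory=list)         # objects pointed to from outside
    internals: list = field(default_factory=list)     # everything else in the segment
    out_pointers: list = field(default_factory=list)  # proxies for other segments' roots


def build_segments(owner, refs, database="localhost:8000"):
    """owner maps every object to exactly one segment number;
    refs maps each object to the objects it points to."""
    segments = {n: Segment(n) for n in sorted(set(owner.values()))}
    crossing = [(src, dst) for src, targets in refs.items()
                for dst in targets if owner[src] != owner[dst]]
    # An object is a root only if some object outside its segment points to it.
    for _, dst in crossing:
        roots = segments[owner[dst]].roots
        if dst not in roots:
            roots.append(dst)
    for obj, n in owner.items():
        if obj not in segments[n].roots:
            segments[n].internals.append(obj)
    # An OutPointer exists only when something in the segment points outside it.
    for src, dst in crossing:
        there = segments[owner[dst]]
        proxy = OutPointer(database, there.number, there.roots.index(dst))
        here = segments[owner[src]].out_pointers
        if proxy not in here:
            here.append(proxy)
    return segments


# Usage: a and b live in segment 1, c and d in segment 2.
owner = {"a": 1, "b": 1, "c": 2, "d": 2}
refs = {"a": ["b", "c"], "b": [], "c": ["d"], "d": ["a"]}
segments = build_segments(owner, refs)
```

Here a and c end up as roots (each is pointed to from the other segment), while b and d stay internal.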

When an OutPointer proxy receives a message, and the segment it refers
to is not yet loaded, it forwards the message to the database and waits.
 The database, another Squeak with all its segments virtually loaded
into its image, will execute the message on the client's behalf as long
as it does not change any object in any segment.  If a change is
attempted, execution is aborted and the segment of the original object
is returned to the client where the message is re-executed.  If no
change is made (i.e., the message is just a query), the execution result is
returned to the client.  If the result is an old object its whole
segment is returned, otherwise, the new result is extracted (using
ImageSegment), converting old object references to OutPointer proxies,
and returned.  During execution, any messages sent to segment roots that
are already loaded at the client are forwarded back to the client, in
case the client has changed the segment in a way that may affect
execution.  Also, any messages sent to OutPointers that refer to another
database are forwarded to that database.  Like before, if execution
attempts to modify any established object, even on the client, execution
is aborted.  This is necessary in case the same execution thread later
attempts to make a change back on the database, in which case the whole
execution needs to be rolled back.  Allowing the database to execute
"queries" avoids unnecessary loading of segments in clients while still
allowing messages like 'Smalltalk allImplementorsOf:' to be used.
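The basic query-or-abort round trip can be sketched in Python as below.  This is a simplification under stated assumptions: objects are plain dicts, messages are Python functions, only the receiver is write-guarded, and forwarding back to the client or on to other databases is left out.  ReadOnly, AbortExecution, Database.execute and Client.send are hypothetical names.

```python
from types import SimpleNamespace


class AbortExecution(Exception):
    """Raised when a query running on the database tries to change an object."""


class ReadOnly:
    """Database-side view of an established object: reads pass, writes abort."""

    def __init__(self, fields):
        object.__setattr__(self, "_fields", fields)

    def __getattr__(self, name):
        try:
            return self._fields[name]
        except KeyError:
            raise AttributeError(name) from None

    def __setattr__(self, name, value):
        raise AbortExecution(name)


class Database:
    def __init__(self, segments):
        self.segments = segments  # segment number -> {root index: field dict}

    def execute(self, segment, root_index, message, *args):
        """Run the message as a query; if it tries to change state, abort
        and hand back the whole segment instead of a result."""
        try:
            return "result", message(ReadOnly(self.segments[segment][root_index]), *args)
        except AbortExecution:
            return "segment", self.segments[segment]


class Client:
    def __init__(self, database):
        self.database = database
        self.loaded = {}  # segment number -> {root index: local object}

    def send(self, segment, root_index, message, *args):
        """Roughly what an OutPointer proxy does when it receives a message."""
        if segment not in self.loaded:
            kind, payload = self.database.execute(segment, root_index, message, *args)
            if kind == "result":
                return payload
            # The query tried to write: copy the segment locally and re-execute.
            self.loaded[segment] = {i: SimpleNamespace(**f) for i, f in payload.items()}
        return message(self.loaded[segment][root_index], *args)


def balance(account):
    return account.balance


def deposit(account, amount):
    account.balance = account.balance + amount
    return account.balance
```

Sending balance leaves the segment on the database, while deposit forces it to be loaded on the client; the database copy is untouched until a commit.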

When a new segment is loaded into the client, an incrementalGC is
executed, surviving young objects are moved up by the size of the new
segment, the segment is loaded at the end of old space, and its internal
pointers are adjusted for the new memory location.  While adjusting
pointers, pointers to OutPointer proxies that refer to loaded segments
are immediately replaced by the roots they refer to.  All other
OutPointer proxies in the image that refer to the new segment's roots
will become one with them (if this is too expensive then this can be put
off into a low priority process while OutPointer proxies forward
messages to their roots).  
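The relocation steps can be modelled on a flat Python list standing in for memory.  This sketch assumes one list slot per object, fields that are either ("ptr", address), ("out", segment, rootIndex) or plain data, and a segment whose pointers are offsets from its start.  The incrementalGC itself is not modelled, and load_segment is a hypothetical name.

```python
def is_ptr(f):
    return isinstance(f, tuple) and f[0] == "ptr"


def is_out(f):
    return isinstance(f, tuple) and f[0] == "out"


def load_segment(memory, old_end, segment, seg_no, loaded):
    """memory: list of objects (lists of fields); [0, old_end) is old space,
    the rest are surviving young objects.  loaded maps segment number to the
    absolute addresses of its roots.  Returns the new end of old space."""
    size = len(segment["objects"])
    base = old_end  # the segment goes at the end of old space

    # 1. Young objects move up by the segment size, so bump pointers into them.
    for obj in memory:
        obj[:] = [("ptr", f[1] + size) if is_ptr(f) and f[1] >= old_end else f
                  for f in obj]

    # 2. Adjust internal pointers; OutPointers to already loaded segments
    #    are replaced immediately by the roots they refer to.
    def relocate(f):
        if is_ptr(f):
            return ("ptr", base + f[1])
        if is_out(f) and f[1] in loaded:
            return ("ptr", loaded[f[1]][f[2]])
        return f

    memory[old_end:old_end] = [[relocate(f) for f in obj] for obj in segment["objects"]]
    loaded[seg_no] = [base + r for r in segment["roots"]]

    # 3. Existing OutPointers to the new segment's roots become direct
    #    pointers (done eagerly here; the text allows deferring this).
    for obj in memory:
        obj[:] = [("ptr", loaded[seg_no][f[2]]) if is_out(f) and f[1] == seg_no else f
                  for f in obj]
    return old_end + size
```

For example, loading a two-object segment 5 into a memory with two old objects and one young object places the segment at addresses 2 and 3 and shifts the young object to address 4.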

When a new object is first referenced by an established (old) object,
the new object and all virgin objects reachable from it will point to
the established object's segment via their segment header, a new header
word that only new objects will have.  A virgin object is one with no
segment yet, and an established object is one with a segment, young or
old.  Objects loaded with the original segment don't need the extra
segment header because their segment can be determined from their address in
memory.  The original segment takes up a contiguous block of memory, and
the first object in that block is always a SegmentHeader that knows the
segment size and points to its roots and outPointers.  A new
specialObjects member, called loadedSegments, points to all the loaded
segment headers and is (binary) searched to find the segment of an old
object.  If a matching segment is not found then we can safely assume
the object was once young and find its segment in its segment header.
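The lookup and the assignment of virgin objects might look like the Python sketch below.  It assumes integer addresses, headers that record a start address and size, and an Obj whose optional segment field plays the role of the extra header word.  Memory.segment_of and Memory.store (a write barrier) are hypothetical names.

```python
import bisect
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SegmentHeader:
    start: int   # address of the header, i.e. the start of the block
    size: int    # size of the contiguous block
    number: int


@dataclass
class Obj:
    address: int
    refs: list = field(default_factory=list)
    segment: Optional[int] = None  # the extra header word; only new objects use it


class Memory:
    def __init__(self, headers):
        self.loaded_segments = sorted(headers, key=lambda h: h.start)
        self._starts = [h.start for h in self.loaded_segments]

    def segment_of(self, obj):
        # Binary search for the last segment block starting at or before obj.
        i = bisect.bisect_right(self._starts, obj.address) - 1
        if i >= 0:
            h = self.loaded_segments[i]
            if obj.address < h.start + h.size:
                return h.number
        # Not inside any loaded block: the object was once young, so its
        # segment (or None for a virgin object) is in its own header word.
        return obj.segment

    def store(self, holder, new_obj):
        """Store new_obj into holder and give virgin objects a segment."""
        holder.refs.append(new_obj)
        seg = self.segment_of(holder)
        if seg is None:
            return  # holder is itself virgin; nothing to assign yet
        stack = [new_obj]
        while stack:
            o = stack.pop()
            if self.segment_of(o) is None:  # virgin: no segment yet
                o.segment = seg
                stack.extend(o.refs)
```

Once an object has been given a segment it is no longer virgin, so cycles among new objects do not loop forever.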

Upon commit all segments are decoupled and sent to the database(s) as a
single transaction.  The details of the commit are as follows:  A fullGC
is executed.  Memory is scanned from beginning to end converting
pointers that cross segment boundaries to OutPointer proxies
(decoupling).  Each segment (including all its new objects) is copied
and compacted, freeing the original segment.  They are copied to young
space unless there is enough room in old space as a result of previously
freed segments.  All segments are then compacted together to the
beginning of memory and sent to the database.  Segments can then be
coupled back together as if they were just loaded (or put off to a low
priority process as mentioned before).
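The decoupling and per-segment compaction part of a commit can be sketched in Python as follows.  It assumes the fullGC has already run, that memory is a dict from address to (segment number, fields), and that compaction keeps each segment's objects in address order.  Sending the result to the database as one transaction is not shown, and decouple is a hypothetical name.

```python
def decouple(objects):
    """objects: address -> (segment number, fields); a field is ("ptr", address)
    or plain data.  Returns segment number -> {"objects": [...], "roots": [...]}
    where internal pointers become offsets and crossing pointers OutPointers."""
    # Compaction order: each segment's objects in address order.
    order = {}
    for addr in sorted(objects):
        order.setdefault(objects[addr][0], []).append(addr)
    offset = {a: i for addrs in order.values() for i, a in enumerate(addrs)}
    roots = {seg: [] for seg in order}

    def convert(seg, f):
        if not (isinstance(f, tuple) and f[0] == "ptr"):
            return f
        target_seg = objects[f[1]][0]
        if target_seg == seg:
            return ("ptr", offset[f[1]])  # internal pointer: offset in the block
        r = roots[target_seg]
        if offset[f[1]] not in r:
            r.append(offset[f[1]])        # the target becomes a root
        return ("out", target_seg, r.index(offset[f[1]]))

    packets = {}
    for seg, addrs in order.items():
        # roots[seg] is shared, so roots added by later segments still show up.
        packets[seg] = {"objects": [[convert(seg, f) for f in objects[a][1]] for a in addrs],
                        "roots": roots[seg]}
    return packets
```

Each packet is self-contained: its only outside references are ("out", segment, rootIndex) entries.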

A new segment can be created by specifying a set of roots and extracting
out all objects only reachable from those roots (i.e., what ImageSegment
does today).  This creates a block of new objects at the end of old
memory with a new SegmentHeader in front.  All the original roots become
one with the new roots, leaving the original roots and their free children
to be garbage collected.

A special OutPointer proxy called a ClassOutPointer is used for referring
to classes.  This is needed so a client can use its own customized
methods of a well known class on an imported object.  A ClassOutPointer
contains the location of its original class plus its name and all
instance variables.  When a ClassOutPointer is converted to its class,
it first looks in the Smalltalk dictionary for a class of that name with
those instance variables.  If there's no match, the original class's
segment is loaded and used.  Compact classes and their instances are
always assumed to have the well known structure.
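The resolution rule for a ClassOutPointer fits in a few lines of Python.  This sketch assumes the client's Smalltalk dictionary is a plain mapping from name to a class record, and that a hypothetical load_segment(database, segment) callback returns that segment's roots array.  ClassInfo and resolve_class are illustrative names, not Squeak classes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClassOutPointer:
    database: str          # where the original class lives
    segment: int
    root_index: int
    name: str              # class name, e.g. "Point"
    inst_var_names: tuple  # instance variable names, in order


@dataclass
class ClassInfo:
    """Hypothetical stand-in for a class object."""
    name: str
    inst_var_names: tuple
    origin: str  # only used to show which class was picked


def resolve_class(cop, smalltalk, load_segment):
    """smalltalk: name -> local ClassInfo (the client's Smalltalk dictionary);
    load_segment(database, segment) -> that segment's roots array."""
    local = smalltalk.get(cop.name)
    if local is not None and tuple(local.inst_var_names) == cop.inst_var_names:
        # Same name and shape: use the client's (possibly customized) class.
        return local
    # No match: fall back to the original class from its own segment.
    return load_segment(cop.database, cop.segment)[cop.root_index]
```

Comparing the instance variable names, not just the class name, is what keeps a locally customized class from being applied to instances with a different layout.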

A segment has two identifiers: its original database address (hostName
and port) and its segment number.  For efficiency, mirrors of well known
segments will reside in most databases.  If a mirror is changed, the
change is sent to the original and all mirrors are updated (remember,
you must have permission to change a segment, even a mirror).  Typical
well known segments would include all the classes in a default image
partitioned by system category, and a core segment holding all the
constant objects that reside in the specialObjectsArray (like nil, true,
false, compact classes, etc).  Typically, these well known segments will
be write protected (writeable only by SqueakCentral) and you would make
changes to copies of these segments and submit change sets as usual.  Of
course, you can also make your database directly accessible to the
public with your new segments on it.
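Segment identity and mirror updates could be modelled as below.  This is a sketch under assumptions: the identifier is the (hostName, port, segment number) triple described above, write permission is a simple set of user names, and the original and its mirrors are tracked by one hypothetical Registry object rather than by separate databases talking over the network.

```python
from dataclasses import dataclass, field
from typing import NamedTuple


class SegmentId(NamedTuple):
    host_name: str  # original database's host
    port: int       # original database's port
    number: int     # segment number within that database


class PermissionDenied(Exception):
    pass


@dataclass
class SegmentCopy:
    seg_id: SegmentId
    contents: dict
    writers: set = field(default_factory=set)  # users allowed to write


class Registry:
    """Tracks an original segment and its mirrors (hypothetical)."""

    def __init__(self, original):
        self.original = original
        self.mirrors = []

    def add_mirror(self):
        o = self.original
        mirror = SegmentCopy(o.seg_id, dict(o.contents), set(o.writers))
        self.mirrors.append(mirror)
        return mirror

    def write(self, user, key, value):
        # Permission belongs to the segment, whichever copy the change came from.
        if user not in self.original.writers:
            raise PermissionDenied(user)
        self.original.contents[key] = value
        for mirror in self.mirrors:
            mirror.contents[key] = value
```

A mirror keeps the original's SegmentId, so a change made through it can always be routed back to the original database.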

Your committed "image" would be shrunk down to a start segment
containing the Smalltalk dictionary with the Processor holding the
suspended active Process and the World containing the saved screen work;
most of the class vars would contain OutPointers.  However, you will
still be able to take a snapshot that will save your work-in-progress
image to a file without committing any changes to the database.

I really think Squeak needs persistence so we can build a world wide web
of objects, and I don't think this will slow things down too much.

Thanks,
Anthony Hannan