Layers a la PIE/Us

Sun Mar 10 02:04:03 UTC 2002

I want to propose a design for loadable/unloadable image changes.  It is
based on layers found in PIE[1] and Us[2], and is designed to work on
any objects (e.g. code modules, Morphic projects, etc.).

Definition:
	An image is a graph of objects all reachable from the Smalltalk root
(or more correctly the specialObjectsArray), i.e. today's image.  Let's
define a layer as a set of changes to an image, producing a new image. 
Starting from an empty image we can build up any image just by loading a
sequence of layers.  The only restriction is that layerA must be loaded
after layerB if layerA references an object introduced by layerB.  In
this case, layerA would be considered a dependent of layerB and layerB a
parent of layerA.  Let's redefine a layer as a sequence of parent layers
plus a set of changes to objects found in that parent ancestry.  An
image can then be specified by just a small sequence of layers - parent
layers will be loaded automatically.  In fact, since a layer can have
more than one parent, an image can be defined as a single layer.
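	The load-order rule above can be sketched as a depth-first walk over
the parent graph.  This is only an illustration in Python, not Squeak
code; the names Layer and load_order are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Layer:
    """Hypothetical layer: an id plus an ordered sequence of parent layers."""
    id: str
    parents: list = field(default_factory=list)

def load_order(layer, seen=None, order=None):
    """Return layers ordered so that every parent precedes its dependents."""
    if seen is None:
        seen, order = set(), []
    if layer.id in seen:
        return order
    seen.add(layer.id)
    for parent in layer.parents:      # parents first, in their declared sequence
        load_order(parent, seen, order)
    order.append(layer)               # then the layer itself
    return order

kernel = Layer("kernel")
collections = Layer("collections", [kernel])
morphic = Layer("morphic", [kernel, collections])
image = Layer("image", [morphic, collections])   # an image is just a layer

print([l.id for l in load_order(image)])
```

Asking for the single "image" layer pulls in its whole ancestry, and a
layer shared by several dependents is loaded only once.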

Structure:
	Each layer contains: a sequence of parent layers, an image segment of
its new objects with outPointers referring to objects in the parent
ancestry, plus a set of changes to objects in the parent ancestry. 
Changes are specified as a layer-pointer to the object to be changed
plus a Smalltalk message to be sent to it.  The message selector and
args are specified as layer-pointers as well.  Smalltalk messages are
used (instead of just specifying which field to change) so we can handle
collections flexibly.  If two independent layers add new objects to the
same collection, we want to be able to load both layers together without
the new objects clashing on the same field.  Also, when doing things
like adding an instance variable we want to do it in Smalltalk so we can
do the appropriate cleanup (reshape existing instances).
	A roots array is not maintained in the layer's new-objects image
segment, instead all objects are addressable by offset from the front of
the segment.  This makes it easy for new layers to reference objects in
existing layers without needing to make them roots.  Addressing objects
by offset implies that we cannot garbage collect objects within a
loaded segment; we can only garbage collect whole layer segments once
they are no longer used (like when the user changes images, see below).
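	To make the change format concrete, here is a rough Python sketch (not
Squeak code).  It assumes a layer-pointer is a (layer id, offset) pair
and models each loaded segment as a dict from offset to object; the
names LayerPointer, Change, resolve and apply_change are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LayerPointer:
    layer_id: str   # which layer's segment holds the object
    offset: int     # offset from the front of that segment

@dataclass(frozen=True)
class Change:
    receiver: LayerPointer   # object to be changed
    selector: LayerPointer   # message selector, itself stored in a segment
    args: tuple              # tuple of LayerPointer

def resolve(segments, ptr):
    """Look an object up by layer id and offset."""
    return segments[ptr.layer_id][ptr.offset]

def apply_change(segments, change):
    """Send the recorded message to the receiver."""
    receiver = resolve(segments, change.receiver)
    selector = resolve(segments, change.selector)
    args = [resolve(segments, a) for a in change.args]
    return getattr(receiver, selector)(*args)

# Two independent layers each add an object to the same collection.
segments = {
    "base": {0: set()},
    "A": {0: "add", 8: "Foo"},
    "B": {0: "add", 8: "Bar"},
}
shared = LayerPointer("base", 0)
for lid in ("A", "B"):
    apply_change(segments, Change(shared, LayerPointer(lid, 0),
                                  (LayerPointer(lid, 8),)))
print(sorted(segments["base"][0]))
```

Because each change is a message send rather than a write to a fixed
field, both additions land in the collection without clashing.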

Creation:
	There is always a single active layer (i.e. the current transaction)
that records all new objects and changes to old objects.  Low-level
methods that change fields and update collections will be made into
primitives that record changes in the active layer.  Like I suggested in
a previous email, instance variables will not be accessed directly; they
will always be accessed through primitive accessor methods.  This is
needed because recording changes using inst var indexes is unreliable:
layers may add instance variables independently but then be loaded
together, so the same index could end up meaning different variables.
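	As a loose analogy in Python (not Squeak code), recording through
accessors could look like the sketch below.  The names ActiveLayer,
Recorded and active are hypothetical, and __setattr__ merely stands in
for the primitive accessor methods; the point is that a change is
recorded by variable name, never by index.

```python
class ActiveLayer:
    """Hypothetical active layer: collects changes as they happen."""
    def __init__(self):
        self.changes = []

    def record(self, obj, name, value):
        self.changes.append((obj, name, value))

active = ActiveLayer()   # the single active layer

class Recorded:
    """Every write goes through one accessor that logs it by name."""
    def __setattr__(self, name, value):
        active.record(self, name, value)       # log in the active layer
        object.__setattr__(self, name, value)  # then perform the write

point = Recorded()
point.x = 3
point.y = 4
print([(name, value) for _, name, value in active.changes])
```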

Saving:
	When the user is done making changes in a layer he closes/commits it. 
Closing a layer makes it immutable.  If he doesn't want to close it he
can still start a new layer and leave the previous layer open.  Closing
a layer saves it to the local database under a unique global id.  All
parent layers that it accesses must also be closed.
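	A small Python sketch of these rules follows (not Squeak code).  It
assumes the unique global id is a UUID, that the database is a plain
dict, and that closing a layer with an open parent is an error rather
than closing the parent automatically; DraftLayer and its methods are
hypothetical names.

```python
import uuid

class DraftLayer:
    """Hypothetical layer that is open until it is closed/committed."""
    def __init__(self, parents=()):
        self.parents = tuple(parents)
        self.changes = []
        self.id = None               # assigned when the layer is closed

    @property
    def closed(self):
        return self.id is not None

    def record(self, change):
        if self.closed:
            raise RuntimeError("layer is closed and immutable")
        self.changes.append(change)

    def close(self, database):
        if any(not p.closed for p in self.parents):
            raise RuntimeError("close all parent layers first")
        self.id = str(uuid.uuid4())  # assumption: a UUID as the global id
        database[self.id] = self     # save under that id
        return self.id

database = {}
base = DraftLayer()
fix = DraftLayer(parents=[base])
fix.record("add method Foo>>bar")
base_id = base.close(database)
fix_id = fix.close(database)
```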

Loading:
	When the Squeak VM starts up it looks for the local database (under
file name "squeak.sdb" or something) and loads the boot image from it. 
The boot image is expected to contain pointers to a "default" image plus
other optional images.  If one of these optional images is specified on
the command line then that image is loaded, otherwise the "default"
image is loaded.
	When a layer loads, it first loads all its parent layers, then it loads
its new-objects image segment, then it executes its set of changes. 
Once all layers are loaded the image layer becomes the active layer.
	The user can easily switch to a different image by sending #load to it.
This has the same effect as reloading the image from scratch.  However,
there are probably many layers in common between the previous image and
the new image, so we do not reload a layer if all layers that made
changes to it in the previous image are also loaded in the new image;
otherwise we reload it from the database.  Old unused layers will be
garbage collected.
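	The reuse rule for switching images can be written as a set
computation.  The Python sketch below (not Squeak code) assumes we keep,
for each loaded layer, the set of layers that changed it in the
previous image; reusable_layers and changed_by are hypothetical names.

```python
def reusable_layers(old_loaded, new_needed, changed_by):
    """Layers that can stay in memory when switching images.

    old_loaded -- ids of layers loaded in the previous image
    new_needed -- ids of layers the new image requires
    changed_by -- layer id -> ids of layers that changed it previously
    """
    return {
        layer
        for layer in old_loaded & new_needed
        if changed_by.get(layer, set()) <= new_needed
    }

old_loaded = {"kernel", "tools", "games"}
new_needed = {"kernel", "tools", "web"}
changed_by = {"kernel": {"tools"}, "tools": {"games"}}

keep = reusable_layers(old_loaded, new_needed, changed_by)
reload = new_needed - keep
print(sorted(keep), sorted(reload))
```

Here "kernel" is kept because the only layer that changed it is still
wanted, while "tools" is reloaded because "games" changed it and is not
part of the new image.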
	The database has an index that is kept in memory that maps layer ids to
file positions where the layers can be found.  If the database does not
contain the desired layer then a remote server is queried; if that
server is down or does not contain the layer then a backup server is
queried, and so on.  Only immutable layers are stored on remote servers
to simplify sharing.  The remote servers will be mini Squeak images.  A
layer fetched from a remote server will be written to the local database
for future use.
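	The lookup chain can be sketched as follows, in Python rather than
Squeak.  It assumes the index stores a length alongside each file
position, uses an in-memory buffer in place of the database file, and
models each server as a function that returns the layer bytes, returns
None if it lacks the layer, or raises ConnectionError if it is down.
LocalDatabase and fetch_layer are hypothetical names.

```python
import io

class LocalDatabase:
    """Hypothetical local database: one file plus an in-memory index."""
    def __init__(self):
        self.file = io.BytesIO()   # stands in for the database file
        self.index = {}            # layer id -> (position, length)

    def write(self, layer_id, data):
        self.file.seek(0, io.SEEK_END)
        position = self.file.tell()
        self.file.write(data)
        self.index[layer_id] = (position, len(data))

    def read(self, layer_id):
        position, length = self.index[layer_id]
        self.file.seek(position)
        return self.file.read(length)

def fetch_layer(db, layer_id, servers):
    """Try the local database, then each server in order."""
    if layer_id in db.index:
        return db.read(layer_id)
    for server in servers:
        try:
            data = server(layer_id)
        except ConnectionError:    # server is down: try the next one
            continue
        if data is not None:
            db.write(layer_id, data)   # keep a local copy for future use
            return data
    raise KeyError(layer_id)

def primary(layer_id):
    raise ConnectionError("primary server is down")

def backup(layer_id):
    return {"layer-42": b"segment bytes"}.get(layer_id)

db = LocalDatabase()
first = fetch_layer(db, "layer-42", [primary, backup])
second = fetch_layer(db, "layer-42", [])   # now served locally
```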

Manipulating:
	The user will be able to copy, combine, and dissect layers into new
layers.  The operations will be non-destructive so they can be used on
closed/immutable layers.  Variables that point to images can be easily
changed to point to new layers, like in the boot image.  The boot image
layer is open (not immutable), whereas other local database layers can
be immutable.

Publishing:
	I need to better understand how DNS works before I can elaborate on
this, but in general you just need to get your layers out on a remote
server and let clients know the layer ids.

Collaborating:
	This also needs more thought, but we should be able to make two or more
running Squeaks share and manipulate the same open layer in real time.

Miscellaneous:
	I favor eliminating the .sources and .changes files and storing the
source string directly with the compiled method.  Actually, I would just
store whitespace markers, tempNames, and comments and reproduce the rest
from decompiled bytecodes.  Storing the source as objects would be
appropriate now that we have layers and the loaded image will be small. 
Also, .changes will not be needed because layers are generally revisions
of previous/parent layers.  For open layers, we can take a snapshot of the
layer from time to time to make a history.

What do you guys think?  I will greatly appreciate your criticisms.  I
want to begin implementing something like this soon.

Cheers,
Anthony

Footnotes:
 [1]  I. P. Goldstein and D. G. Bobrow. "A layered approach to software
design" (1980)
<http://www.dolphinharbor.org/docs/PIE%20Layered%20Approach.pdf>
 [2]  Randall B. Smith and David Ungar. "A Simple and Unifying Approach
to Subjective Objects" (1996)
<http://citeseer.nj.nec.com/smith96simple.html>