Steps to Modularity

Mon Mar 15 17:22:00 UTC 1999

[Here are two messages that describe an idea I had last week for a simple yet powerful primitive for breaking up the monolithic Squeak image into useful modules.  References to Ted's work allude to his work on external shared Squeak Pages.]
----------------------

Folks -

Here's a summary of the idea I had for writing out a subtree of the object memory:

1.  If not already done, do a GC to get rid of any unreferenced objects.

2.  Mark the root (or roots) of the subtree desired.

3.  Do a GC mark pass.  Since this stops at any marked objects, everything reachable only through the root(s) will remain unmarked.

4.  Now do a marking traversal, starting from our root (or roots):
	Copy every object into a byteArray.
	Relocate internal pointers as you go, or afterward.
	Copy terminal external pointers into an outpointer array.

We will end up with a byte array being the subtree in compacted image format, and an array being the outpointers needed to re-install the subtree.
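The four steps above can be sketched in a few lines of Python.  This is only a toy model, not the proposed primitive: the heap is assumed to be a dictionary from object names to the names they point to, and `extract_subtree`, `heap`, the `("in", i)` / `("out", j)` pointer tags and the example class names are all hypothetical.  Step 1 (the initial GC) is taken as already done.

```python
def extract_subtree(heap, image_roots, subtree_roots):
    """Toy model: heap maps an object name to the list of names it points to.

    Returns (order, segment, outpointers), where segment[i] holds the
    relocated fields of the object order[i].
    """
    order = list(dict.fromkeys(subtree_roots))   # roots go first in the segment

    # Step 2: mark the root(s) of the desired subtree.
    marked = set(order)

    # Step 3: ordinary mark pass from the image roots.  It stops at marked
    # objects, so anything reachable only through the subtree roots stays unmarked.
    stack = []
    for root in image_roots:
        if root not in marked:
            marked.add(root)
            stack.append(root)
    while stack:
        for ref in heap[stack.pop()]:
            if ref not in marked:
                marked.add(ref)
                stack.append(ref)

    # Step 4: traverse from the subtree roots.  Unmarked objects join the
    # segment; marked objects outside it become outpointers.
    seg_index = {name: i for i, name in enumerate(order)}
    outpointers, out_index = [], {}
    i = 0
    while i < len(order):
        for ref in heap[order[i]]:
            if ref in seg_index:
                continue
            if ref in marked:
                if ref not in out_index:
                    out_index[ref] = len(outpointers)
                    outpointers.append(ref)
            else:
                seg_index[ref] = len(order)
                order.append(ref)
        i += 1

    # Relocate: internal pointers become segment offsets, external ones
    # become indices into the outpointer array.
    segment = [
        [("in", seg_index[r]) if r in seg_index else ("out", out_index[r])
         for r in heap[name]]
        for name in order
    ]
    return order, segment, outpointers


# Hypothetical image: Foo is the class we want to write out.
heap = {
    "SystemDictionary": ["Object", "String", "Foo"],
    "Object": [],
    "String": ["Object"],
    "Foo": ["FooMethods", "Object", "FooVars"],
    "FooMethods": ["Foo"],      # back-pointer into the subtree
    "FooVars": ["String"],
}
order, segment, outpointers = extract_subtree(heap, ["SystemDictionary"], ["Foo"])
```

In this example the segment holds Foo, FooMethods and FooVars, while Object and String stay in the image and are referenced through the outpointer array.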

At this point, at least in the single-root case, one could become the root into an ExternalObject consisting of a URL to the byteArray, and the outpointer array.  A subsequent GC would remove the subtree from the image.

To reinstall the subtree, we simply copy the byteArray back into the image, and use the outpointer array to resolve the external references.
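A minimal sketch of reinstallation, under assumptions of my own: the image is modelled as a Python list of objects, an object is a list of integer addresses, and each stored pointer is tagged `("in", i)` for an offset within the segment or `("out", j)` for an index into the outpointer array.  The name `install_segment` and the tagging scheme are hypothetical, not part of any Squeak API.

```python
def install_segment(heap, segment, outpointers):
    """Append the segment's objects to heap and fix up their pointers.

    heap        -- list of objects; an object is a list of heap addresses
    segment     -- list of objects whose fields are ("in", i) or ("out", j)
    outpointers -- heap addresses of the external objects the segment needs
    Returns the address of the segment's first object (its root).
    """
    base = len(heap)                      # where the segment will land
    for fields in segment:
        heap.append([
            base + i if kind == "in" else outpointers[i]
            for kind, i in fields
        ])
    return base


# Hypothetical target image: address 0 is Object, address 1 is String.
heap = [[], [0]]
segment = [
    [("in", 1), ("out", 0), ("in", 2)],   # root: two internal refs, one external
    [("in", 0)],                          # points back at the root
    [("out", 1)],                         # points at an external object
]
root = install_segment(heap, segment, outpointers=[0, 1])
```

Internal pointers only need the load address added to them; everything else is resolved through the outpointer array, which is what keeps the stored form independent of where it is loaded.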

It seems to me this would do a great job of storing unneeded classes (they are mono-rooted trees) and projects or morphic worlds, or books.

	- Dan
----------------------

Folks -

Here are my further thoughts on incremental snapshots...

I envision several degrees of snapshotting an object.  The first creates a snapshot object consisting of a binary image patch with an array of outpointers -- this is the work of a new primitive.  The second becomes the root object into its snapshot.  The third would write the binary patch to a file, leaving only the file reference and outpointers behind.  The third form works a lot like a DLL.  The fourth would externalize the out pointers (Ted's work) and store that to a file as well.  This form could be moved from image to image.

These incremental snapshots should work great for projects, pages, and morphic worlds.  The primitive I have in mind dovetails perfectly with Ted's work because he has solved the further problem of externalizing and later resolving the out pointers for transfer to foreign images.

It's not as simple as I said to just point this at all the classes and blow them all out of an image.  This is because the snapshot stub that remains (file ref to image patch and out table) must be encountered by a message send if it is to be properly reinternalized, and this would not happen with a direct ref in the VM (as, say, following a class or superclass pointer in message lookup).  So you can't externalize a class for which there are extant instances, because the VM would barf on a message send to one of the instances, and you can't do it for the methodDictionary either for the same reason.  I believe you can do it for an entire class if it has no subclasses and no extant instances, because the only access to it would be at the Squeak message level.  Fortunately this is true for many classes in our system.

I want to add a new flavor of message failure to the VM -- something like doesNotUnderstand:, but more like couldNotInterpret: meaning that the VM encountered a foreign object (such as a snapshot stub) in the lookup process.  This would fail gracefully out to Squeak where the lookup would be simulated (and any snapshots internalized) and the message retried or an error posted.  At the very least it would make the VM somewhat more resilient in the face of a compromised class structure.
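The fault-and-retry behavior described above can be simulated as follows.  This is a sketch under my own assumptions, not VM code: `Klass`, `SnapshotStub`, `vm_lookup` and `send` are hypothetical names, a stub is modelled as an object with a `load()` method, and the two exception classes stand in for the couldNotInterpret: and doesNotUnderstand: failures.

```python
class SnapshotStub:
    """Stand-in for an externalized object; load() brings the real one back."""
    def __init__(self, loader):
        self._loader = loader

    def load(self):
        return self._loader()


class Klass:
    """Toy class object: a method dictionary plus a superclass pointer."""
    def __init__(self, name, superclass, methods):
        self.name = name
        self.superclass = superclass
        self.methods = methods


class CouldNotInterpret(Exception):
    """Raised by the 'VM' when lookup runs into a foreign object."""
    def __init__(self, holder, slot):
        super().__init__(f"{holder.name}.{slot} is a stub")
        self.holder = holder
        self.slot = slot


class DoesNotUnderstand(Exception):
    pass


def vm_lookup(cls, selector):
    """Walk the superclass chain; fail out as soon as a stub is encountered."""
    while cls is not None:
        if isinstance(cls.methods, SnapshotStub):
            raise CouldNotInterpret(cls, "methods")
        if selector in cls.methods:
            return cls.methods[selector]
        if isinstance(cls.superclass, SnapshotStub):
            raise CouldNotInterpret(cls, "superclass")
        cls = cls.superclass
    raise DoesNotUnderstand(selector)


def send(receiver_class, selector):
    """'Squeak-level' handler: internalize whatever faulted, then retry."""
    while True:
        try:
            method = vm_lookup(receiver_class, selector)
        except CouldNotInterpret as fault:
            stub = getattr(fault.holder, fault.slot)
            setattr(fault.holder, fault.slot, stub.load())
            continue
        return method()


# Hypothetical example: Foo's method dictionary and superclass are both stubs.
Object_ = Klass("Object", None, {"printString": lambda: "an Object"})
Foo = Klass("Foo",
            SnapshotStub(lambda: Object_),
            SnapshotStub(lambda: {"bar": lambda: 42}))

result = send(Foo, "bar")   # faults once on Foo.methods, then succeeds
```

Note that only what the lookup actually touches is brought back in: after the send above, Foo's superclass pointer is still a stub until some message needs to look past Foo.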

[There is one wrinkle here -- I am hoping that these faults can be restricted to message sends, but not occur as a result of storage management.  The reason is that if storage management needs access to the class, then classes cannot be snapshotted, but only, eg, method dictionaries.  The class makes a much nicer tree structure as it points to, eg, the class variables and so do the methods].

I believe THEN that we could replace nearly every class in the image with a phase-1 snapshot stub.  This would be a lot of fun.  All the kernel classes would just pop right back in while you were doing it, because they are being used all the time.  You could start up some application to bring back what it needs as well.  Then you would enumerate the remaining phase-1 stubs, forcing them to file as phase-2 stubs, and save the image.  At that point you would have a small image containing exactly the classes needed to run the application, with all the rest in files linked to the image like a cloud of DLLs.  A user who only ran the application should never see a fault.  But if that image tried to do something new that required another class, it would pull it in as needed.  This could allow reasonable post-delivery maintenance of compact applications.
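The shrinking procedure in the paragraph above can be walked through with a toy model.  Everything here is assumed for illustration: the image is a dictionary from class names to placeholder strings, "files" are entries in a second dictionary called `disk`, and `Stub`, `stub_out_all`, `touch` and `flush_stubs` are hypothetical names.

```python
class Stub:
    """Stand-in for a stubbed-out class.

    Phase 1: the payload is still held in memory (path is None).
    Phase 2: the payload has been written out and only path remains.
    """
    def __init__(self, name, payload):
        self.name = name
        self.payload = payload
        self.path = None


def stub_out_all(image):
    """Replace every class in the image with a phase-1 stub."""
    for name, payload in list(image.items()):
        image[name] = Stub(name, payload)


def touch(image, disk, name):
    """Simulate a fault: internalize the class if it is currently a stub."""
    entry = image[name]
    if isinstance(entry, Stub):
        entry = entry.payload if entry.path is None else disk[entry.path]
        image[name] = entry
    return entry


def flush_stubs(image, disk):
    """Force every remaining phase-1 stub out to a (simulated) file."""
    for entry in image.values():
        if isinstance(entry, Stub) and entry.path is None:
            entry.path = entry.name + ".seg"   # hypothetical file name
            disk[entry.path] = entry.payload
            entry.payload = None


# Hypothetical image with four classes.
image = {"Object": "<Object>", "String": "<String>",
         "Morph": "<Morph>", "Browser": "<Browser>"}
disk = {}

stub_out_all(image)
for name in ("Object", "String"):      # what the running application uses
    touch(image, disk, name)
flush_stubs(image, disk)               # the rest goes out to files

resident = sorted(n for n, e in image.items() if not isinstance(e, Stub))
```

After this, `resident` lists only the classes the application touched, and a later `touch(image, disk, "Browser")` pulls that class back in from its file on demand.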

	- Dan
