Modules

Fri Feb 25 18:50:19 UTC 2005

[Well, it got a bit long, but I figured I should try to articulate my own thoughts about Modules.  I think it's nice at the beginning of the project to have a few "gestalts" laid out (and thanks especially for the others so far), so that there are some anchors for discussion and so that everyone can start to build up their own vision of what's powerful and what's possible.]

Modules in Squeak
I may be wrong, but I feel it is possible to solve several problems at once with a good design for modules.  I also feel that it is possible to keep the modularization nearly invisible to the casual user.

What problems should a module system solve?
It should support the construction of a large system out of many small ones in such a way that the parts and their relationships can be easily managed.

Partitioning
One motivation for this partitioning is to reduce size and, with it, complexity.  For example, by putting many Squeak subsystems out on SqueakMap, the kernel system can be kept small and relatively easy for a novice user to comprehend.

I think it is important to keep in mind the difference between logical and physical partitioning here.  For instance, we might not be concerned with size, but only with apparent complexity, so as not to overwhelm novice users.  In that case, notice that the SqueakMap partitioning could be merged with the browser in such a way that a chock-full demo image could look to the user like a no-frills basic image just by changing a filter setting.  But neither the logical nor the physical partitioning is easy without a supporting architecture, nor possible without a basic paradigm.

Export
Once it is possible to remove and add parts of a system from and to itself, it is natural to consider moving these parts between different systems (i.e., images).  Two important examples of this in our current world are the ability to import a package on SqueakMap into a new release of the system, and the ability to export a project from one system and import it into another.  These examples are also important in juxtaposition -- they use completely different mechanisms to deal with boundaries, references across boundaries, and format of the stored modules.  Down the road there is reason to hope that these two patterns of use would use the same mechanism.

ImageSegments, Extensions, Install and Uninstall
ImageSegments, as many of you know, are a slick piece of Squeak technology that allows a well-bounded network of objects to be *very* quickly exported from, or imported into, a Squeak image.  Well-bounded, here, means not too many in-pointers (ideally only one), and not too many out-pointers.  Several years ago, we showed that the entire VM construction category, then about 500k of code, could be imported in a tenth of a second using ImageSegments.  However, the fact that the system is not already modular has hindered taking real advantage of this technology.
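
To make "well-bounded" concrete, here is a minimal sketch in Python (not Squeak, and not the ImageSegment code itself).  It assumes a toy heap modelled as a dictionary from object names to the names they point to; the function `boundary` and the little pointer graph are hypothetical and purely illustrative.  It just lists the in-pointers and out-pointers of a candidate segment.

```python
# Hypothetical sketch: measure how "well-bounded" a candidate segment is.
# The heap is modelled as a dict mapping each object name to the names it points to.

def boundary(heap, segment):
    """Return (in_pointers, out_pointers) for the objects in `segment`."""
    segment = set(segment)
    # Pointers from objects outside the segment to objects inside it.
    in_pointers = sorted(
        (src, dst)
        for src, targets in heap.items() if src not in segment
        for dst in targets if dst in segment
    )
    # Pointers from objects inside the segment to objects outside it.
    out_pointers = sorted(
        (src, dst)
        for src in segment
        for dst in heap.get(src, ()) if dst not in segment
    )
    return in_pointers, out_pointers


# Illustrative graph only; the real object relationships are not claimed here.
heap = {
    "SystemDictionary": ["VMMaker", "String"],
    "VMMaker": ["Interpreter", "ObjectMemory"],
    "Interpreter": ["ObjectMemory", "String"],
    "ObjectMemory": [],
    "String": [],
}

ins, outs = boundary(heap, {"VMMaker", "Interpreter", "ObjectMemory"})
print(ins)   # [('SystemDictionary', 'VMMaker')]
print(outs)  # [('Interpreter', 'String')]
```

In this toy case the segment has exactly one in-pointer and one out-pointer, which is the shape described above as ideal for fast export and import.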

A key issue in partitioning the system is that of "extensions".  By these we typically mean, and I specifically mean here, changes to shared classes that are (or should be) local to a package.  A typical example would be a String method 'asURL' that makes sure it is a well-formed URL.  Such a method would have no place in a system that lacked network support, so it should be a part of the NetworkSupport package rather than part of the base system.  This example is fairly compelling, but even seasoned Squeakers are hard put to say whether certain utility methods should be included in the base classes or not.
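
One way to picture an extension that is local to a package is sketched below in Python (again, not Squeak).  Everything here is an assumption made for illustration: the `Module` class, its `extend` and `send` methods, and the simplistic stand-in for 'asURL'.  The point is only that the method lives in the package's own table, and the shared class is never modified.

```python
# Hypothetical model: each module keeps its extensions in its own table
# instead of patching the shared class.

class Module:
    def __init__(self, name):
        self.name = name
        self.extensions = {}  # (class, selector) -> function

    def extend(self, cls, selector, function):
        """Record an extension to `cls` that is visible only through this module."""
        self.extensions[(cls, selector)] = function

    def send(self, receiver, selector, *args):
        """Look in this module's extensions first, then fall back to the receiver's class."""
        for cls in type(receiver).__mro__:
            function = self.extensions.get((cls, selector))
            if function is not None:
                return function(receiver, *args)
        return getattr(receiver, selector)(*args)


network = Module("NetworkSupport")
# Simplistic stand-in for a real well-formedness check.
network.extend(str, "asURL", lambda s: s if "://" in s else "http://" + s)

base = Module("Base")

print(network.send("squeak.org", "asURL"))  # http://squeak.org
print(network.send("squeak", "upper"))      # SQUEAK
print(hasattr(str, "asURL"))                # False
```

Sending 'asURL' through `base` fails with an AttributeError, since the base system never learned about the method; only code running in the context of NetworkSupport sees it.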

Install and Uninstall are the processes required to install a package into, or remove it from, a host system.  Typically, installation registers a number of "entry points" or names in various global tables.  Our fileIn process already does much of this through its interning of Symbols and handling of global and pool variables.  Uninstalling involves reversing these changes, along with others that are more problematic, such as the extensions to classes outside the package.

A Couple of Desiderata
Much of my recent thinking about modules has been about how to make install and uninstall trivial.  I believe it is possible (and desirable) to make an architecture in which different modules can make conflicting extensions to system classes, and in which the *only* thing required to install a module is to read it in as an ImageSegment, and the only thing required to uninstall it is to nil the (only) pointer to it.  A related desideratum is that, having imported a Squeak project, however complex, the *only* thing required to get rid of it is to remove all references to it, after which it will be reclaimed by GC, and no trace of it will remain in the image.  When we build a window out of a rectangle, a border width and a color, we just store three pointers;  why should it be any harder to combine three modules?
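
The install and uninstall desiderata can be sketched in a few lines of Python.  This is a toy model under stated assumptions, not a design: `Image`, `Module`, and the two conflicting 'asURL' extensions are all hypothetical, and Python's garbage collector stands in for Squeak's.  Installing stores one pointer, uninstalling drops it, and a weak reference is used to check that nothing of the module survives.

```python
import gc
import weakref


class Module:
    """Hypothetical module: carries its own extensions to shared classes."""
    def __init__(self, name, extensions):
        self.name = name
        self.extensions = dict(extensions)  # (class, selector) -> function


class Image:
    """Hypothetical host: holds exactly one pointer to each installed module."""
    def __init__(self):
        self.modules = {}

    def install(self, module):
        self.modules[module.name] = module  # installing is storing one pointer

    def uninstall(self, name):
        self.modules.pop(name)              # uninstalling is dropping that pointer


image = Image()
web = Module("Web", {(str, "asURL"): lambda s: "http://" + s})
files = Module("Files", {(str, "asURL"): lambda s: "file:///" + s})
image.install(web)
image.install(files)

# The two modules extend the same class with the same selector, without conflict,
# because each extension lives inside its own module.
print(image.modules["Web"].extensions[(str, "asURL")]("squeak.org"))    # http://squeak.org
print(image.modules["Files"].extensions[(str, "asURL")]("squeak.org"))  # file:///squeak.org

probe = weakref.ref(web)  # a weak reference does not keep the module alive
del web                   # now the image holds the only strong pointer

image.uninstall("Web")
gc.collect()
print(probe() is None)    # True
```

Nothing had to be undone on uninstall because nothing outside the module was changed on install; the shared class `str` was never touched.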

Isn't this just Packages?
So far, this represents my thinking of about three years ago (except I was stumped about extensions at that time), and it really reads more like "Future Packages".  So the first thing I want to say is that I am in no way critical of all the great ongoing work on packages, both in package design and the effort to split up the Squeak image.  Even if we come up with a better architecture for encapsulating modules in memory (and I hope we do), all of the work done so far will be relevant and useful.  Moreover, I believe it is the case that the better things are partitioned, the easier it is to move from one module architecture to another.

But Wait, There's More
A big shift in my thinking since that time has come from learning about E (see http://www.ERights.org).  E is an architecture for secure distributed computing that can be implemented in a number of languages.  Its real strengths can only be realized in pointer-safe systems, though, which makes Squeak (potentially) a very attractive host system for this architecture.  The essence of E's security is modularity done right.  As Mark Miller says, "security is just extreme modularity".

E in less than a Nutshell
[There is a fine paper called "E in a Walnut" by Marc Stiegler
	http://www.skyhunter.com/marcs/ewalnut.html ]
The E work is well thought out and profound.  It centers on a kind of module that they refer to as a vat, which can be accessed only through capabilities.  I am not deeply familiar with their implementation, but I think the best way to think of a capability in Squeak is as a set of message selectors.  Suppose a module has its own name space for selectors, and suppose it is impossible to forge a pointer.  Then, if you hold a module pointer and a set of selectors, the *only* access you can possibly have to that module is to send exactly those messages.
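
The "pointer plus a set of selectors" reading of a capability can be sketched as follows.  This is a Python illustration under loud assumptions: `Capability` and `Store` are hypothetical names, this is not how E is implemented, and Python cannot actually enforce the boundary (anyone can reach the private `_target` field), whereas the argument above depends on pointers being unforgeable.  It shows only the shape of the idea.

```python
class Capability:
    """Hypothetical facet: a module pointer plus the selectors it may receive."""
    def __init__(self, target, selectors):
        self._target = target                    # the module pointer
        self._selectors = frozenset(selectors)   # the granted selectors

    def send(self, selector, *args):
        """Forward the message only if its selector was granted."""
        if selector not in self._selectors:
            raise PermissionError(f"selector {selector!r} not granted")
        return getattr(self._target, selector)(*args)


class Store:
    """Hypothetical module with one harmless and one destructive message."""
    def __init__(self, items):
        self._items = list(items)

    def count(self):
        return len(self._items)

    def erase(self):
        self._items.clear()
        return 0


store = Store(["a", "b", "c"])
read_only = Capability(store, {"count"})

print(read_only.send("count"))  # 3
try:
    read_only.send("erase")
except PermissionError as error:
    print(error)                # selector 'erase' not granted
```

A holder of `read_only` can count the items but cannot erase them, while a holder of a capability created with both selectors could do either.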

The profundity of the E work extends to distributed computing on multiple hosts, with encryption for inter-host communication, and a process model called "promise pipelining" which is like an efficient lazy publish/subscribe.  They deal with race conditions, unreliable links and the whole nine yards.  Moreover, I believe that their model of computation also covers most of the problems of leveraging multiple processors.

A Point of High Leverage
I have no delusions that we can pull off an E system in Squeak in the next six months.  However, my interpretation of what a module is, namely a rigorous boundary in the runtime architecture, is exactly the point at which an architecture can be made to support the requirements of E (or not).  The benefits of carrying out this particular piece of work in a manner that is consistent with the requirements of E could be tremendous.  For instance, imagine a 64-bit Squeak with Croquet and an airtight security model.  That is why E is required reading for this project.

	- Dan