[ENH][Modules] Delta Modules [was: Another version]

Thu Oct 25 12:05:10 UTC 2001

This is my reply to the first round of postings; I'm trying to catch up.

First, Allen Wirfs-Brock and Göran Hultgren wrote:

> At 09:45 AM 10/23/2001 +0100, goran.hultgren at bluefish.se wrote:
>> ...
>> Object subclass: #Module
>> instanceVariableNames: 'version parentModule neighborModules
>> definedNames exportedNames repository'
>> ...
> 
> This apparently defines the class that is used to model a "module" that is
> loaded into an image. More interesting would be a specification of a
> "module" that was external to an image.  (After all, by the time you've
> loaded a module all the interesting work has already happened).

I take it you are referring to the interesting gray zone when you somehow
need to be able to "reason" about modules but they have not (yet) been
brought into the image, or have been partly brought in--which is what you
are doing just then. I.e., a classic bootstrapping problem.

It is like a cartoon where the protagonist needs to run between two cliffs,
so he builds a bridge in front of him while he is walking across it.

In this system, this gray zone only exists when modules are being brought
into the image. A ModuleInstaller object is responsible for doing this in a
safe and gracefully recoverable manner. It also has the graph traversal
algorithms that compute what should be loaded, in what order, etc. (I can
see how you might want to keep the module in the image without its contents;
it wouldn't be very hard to do.)
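
As a rough illustration of the "what should be loaded, in what order"
computation, here is a minimal sketch in Python (not the actual Squeak code).
The function name load_order, the dependency table, and the choice to treat a
circular dependency as an error are all assumptions made for the example; the
real ModuleInstaller may handle these cases differently.

```python
from typing import Dict, List


def load_order(requires: Dict[str, List[str]], root: str) -> List[str]:
    """Return every module reachable from root, prerequisites first.

    Hypothetical helper: a depth-first traversal of the module graph.
    """
    order: List[str] = []
    state: Dict[str, str] = {}  # module name -> "visiting" or "done"

    def visit(name: str) -> None:
        if state.get(name) == "done":
            return
        if state.get(name) == "visiting":
            # Assumption: cycles are reported rather than resolved.
            raise ValueError(f"circular dependency involving {name}")
        state[name] = "visiting"
        for dep in requires.get(name, []):
            visit(dep)
        state[name] = "done"
        order.append(name)

    visit(root)
    return order


# Made-up dependency table, loosely named after the Connectors example.
graph = {
    "Connectors": ["Morphs", "FSM"],
    "Morphs": ["Graphics"],
    "FSM": [],
    "Graphics": [],
}
print(load_order(graph, "Connectors"))
```

Each module appears after everything it requires, so an installer working
through the list never has to load a module whose prerequisites are missing.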

> For 
> example, I'm guessing most or all of the above instance variables reference
> other instantiated meta objects of the module system. How do those
> translate externally?

Besides Module objects there are ModuleReference objects (the latter with a
few simple subclasses). If modules are the nodes of the module graph, these
are the edges that connect modules. The gray zone solution is that
ModuleReferences may be in two states, one referring to a module proper, and
one "preliminary" state where it refers to a module that is not yet loaded,
essentially by its name and version. So the ModuleReference serves as the
bridge that you build underneath yourself as you go. Also, this state can only
exist while a ModuleInstaller is executing.
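
The two states can be pictured with a small Python sketch (again not the
Squeak implementation). The field names, the is_preliminary property, and the
resolve method are hypothetical; only the idea of a reference that holds a
name and version until the module proper is available comes from the
description above.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Module:
    name: str
    version: Optional[str] = None


@dataclass
class ModuleReference:
    """An edge in the module graph (hypothetical sketch)."""
    name: str
    version: Optional[str] = None
    module: Optional[Module] = None  # None means the "preliminary" state

    @property
    def is_preliminary(self) -> bool:
        return self.module is None

    def resolve(self, loaded: Dict[Tuple[str, Optional[str]], Module]) -> None:
        # Swap the (name, version) placeholder for the module proper.
        self.module = loaded[(self.name, self.version)]


ref = ModuleReference("FSM")            # not yet loaded: name and version only
loaded = {("FSM", None): Module("FSM")}
ref.resolve(loaded)                     # now refers to the module proper
```

Only the installer would ever see a reference with is_preliminary true, which
keeps "is this module loaded or not" checks out of the rest of the system.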

In this way the parts dealing with the gray zone are contained rather well
(within ModuleInstaller and ModuleReferences)--I realize that this could
become a terrible mess if not done properly, e.g. with many "is this module
loaded or not" tests sprinkled all over.

> What is the external form of a module reference.

Here is an excerpt from an Actual Module Definition File (tm) as of now, for
Ned Konz's Connectors project. Messages starting with submodule: and
deltaModuleOn: will create ModuleReference objects:

self 
    version: nil;
    submodule: (Module new) name: #Morphs version: nil importNames: false;
    submodule: (Module new) name: #FSM version: nil importNames: false;
    deltaModuleOn: #(Squeak Media Graphics Primitives)
        alias: nil version: nil importNames: false;
    deltaModuleOn: #(Squeak Language Core Methods)
        alias: nil version: nil importNames: false;
    (... more deltas ...)
    deltaModuleOn: #(Squeak Media Balloon TrueType Support)
        alias: nil version: nil importNames: false;
    deltaModuleOn: #(Squeak Language Core Magnitudes)
        alias: nil version: nil importNames: false;
    yourself. !

This is simply a set of messages to the module object (self), and there is
no special syntax whatsoever. There are a couple of artifacts from not being
finished yet--all versions are nil here, and (Module new) is an artifact
of the current code loading; it shouldn't be there.

The module definition and module contents are kept in separate files. Among
other things this makes loading more robust.

> Externally, is definedNames a list of class and global names or is it the
> actual source/object code for the defined entities. How about exported
> names. Is it a list of strings?

If by externally you still mean on file so to speak, then there are
individual chunks that define each name. These also mark whether a name
should be exported. However (trying to guess what you have in mind), other
modules will never access these files, they will send messages to the module
objects in the image to ask about their contents etc.

> Note that the bi-directional linkage of submodules and parents mean that a
> particular module may be a submodule of exactly one parent. This would
> seem to significantly limit the reusability of modules.

No, this only serves to give a module a unique place ("home") and thereby a
unique name. Access/visibility is not related to the parent/submodule
relation; any module can access any other by declaring it an external (used)
module. A strict access/visibility hierarchy would have caused problems for
what we need to be able to do.

>> - Module parameter = Well, perhaps it sounds better if we call these
>> "parameter module" instead. It is an advanced feature that means that
>> the module has a "valueholder" instead of a reference to a concrete
>> module. We then have to specify a concrete module as a value when we
>> load (or perhaps activate?) the module in question. An example would be
>> to make stuff "pluggable" so that for example you could bind another
>> "Transcript" than the standard one etc.
> 
> How would you externally represent such a set of module bindings so it
> can be reproducibly loaded.  You don't want to depend upon "doIts" for
> this. Don't you need some sort of configuration module to specify these
> bindings?

You would do it much like you instantiate a parameter to a method when you
invoke it. When a module declares a parameter module it specifies the module
to bind to the parameter (or possibly nil to use a default binding).
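
The analogy with method parameters can be sketched as follows, in Python and
under loudly stated assumptions: bind_parameters, DEFAULT_BINDINGS, and the
module names "StandardTranscript" and "MyTranscript" are invented for the
example and are not part of the module system described here.

```python
from typing import Dict, Optional

# Hypothetical default bindings, used when a declaration gives None (nil).
DEFAULT_BINDINGS: Dict[str, str] = {"Transcript": "StandardTranscript"}


def bind_parameters(
    declared: Dict[str, Optional[str]],
    defaults: Dict[str, str],
) -> Dict[str, str]:
    """Bind each parameter module to a concrete module name."""
    bound: Dict[str, str] = {}
    for param, chosen in declared.items():
        if chosen is None:
            # nil in the declaration: fall back to the default binding.
            bound[param] = defaults[param]
        else:
            bound[param] = chosen
    return bound


# One declaration plugs in another Transcript, the other takes the default.
print(bind_parameters({"Transcript": "MyTranscript"}, DEFAULT_BINDINGS))
print(bind_parameters({"Transcript": None}, DEFAULT_BINDINGS))
```

Because the binding is written in the declaration itself, loading stays
reproducible without relying on separately executed "doIts".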

>> If we build a module with everything in it (classes and loose methods
>> end tutti) in a big pile we have a problem. The classes are self
>> contained, they are what they are and do not need any more information
>> to be "complete". But the loose methods (read class extensions) need a
>> "home". Ok, so we have to say what class they should go into. But then
>> we are still not clearly defined because - what version of that class
>> are we referring to? We need to say "this loose method should go into
>> class Object version 1.34" to clearly define things. So our "loose
>> changes" to classes outside of our own module needs to be complemented
>> with information about the version of those classes that they refer to,
>> right? Aha! This is exactly what the Delta modules do! They group our
>> "loose changes" that should be applied to classes in other modules and
>> complement them with information about which specific module they should
>> be applied to.
> 
> No, this is wrong!  You are binding too early and hence limiting
> reusability. Consider the classic class extension, adding a isFoo method to
> Object. In most cases, the extension module that defines this does not care
> which version of Object it is extending. The same extension module should
> be usable  by many parent modules.  When assembling a system you will want
> to bind an extension to a particular parent, but not any earlier.

This was just an example. Versions should be given by VersionReference (etc)
objects, which specify a strategy for "binding" to a version upon loading.
This strategy could be anything from "don't care" to specific-to-the-decimal
to "anything >= 1.14" and so on. Otherwise, Göran's example is a good
illustration of why you want DeltaModules.
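
To make the range of binding strategies concrete, here is a minimal Python
sketch. It is not the VersionReference design itself: the function names, the
dotted-integer reading of version numbers (so that 1.14 is newer than 1.2),
and the rule of picking the highest acceptable version are all assumptions
made for illustration.

```python
from typing import Callable, List, Optional, Tuple

Version = Tuple[int, ...]
Strategy = Callable[[Version], bool]


def parse(text: str) -> Version:
    # Assumption: versions are dotted integers, e.g. "1.14" -> (1, 14).
    return tuple(int(part) for part in text.split("."))


def any_version() -> Strategy:
    """The "don't care" strategy."""
    return lambda v: True


def exactly(text: str) -> Strategy:
    """The "specific-to-the-decimal" strategy."""
    wanted = parse(text)
    return lambda v: v == wanted


def at_least(text: str) -> Strategy:
    """The "anything >= 1.14" style of strategy."""
    minimum = parse(text)
    return lambda v: v >= minimum


def pick(available: List[str], accepts: Strategy) -> Optional[str]:
    """Choose the highest available version the strategy accepts, if any."""
    candidates = [v for v in available if accepts(parse(v))]
    return max(candidates, key=parse) if candidates else None


available = ["1.2", "1.14", "1.34"]
print(pick(available, any_version()))
print(pick(available, exactly("1.14")))
print(pick(available, at_least("1.14")))
```

An extension such as isFoo on Object would use the "don't care" strategy and
stay reusable, while a carefully pinned configuration would use exact versions.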

>> - One thing that might confuse people is that when Henrik (and I) says
>> "module" he means an exact version of a bunch of code. This means that
>> "Morphic 1.23" is ONE module and "Morphic 1.24" is ANOTHER module. In
>> ordinary speech we sometimes talk about "the module Morphic" talking
>> about all/any version of that stuff. This is a naming problem and
>> perhaps there is a better word for "Module" that we should use... A
>> deltamodule is always in reference to a specific module, for example
>> "Morphic 1.23". So when a module has been "published" it should never
>> ever change content of course.
> 
> Most systems would make a terminology distinction between a logical module
> and a particular "version" of the module. Common terminology would be
> "module version", "module revision", "module edition", etc. It would
> certainly help communications, to choose a term and then to be fastidious
> in making the distinction.

Yes, but in practice people will sometimes use "module" to refer to the
logical one, and sometimes to refer to a specific one, and which it is will
be clear from the context. That's just how people are. But some standard
name to distinguish the two would be good. "Logical module" vs. "module
version" might work.

> 
> 
>> - A Deltamodule inherits from Module and this could perhaps be
>> refactored with a common baseclass as is common in the Composite pattern
>> but I haven't looked into that issue. One fact though: Looking at the
>> code a Deltamodule CAN NOT have neighborModules. Those methods are
>> overridden. So, a Delta module can only be a leaf in the Module tree.
>> Henrik, what about applying the Composite pattern here? Or are there
>> other considerations?
> 

I considered creating a shared abstract class back when I created
DeltaModule, but I think either that or a Composite would make things more
complicated and I don't see any obvious gains that would offset this. But of
course I could be convinced otherwise.

> Why the restriction on neighborModules?  This would seem to imply that a
> class extension may not reference any classes (or globals) that are not
> already known to the parent. This would seem to extremely limit the use of
> deltamodules.

To be strictly accurate, DMs should be able to specify _changes_ to the base
module's neighbor modules. So if your class extensions need access to new
modules you would specify that these should be added to the base module's
neighbor list. Now strictly these aren't neighbors of the DeltaModule but of
the new version of the base module--yep, this is an esoteric and technical
detail.

> Ah, I checked the code! :-) Well, a Deltamodule can not HAVE neighbors,
> instead it defines CHANGES to its base module's neighbors. So I think
> your guessed implication is not valid. Phew. :-)
> 

I don't think I've implemented that yet, really--my idea is to refine the
DeltaModule functionality as we need it.

I know there have been many questions about the exact nature of
DeltaModules. Let me rhetorically ask: how many of you really understand the
semantics of change sets? I can't say that I do. There are some truly
strange things that are done sometimes, e.g. when classes are renamed, or
methods reorganized. And still we manage to use change sets on a daily
basis, in fact they're crucial to what we do with Squeak. So I don't think
we need to worry about the most exceptional of technical details. (But that
was more of a general comment unrelated to Allen's question.)

>> ...
>>> Eh, well since a Deltamodule defines a difference between two modules it
>>> should be able to contain a whole new class. But you would probably
>>> rather seldomly have one Module add classes to another module so it will
>>> probably be a rare case.
>>> 
> 
> Again, this seems like the common case, not the rare case.  The module that
> defines the Foo class also wants to add the isFoo method to Object and
> possibly other classes.

Yes, but the question was about adding a new class to e.g. the module of
Object, right? It probably won't be too common, but shouldn't be a problem.

>> - A Delta module can only be a leaf in the module tree and defines the
>> difference between two Modules. (changed methods, added methods, added
>> classes, removed methods or
>> classes etc.)
> 
> So to create a delta module you first must have two "editions" of a module
> and the creation process creates a third edition of the module (necessary
> to contain the child extension reference) plus an extension module?

??? I'm not sure I was able to decode that sentence. This is what you
need/get:

BaseModule x DeltaModule -> Derived "edition" of BaseModule with
modifications applied.

You have a given BaseModule, you define a DeltaModule, you get a derived
edition. All you do is define the DM; I don't know where the extension
module came from.
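
The relation above can be read as a function from a base module and a delta
to a derived edition. The following Python sketch shows that reading only;
the class fields, the "Class>>selector" keys, and the apply_delta function are
hypothetical and much simpler than a real DeltaModule, which also covers
things like added classes.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet


@dataclass
class BaseModule:
    name: str
    version: str
    methods: Dict[str, str]  # "Class>>selector" -> source text


@dataclass
class DeltaModule:
    base_name: str  # the module this delta applies to
    added_or_changed: Dict[str, str] = field(default_factory=dict)
    removed: FrozenSet[str] = frozenset()


def apply_delta(base: BaseModule, delta: DeltaModule) -> Dict[str, str]:
    """Return the methods of the derived edition; base is left untouched."""
    if delta.base_name != base.name:
        raise ValueError("delta was defined against a different base module")
    derived = {
        key: source
        for key, source in base.methods.items()
        if key not in delta.removed
    }
    derived.update(delta.added_or_changed)
    return derived


base = BaseModule("Core", "1.23", {"Object>>printString": "..."})
delta = DeltaModule("Core", added_or_changed={"Object>>isFoo": "^false"})
edition = apply_delta(base, delta)
print(sorted(edition))
```

The base module is never modified, so several projects can each apply their
own delta to the same published module.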

>> Obviously the Delta modules are the tricky part - but personally I just
>> view them as a "diff" between two modules. I do have a question to
>> Henrik though - the deltamodule refers to the baseModule that it applies
>> to but it doesn't refer to the resulting module when it has been
>> applied. Ok, I admit that the last sentence sounded strange, but let me
>> put it like this: When Delta module Z has been applied to "Morphic 1.23"
>> I can't really say if the result is "Morphic 1.24" or whatever? The
>> source is defined but the target is not. So... I am not even sure if
>> that is a problem, but perhaps it can be? When it comes to conflict
>> detection I mean. Whatever...

As I see it, the "edition" created by applying a DM does not have an
official version number. New versions could only be created by the "owner"
of the base module, but anyone could create DMs for it, e.g. as needed for
their own project. Think of a DM as a change set, and remember how Squeak
updates work, with an official update stream of numbered change sets: anyone
can release a change set with modifications to Morphic, but it isn't an
official new version of Morphic; it doesn't even have a number. Only when a
change set is integrated into the update stream is it given a number (cf. the
new version number for the base module), creating an "official" new version
(as defined by the update stream).

So there is no automatic version number after a DM is applied, the resulting
"edition" is just "the edition of the baseModule that my project uses". If
one needs an official name, release it as part of a "patch" (module) that
has its own name.

> Final thoughts.  I would expect a module system to support two major goals:
> 1) Reusability of modules
> 2) Reproducibility of system configurations.
> 
> These goals might appear to be conflicting but need not be. However, you
> have to be very careful in designing a module system to not achieve one of
> the goals at the expense of the other. My sense is that the design
> decisions in this system that support goal 2 will severely interfere with
> goal 1.

I think that the version specifier objects will give developers the
flexibility to match their specific needs, and trade 1 for 2 or vice versa
if needed.

Henrik