[Modules] A proposal for Modules, Packages and Dependencies

Tue Aug 21 10:41:04 UTC 2001

Another installment in the modularity of modularity series, aka let's make
progress (with apologies to the Bard).

    Words fly up, Squeak remains below.
    Discussions without code will never to the image go.

Besides a module scheme, I also supply a core for source code/project/version
management based on dependencies, with a simple file format. I am working on
some code to demo this, but I thought I'd go ahead and post this before that
is ready.

1. A Semantic model for Modules
-------------------------------

This is a proposal for a minimal and yet powerful semantic model for a
Squeak Modules system. It deals with the structure, relationships, and
dependencies of modules. It is based on Dan Ingalls' Environments proposal,
so anything not mentioned here should be taken directly from there. I've
also looked closely at the ModSqueak work. There are many similarities, but
there is a different philosophy at work here, which is Small is Beautiful,
aka less is more.

This deals only with the static aspects of Squeak programs, i.e. with the
structure of a program as it is known at the time of definition (compile
time). As we know, this is not a sharp distinction given the integrated
nature of Smalltalk, hence the confusion about what is really what. Even
though some things are performed dynamically, the model itself is still
strictly static. I have chosen to use the term Modules, which is close to
the established meaning of the term in e.g. Modula-2.

Module Structure
----------------

A module 
- may contain zero or more submodules.
- may also take zero or more other modules as "input parameters".
- will of course also have contents proper: classes, globals, etc.

Think of the parameters as imports, i.e. other (external) modules that this
module needs to access.

From this, a module definition has the abstract form

Module (parameters/import list) -> submodule list.

Also, these imports and submodules could be seen as the arguments and
temporary variables of a method, respectively.
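
As a rough illustration only (in Python rather than Squeak, so that it runs
outside the image), the abstract form could be modelled as below. The class,
its field names, and the Editor/Collections/Files/Buffers/Commands module
names are all made up for this sketch and are not part of the proposal.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Module:
    """Hypothetical stand-in for a module; not an actual Squeak class."""
    name: str
    imports: List["Module"] = field(default_factory=list)     # external, not owned
    submodules: List["Module"] = field(default_factory=list)  # owned subparts
    contents: Dict[str, object] = field(default_factory=dict) # classes, globals, etc.

    def signature(self) -> str:
        """Render the abstract form: Module (imports) -> submodules."""
        ins = ", ".join(m.name for m in self.imports)
        subs = ", ".join(m.name for m in self.submodules)
        return f"{self.name} ({ins}) -> {subs}"


# Illustrative module names only.
editor = Module(
    "Editor",
    imports=[Module("Collections"), Module("Files")],
    submodules=[Module("Buffers"), Module("Commands")],
)
print(editor.signature())
```

Keeping imports and submodules in two separate lists is what lets tools
treat the submodule lists as a tree and the imports as cross-references.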

Rationale: 
The reason for this format choice is that by combining elements of this
single format you can define any possible structure. (This is a
graph-theoretical result.) In contrast, single-path inheritance, for
example, places a narrow restriction on how you can structure different
modules with respect to each other; you would soon encounter cases which
couldn't be handled by such a scheme. With this format that cannot happen.

A second reason is that this format provides a simple and familiar way of
thinking about module structure: Each module is composed out of submodules,
together forming a tree of nested modules. This also allows for UI tools to
visualize module structures in a straightforward way, either as trees or
indented lists. (Imports correspond to cross-references in a module
hierarchy.)

Inside a module, both imports and submodules are accessible by their names.
Thus from code inside the module the two kinds are indistinguishable. The
difference between them is that submodules are considered to be subparts of
the module (i.e. part of the module's definition), whereas imports are not
considered part of it, but are external and merely made available to it.
(This matters in certain cases: e.g. if you unload a module from the image,
its submodules are also unloaded as parts of it, whereas its imports are
not.)
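
The unloading case can be sketched as follows. This is only a sketch under
my own assumptions: the "image" is reduced to a set of loaded module names,
and the Module class, the function names, and the example module names are
hypothetical.

```python
class Module:
    """Hypothetical module with imports (external) and submodules (owned)."""

    def __init__(self, name, imports=(), submodules=()):
        self.name = name
        self.imports = list(imports)
        self.submodules = list(submodules)


def owned_modules(module):
    """Return the module plus everything nested under it; imports are excluded."""
    result = [module]
    for sub in module.submodules:
        result.extend(owned_modules(sub))
    return result


def unload(module, image):
    """Remove a module and its subparts from `image`, a set of loaded names."""
    for owned in owned_modules(module):
        image.discard(owned.name)


# Illustrative names only.
files = Module("Files")
buffers = Module("Buffers")
editor = Module("Editor", imports=[files], submodules=[buffers])

image = {"Files", "Editor", "Buffers"}
unload(editor, image)
print(image)  # only the imported module is still loaded
```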

Instead of a single path of inheritance for the names available in a Module,
it should be possible to delegate name lookup to one or more imported
modules or submodules, by marking those modules for delegation. (The
present scheme is equivalent to how Self replaces single inheritance with
its multiple inheritance/delegation scheme, where submodules correspond to
slots and delegation corresponds to marking a Self slot as a parent, except
that you can also delegate to imported modules.) Simple conflict resolution
can be done by specifying an ordered lookup of submodules and imports (in
the order of declaration).
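
A minimal sketch of such a lookup, in Python for convenience. Several things
here are my assumptions rather than part of the proposal: a module's own
contents are searched first, then the names of its imports and submodules,
and only then the modules marked for delegation, in declaration order. The
class, the method names, and the example modules are hypothetical.

```python
class Module:
    """Hypothetical module supporting delegated name lookup."""

    def __init__(self, name, contents=None):
        self.name = name
        self.contents = dict(contents or {})
        self.members = []  # (module, delegate?) pairs in declaration order

    def add(self, module, delegate=False):
        """Declare an import or submodule; the two look the same to lookup."""
        self.members.append((module, delegate))
        return self

    def lookup(self, name, _seen=None):
        """Return the binding for `name`, or None if it cannot be found."""
        seen = _seen if _seen is not None else set()
        if id(self) in seen:          # guard against delegation cycles
            return None
        seen.add(id(self))
        if name in self.contents:     # 1. the module's own contents
            return self.contents[name]
        for module, _ in self.members:
            if module.name == name:   # 2. imports and submodules by name
                return module
        for module, delegate in self.members:
            if delegate:              # 3. delegates, first declared wins
                found = module.lookup(name, seen)
                if found is not None:
                    return found
        return None


# Illustrative modules; both define "Set", so declaration order decides.
collections = Module("Collections", {"Set": "Collections.Set"})
graphics = Module("GraphicsCore", {"Form": "GraphicsCore.Form",
                                   "Set": "GraphicsCore.Set"})
app = Module("App", {"Main": "App.Main"})
app.add(collections, delegate=True).add(graphics, delegate=True)

print(app.lookup("Set"))
print(app.lookup("Form"))
```

Because Collections is declared first, its Set wins the conflict, while Form
is still found through the second delegate.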

Dependencies
------------
So this is a powerful scheme for defining the structure and relationships of
modules, and the upshot is that it simultaneously defines module
dependencies. This is practically a tautology: the modules that a module
needs to access are also the modules that it depends on being available.

So to load a module (and resolve its dependencies), you would load the
module itself, then its parameter modules/imports, and then load its
submodules into it recursively.

Example: 

Morphic could contain several submodules, like basic morphs (Oval,
Rectangle, Polygon) and pluggable widget morphs (PluggableListMorph,
PluggableTextMorph, whatever). It would also depend on the graphics
subsystem (BitBlt, Form, DisplayScreen, etc.) and the events subsystem,
neither of which is part of Morphic since they are UI-independent. So:

Morphic has several submodules: BasicMorphs, PluggableMorphs, WidgetMorphs,
WindowMorphs (etc.). It also has imports: GraphicsCore, EventsCore (etc.).
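
The loading rule from the Dependencies section, applied to a cut-down
version of this Morphic example, might look like the sketch below. The
Module class and the load function are hypothetical, and skipping modules
that are already loaded is my own assumption; the proposal does not say how
repeated imports are handled.

```python
class Module:
    """Hypothetical module with imports (external) and submodules (owned)."""

    def __init__(self, name, imports=(), submodules=()):
        self.name = name
        self.imports = list(imports)
        self.submodules = list(submodules)


def load(module, loaded=None):
    """Load the module itself, then its imports, then its submodules.

    Returns the list of module names in the order they were loaded.
    """
    loaded = [] if loaded is None else loaded
    if module.name in loaded:   # assumed: an already loaded module is skipped
        return loaded
    loaded.append(module.name)
    for imported in module.imports:
        load(imported, loaded)
    for sub in module.submodules:
        load(sub, loaded)
    return loaded


graphics = Module("GraphicsCore")
events = Module("EventsCore")
# Assumed for the demo: BasicMorphs imports GraphicsCore as well.
basic = Module("BasicMorphs", imports=[graphics])
pluggable = Module("PluggableMorphs")
morphic = Module("Morphic", imports=[graphics, events],
                 submodules=[basic, pluggable])

print(load(morphic))
```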

2. A substrate for source code/project/version management
---------------------------------------------------------

This proposal intentionally avoids the details of source code management
including loading and unloading, and versioning schemes. It is meant to
provide a basis for such schemes.

The point of this scheme is to allow any number of different schemes for
handling such things as remote and net-based repositories and version
management.

Each Package corresponds to a Module. This means that each package will
reside in its own Module, of course. However, we also need to allow packages
to modify other parts of the system--this would be considered as allowed but
bad and discouraged behavior. Cf. SCANS Layers.

Each Module has an annotations dictionary; a general facility that may be
used in various ways.

A module also always has a Repository object, with a standard protocol,
where subclasses implement different repository schemes. The protocol would
have at least name, version, and URL. The URL can encode the kind of
retrieval scheme to use besides location info. It would seem useful for
submodules to
have parent-relative URLs, to allow easy moving. Note that the URL needn't
be hardcoded but may profitably be generated from name, version, etc.
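
To make the parent-relative idea concrete, here is a small sketch in Python.
Everything in it is an assumption for illustration: the constructor
arguments, the "name-version/" path convention, the placeholder
LocalFileRepository subclass (named after the one in the sample file below),
and the example URLs.

```python
class Repository:
    """Hypothetical base protocol: name, version, and a generated URL."""

    def __init__(self, name, version, base_url=None, parent=None):
        if (base_url is None) == (parent is None):
            raise ValueError("give exactly one of base_url or parent")
        self.name = name
        self.version = version
        self.base_url = base_url   # only set on a top-level repository
        self.parent = parent       # only set for a submodule's repository

    def url(self):
        """Top-level: the stored URL. Submodule: derived from the parent."""
        if self.parent is None:
            return self.base_url
        return f"{self.parent.url()}{self.name}-{self.version}/"


class LocalFileRepository(Repository):
    """Placeholder subclass; a real one would implement file retrieval."""


root = LocalFileRepository("Morphic", "1.0",
                           base_url="file:///modules/Morphic-1.0/")
sub = LocalFileRepository("BasicMorphs", "0.3", parent=root)
print(sub.url())

# Moving the parent moves the submodule with it.
root.base_url = "http://example.org/squeak/Morphic-1.0/"
print(sub.url())
```

Since the submodule never stores an absolute URL, relocating the parent is
a one-line change.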

It would be useful to have a standard hierarchical location scheme that
matches the module structure (i.e. subdirectories<->submodules), which can
be shared by local file schemes and remote storage schemes.

The in-image info for a Module--module composition (submodules + imports),
Repository object, and annotations--would have a corresponding
read/writeable file format. A standardized but also informative filename
seems useful. Hence, not a file named "type" or "0.1", but "Module <name>
<version>.inf" or some such. This file would also hold info about those
files that should be read in to import the Module.
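
Combining the hierarchical location scheme with the filename convention, the
location of a module's info file could be computed roughly as follows. The
function names and the root directory are made up; the module path is the
one used in the sample file below.

```python
from pathlib import PurePosixPath


def info_filename(name, version):
    """Build the informative filename: 'Module <name> <version>.inf'."""
    return f"Module {name} {version}.inf"


def info_path(root, module_path, version):
    """Map a module path onto nested directories, one per module level."""
    directory = PurePosixPath(root).joinpath(*module_path)
    return directory / info_filename(module_path[-1], version)


path = info_path("/squeak/modules",
                 ["Squeak", "Packages", "UtilityPack"], "1.0")
print(path)
```

The same relative path could be appended to a remote base URL, so local and
remote schemes share one layout.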

Note: It would be Very Useful to separate well-formed files from naughty
ones that make changes outside their Module. The former can easily be
undone, but the latter cannot easily be undone for the general case, esp.
after other changes to the image.
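
One possible way to tell the two kinds apart, sketched under the assumption
that each change in a file can be tagged with the module path it touches.
The data layout, the function names, and the example changes are all
hypothetical.

```python
def is_inside(target, module):
    """True if `target` is the module itself or nested somewhere under it."""
    return tuple(target[:len(module)]) == tuple(module)


def outside_changes(module, changes):
    """Return the changes that touch something outside `module`."""
    return [(target, what) for target, what in changes
            if not is_inside(target, module)]


module = ("Squeak", "Packages", "UtilityPack")
changes = [
    (("Squeak", "Packages", "UtilityPack"), "class Helper"),
    (("Squeak", "Packages", "UtilityPack", "Internal"), "class Cache"),
    (("Smalltalk80", "Collections"), "method String>>asHelper"),
]

naughty = outside_changes(module, changes)
print(naughty)  # a file is well-formed when this list is empty
```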

A sample file format
--------------------

A Module definition would look a bit like a class definition in source
files, i.e. it would use ordinary messages sent to the right places. For
extra redundancy one could use XML instead.

An initial "meta-header" or directive indicates the concrete class of the
Repository object to create for this module. This then determines what
storage and versioning scheme will be used to process the rest of this file.

Note that the dependency scheme, if cleverly applied, allows you to even let
your package "depend" on a repository scheme that would be required for
loading a complex package, e.g. with a custom version manager.

(This is just an approximation; I have left out things like the '!' chunk
separators. Also, the details are not at all set in stone.)

    "We should hardcode
    neither the owner module nor how this module is known to the owner
    (i.e. its name). Here I will use 'self' instead of a name, but see (1) "

    "First the meta-directive. Provide name of Repository class."
    "A better way of doing this?"

self repositoryOfKind: #LocalFileRepository.

    "You can really use any message a module understands."

self comment: 'My sample module'; version: '0.0.1A'.

    "A number (>= 0) of messages specifying the imports. (2) "

    "Here my Repository object would know how to find it, and then use
     its file that corresponds to this one to load it if necessary."

self uses: #(Smalltalk80 Collections) importNames: true.

self uses: #(Squeak Packages UtilityPack)
     version: '1.0' importNames: false.

    "or even be explicit about the repository scheme to use,
     to allow different parts of a project to use different schemes:"

self uses: #(Private CompanyX PackageX) importNames: false
     repositoryObject: (CVSRepository server: 'cvs.companyx.com').

    "Then submodules in the same fashion."

self submodule: #LocalSubmodule importNames: false.

    "or:"

self submodule: #RemoteSubmodule importNames: false
     repositoryObject:
        (CVSRepository server: 'cvs.mycompany.com'
                       module: #(ProjectX ModuleB)).

"End of example file"

(1) One could either a) execute code within the module object so to speak or
b) use a special variable like thisModule, cf. thisContext, or c) allow the
package to give an alias. In each case a message would look like:

self import: Blah.       "a: from inside"
thisModule import: Blah. "b: pseudovariable"
MyModule import: Blah.   "c: alias"

I have no firm preference as yet, but will use (a) for now.

(2) The examples I've given are just for illustration purposes, and
especially the fancy ones would quite likely be stupid in practice.

---
End of transmission