Adding Modules to Squeak; a modest proposal

Mon Mar 12 14:44:31 UTC 2001

Joerg Beekmann wrote:
> 
> Normally I would have tried to get a bit more working before sharing this,
> but as Modularity is such a hot topic right now I thought I'd go ahead and brave
> it. I'll start by saying I believe a modular Squeak is possible within the
> context of the current Squeak community (at least nothing I've seen so far
> tells me otherwise). The attached document is my vision of what that means
> and how it can be done. Small summary is below.
> 
> Joerg

Great job! Some detailed comments below (expanding your attached draft
into the email). Hopefully other people can comment further on Joerg's
original and/or my comments.

-Paul Fernhout
Kurtz-Fernhout Software 
=========================================================
Developers of custom software and educational simulations
Creators of the Garden with Insight(TM) garden simulator
http://www.kurtz-fernhout.com

====== Included Joerg's attachment inline with comments ========

Proposal for adding Modules to Squeak
===========================
Author: Joerg Beekmann <jbeekmann at attglobal.net>
Version: 0.1.0
Date: March 11, 2001

Table of Contents
=================
0) Basic Ideas
I) Basic modules
II) Module versioning
III) Module specific class extensions
IV) Module loading
V) Modules have a standard startUp method
VI) Potential image partitionings
VII) The plan
VIII) The vision && Dream
IXX) Status
IX) Some rationale for decisions
X) Ideas considered but rejected
XI) Relationship to existing modular Squeak work

0) Basic Ideas
==============

I) Basic modules:
=================
** The system is partitioned into modules and each class in a module
looks up global symbols in its Module. In this way modules are very much
like Environments. 

Sounds good.

** All modules are maintained in a ModuleRegistry defined in the root
module 'Smalltalk'.

Perhaps 'SmalltalkModule' to avoid any confusion with the existing
Smalltalk SystemDictionary?  This registry might be a class variable for
the root Module class?

You might want to have a hierarchy like this:
  SqueakModule
    SmalltalkModule
      CelesteModule
      SmalltalkCompilerModule
    PythonModule
      PythonCompilerModule
      PythonNewsReader

Are we definitely talking at least one new class per module? (ENVY does
this -- using XYZApplication for XYZ).

** Each module has a name and it would be nice if this were unique in
the Squeak world.

A registry of names maintained at Squeak.org might make a lot of sense.
Apple did this with creator IDs for Macintosh and Newton applications. 
What might make sense is to split creator and application ID (assuming
creators can then subdivide applications as they desire).

Java uses URLs. How about something like that for naming?

Module_Org_Squeak_Compiler
Module_Org_Squeak_Morphic
Module_Com_Kurtz-Fernhout_GardenSimulator

(Dashes a problem? Also underscores still a problem? Question is, do
class names parallel the module names? How does this relate to multiple
versions?)

** Each module is a subclass of Module and is defined via a special
subclassing message that lists imports and exports.

How about importList and exportList methods that define lists of imports
and exports?  

Example for a compiler module:

  exportList
    ^#(Compiler Parser AssignNode ...)

** Modules declare the symbols they export or may specify to export all
symbols. Exports are a list of Smalltalk symbols, which must be defined
by the module.

Sounds good to me.

** Modules may import any number of other modules. Imports are defined
as a list of ModuleDescriptions that are resolved to modules at symbol
lookup time.

How about:

  prerequisitesList
    ^#((Squeak_Org_Compiler 1.2 1.3) (Squeak_Org_Morphic 3.2))
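
A rough sketch of how such a list might be checked, in Python for
brevity rather than Squeak. Reading each entry as a module name followed
by its acceptable versions is an assumption, and the function name
missing_prerequisites is hypothetical.

```python
# Each entry: module name followed by the versions that would be acceptable
# (an assumed reading of the prerequisitesList example).
PREREQUISITES = [
    ("Squeak_Org_Compiler", "1.2", "1.3"),
    ("Squeak_Org_Morphic", "3.2"),
]


def missing_prerequisites(prerequisites, loaded):
    """Return the names of prerequisites not satisfied by `loaded`.

    `loaded` maps a module name to the version string present in the image.
    """
    missing = []
    for name, *acceptable in prerequisites:
        if loaded.get(name) not in acceptable:
            missing.append(name)
    return missing
```

A loader could call this before installing a module and refuse, or go
fetch the missing pieces, when the result is not empty.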

** I'm assuming lookup performance is not critical. It is done at
compile time and occasionally during runtime.

Sounds OK.

** If a module is imported all symbols that module exports are imported.

Python and Java let you either import all symbols or pick just one
(using a module.foo vs. module.* convention).

The latest Python lets you remap names as an option, as in:

import foo as bar

which could be useful for resolving name conflicts or, if enhanced,
potentially for simultaneously importing two different versions of the
same classes for conversions.
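
For reference, these import forms look like this in Python itself. The
standard library modules json and os.path are used only so the example
runs as-is; the local names js and path_join are arbitrary.

```python
import json as js                        # bind a whole module under another name
from os.path import join                 # import a single name
from os.path import join as path_join    # import a single name, renamed

encoded = js.dumps({"module": "Morphic"})

# Both local names refer to the very same function object.
same_function = join is path_join
```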

** Importing a module DOES NOT import its imports. (Unlike Environments
no hierarchy of modules).

OK.
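
A minimal sketch of the lookup rules so far (local symbols, exports, and
non-transitive imports), written in Python purely for illustration. The
class Module and its methods are hypothetical and are not taken from
Joerg's code.

```python
class Module:
    """Hypothetical stand-in for a module: a namespace with exports and imports."""

    def __init__(self, name, exports=()):
        self.name = name
        self.symbols = {}             # symbols defined locally
        self.exports = set(exports)   # names visible to importers
        self.imports = []             # directly imported modules only

    def define(self, symbol, value):
        self.symbols[symbol] = value

    def lookup(self, symbol):
        # Local definitions first.
        if symbol in self.symbols:
            return self.symbols[symbol]
        # Then the exports of direct imports; imports of imports are not consulted.
        for module in self.imports:
            if symbol in module.exports and symbol in module.symbols:
                return module.symbols[symbol]
        raise KeyError(symbol)


kernel = Module("Kernel", exports=["Object"])
kernel.define("Object", "<class Object>")

morphic = Module("Morphic", exports=["Morph"])
morphic.define("Morph", "<class Morph>")
morphic.imports.append(kernel)

app = Module("MyApp")
app.imports.append(morphic)
```

Here app can see Morph through its import of Morphic, but not Object,
because Kernel is only an import of an import.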

** Mutual dependence between two modules should be allowed unless we
can't figure out how to load such a cycle (will need to allow this
initially in any case until the image is partitioned).

Good.
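
One way a cycle could be loaded is in two passes: create every module
first, then link the imports. The sketch below is in Python and only
illustrates the idea; load_modules and the dictionary layout are
hypothetical.

```python
def load_modules(descriptions):
    """Load modules that may import each other.

    `descriptions` maps a module name to the names it imports.
    Pass 1 creates every module; pass 2 links imports, so cycles are fine.
    """
    modules = {name: {"name": name, "imports": []} for name in descriptions}
    for name, imports in descriptions.items():
        for imported in imports:
            if imported not in modules:
                raise KeyError(f"{name} imports unknown module {imported}")
            modules[name]["imports"].append(modules[imported])
    return modules


loaded = load_modules({"Kernel": ["Graphics"], "Graphics": ["Kernel"]})
```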

** Order of import makes no difference and search order is undefined
(although it makes sense to search locally first).

I don't know. I sort of like a well defined search order to avoid
confusions... I can see how this would be a problem if everything was
put in a common dictionary as opposed to a list of dictionaries.
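
The "list of dictionaries" idea can be shown with Python's
collections.ChainMap, which searches its maps in the order given. The
dictionary contents are made up for the example.

```python
from collections import ChainMap

local = {"Parser": "MyApp.Parser"}
compiler_exports = {"Parser": "Compiler.Parser", "Compiler": "Compiler.Compiler"}
morphic_exports = {"Morph": "Morphic.Morph"}

# Search order is the argument order: local first, then each import in turn.
scope = ChainMap(local, compiler_exports, morphic_exports)
```

With a defined order the local Parser wins over the imported one, which
is exactly the shadowing that the next point in the proposal rules out.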

** Same symbol may not resolve to two values, either in two imports or
locally and in an import (no shadowing). 

OK. Then the last thing loaded gets the value?

** Respect Smalltalk philosophy of allowing user to create inconsistent
image but provide tools to clean up.... undefined, doubly defined etc.

OK.
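
A clean-up tool for doubly defined symbols could start from something
like the following Python sketch. The function find_conflicts and its
argument layout are hypothetical.

```python
def find_conflicts(local_symbols, imported_exports):
    """Return {symbol: [sources]} for symbols with more than one definition.

    `local_symbols` is a set of names defined in the module itself;
    `imported_exports` maps an imported module name to its exported names.
    """
    sources = {}
    for symbol in local_symbols:
        sources.setdefault(symbol, []).append("<local>")
    for module_name, exports in imported_exports.items():
        for symbol in exports:
            sources.setdefault(symbol, []).append(module_name)
    return {s: where for s, where in sources.items() if len(where) > 1}


conflicts = find_conflicts(
    {"Parser"},
    {"Compiler": {"Parser", "Compiler"}, "Morphic": {"Morph"}},
)
```

The image is allowed to be inconsistent; the tool only reports where the
duplicates come from so the user can decide what to do.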

** Each module has programmatic access to the registry via the message
'registry' and thus all symbols.

I'm a bit confused on the difference between a registry and a module.

II) Module versioning
=====================
** Each module has a version and an earliest compatible version.

How about a compatible versions list, where a loaded module can be asked
if it is compatible with anything on the list (allowing tree evolution
for a while)? Then modules could list what they are backwardly
compatible with?

There may be an issue here with modules taken over by someone else later
or renamed. 

To an extent, thinking generally, you kind of want to define
compatibility in terms of services where modules help implement a
service. And the reason you load modules is to get that service going.
This decouples the notion of interface (Service) from implementation
(module). However it may be complex to implement this or think it
through.

** Each ModuleDescription has a version and a flag indicating if that
exact module is required or any compatible module is acceptable.

OK. Perhaps in a list of compatible modules there could be a flag for
exact match, or perhaps wildcards?

** Versions are in three parts major.minor.patch and are Magnitudes

I find it useful for data files to have major version and minor version,
where the assumption is that minor versions are backwardly compatible
(just additions) and major versions may be incompatible (deletions or
changes). I like the patch level.
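
Three-part versions behave as Magnitudes almost for free if they are
compared part by part. A Python sketch follows; the names Version and
is_compatible are hypothetical, and the compatibility rule is one
possible reading of "version plus earliest compatible version".

```python
from typing import NamedTuple


class Version(NamedTuple):
    major: int
    minor: int
    patch: int

    @classmethod
    def parse(cls, text):
        major, minor, patch = (int(part) for part in text.split("."))
        return cls(major, minor, patch)


def is_compatible(available, earliest_compatible, requested):
    """True if a module at `available`, which declares itself compatible
    back to `earliest_compatible`, can serve a request for `requested`."""
    return earliest_compatible <= requested <= available
```

Because the parts are integers, 1.10.0 sorts after 1.9.3, which a plain
string comparison would get wrong.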

** When looking up a symbol in the imports look for the module in the
registry. This is a module of the correct name compatible with the
Version requested.

The registry again...
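
My current guess at what the registry does, as a Python sketch: it is
not a module itself, just the table of loaded modules, possibly several
versions per name. The class ModuleRegistry, its methods, and the use of
plain tuples for versions are all assumptions.

```python
class ModuleRegistry:
    """Hypothetical registry: tracks loaded modules by name, several versions each."""

    def __init__(self):
        self._loaded = {}  # name -> list of (version, earliest_compatible)

    def register(self, name, version, earliest_compatible=None):
        entry = (version, earliest_compatible or version)
        self._loaded.setdefault(name, []).append(entry)

    def resolve(self, name, requested, exact=False):
        """Return the newest loaded version satisfying the request, or None."""
        candidates = []
        for version, earliest in self._loaded.get(name, []):
            if exact:
                ok = version == requested
            else:
                ok = earliest <= requested <= version
            if ok:
                candidates.append(version)
        return max(candidates, default=None)


registry = ModuleRegistry()
registry.register("webserver", (1, 2, 65), earliest_compatible=(1, 0, 0))
registry.register("webserver", (2, 0, 0))
```

A None result would be the point where the loader goes looking for a
module file on disk.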

** Two versions of the same package may co-exist in image.

Great! I like this a lot -- especially say if you want to convert from
one data file format supported by a version to another format supported
by another version.

III) Module specific class extensions
=====================================
** Classes may be extended with methods by modules in which they are not
defined. This provides a home for methods such as isMorph.

Like ENVY.

** The methods a class understands consists of the methods defined in it
and its super classes as well as the extensions defined in the module
corresponding to the sender (that is, the sender's view of the object).

Interesting. This is unlike ENVY, which does not take the sender into
account. I think this might be a lot of overhead. Is this really needed?
Conceptually it looks nice, but can we think of examples where it is
essential?

** Importing a module DOES import extensions it imports or defines. This
is required to ensure that derivations of imported classes function
correctly. For example module MyViewers imports module MorphicViewers
that imports Morphic.  Module MyViewers uses ObjectViewer defined in
MorphicViewers to view morphs and non-morphs. Module Morphic extended
Object with the message isMorph. Module MyViewers needs to import this
extension in order for ObjectViewer >> isViewingAMorph invoked from
module MyViewers to respond correctly. This may seem inconsistent with
not importing classes imported by an imported module but it's not. This
rule simply ensures that objects behave consistently with the way they
behave in an imported module. It does not import additional symbols.

ENVY tries hard to load sets of modules atomically -- so the image does
not get in an inconsistent state.

One workaround is to always have people save their image first, and if
loading fails, have them revert...
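
The save-then-revert workaround amounts to the following pattern, shown
here in Python on an in-memory dictionary standing in for the image. It
is only an illustration of the idea and not how ENVY does it;
load_atomically is a hypothetical name.

```python
import copy


def load_atomically(image, loaders):
    """Apply every loader to `image` (a dict), or leave it untouched.

    Each loader is a callable that mutates the image and may raise.
    """
    snapshot = copy.deepcopy(image)
    try:
        for loader in loaders:
            loader(image)
    except Exception:
        # Restore the saved state in place, then report the failure.
        image.clear()
        image.update(snapshot)
        raise
    return image
```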

** Look for extension methods after failing (does not understand). i.e.
extensions are in addition to existing protocol.

HMM. Possibly. Or would extensions be in the regular method dictionary?
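
To think about the cost, here is a Python sketch of the lookup this
section describes: regular methods first, then extensions from the
sender's module and from the modules it can see. Everything here
(ExtensibleClass, send, visible_modules) is hypothetical, and superclass
lookup is left out to keep it short.

```python
class ExtensibleClass:
    """Hypothetical class whose protocol depends on the sending module."""

    def __init__(self, name, methods):
        self.name = name
        self.methods = dict(methods)   # the class's own method dictionary
        self.extensions = {}           # module name -> {selector: function}

    def extend(self, module_name, selector, function):
        self.extensions.setdefault(module_name, {})[selector] = function

    def send(self, selector, sender_module, visible_modules=()):
        # Regular methods win; extensions are consulted only on a miss.
        if selector in self.methods:
            return self.methods[selector]()
        for module_name in (sender_module, *visible_modules):
            extension = self.extensions.get(module_name, {})
            if selector in extension:
                return extension[selector]()
        raise AttributeError(f"{self.name} does not understand {selector}")


obj = ExtensibleClass("Object", {"printString": lambda: "an Object"})
obj.extend("Morphic", "isMorph", lambda: False)
```

The extra work only happens on the miss path, but every such send needs
to know who the sender is, which is the part that worries me.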

IV) Module loading
===================
** If a module description can't be resolved in the image, attempt to
load the module from the VM directory; if it is not present there, load
it from the module directory (see below).

OK.

** Module filenames are of form name-version.smd (webserver-1.2.65.smd)

How does this fit into subclass names?
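
For the file naming convention itself, a small Python sketch of building
and parsing such names. The helper names are hypothetical, and a strict
three-part numeric version is assumed.

```python
import re

# name-version.smd, where version is major.minor.patch
_FILENAME = re.compile(r"^(?P<name>.+)-(?P<version>\d+\.\d+\.\d+)\.smd$")


def module_filename(name, version):
    """Build the file name for a module name and a (major, minor, patch) tuple."""
    return f"{name}-{'.'.join(str(part) for part in version)}.smd"


def parse_module_filename(filename):
    """Return (name, (major, minor, patch)) or raise ValueError."""
    match = _FILENAME.match(filename)
    if match is None:
        raise ValueError(f"not a module filename: {filename!r}")
    version = tuple(int(part) for part in match["version"].split("."))
    return match["name"], version
```

Splitting on the last dash before the version means a dash inside the
module name itself does not confuse the parser.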

** If no file matching the module can be found, either prompt or throw
an exception, depending on whether this is a development or runtime image.

OK.
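
The file search described above (VM directory first, then the module
directory, then give up) might look like this in Python. The function
name find_module_file is hypothetical, and the choice between prompting
and raising is left to the caller.

```python
from pathlib import Path


def find_module_file(filename, vm_directory, module_directory):
    """Return the first existing path for `filename`, VM directory first."""
    for directory in (vm_directory, module_directory):
        candidate = Path(directory) / filename
        if candidate.is_file():
            return candidate
    # A development image might catch this and prompt the user instead.
    raise FileNotFoundError(filename)
```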

** Modules may be loaded from source or binary.

OK.

** Binary modules are loaded using the image segment technology. 

We'll see... Obviously that may need to be extended with version number
etc.

I'd love to see a standard way to generate binary modules including live
objects from text (smalltalk code).

** There is support for each module having its own sources file. The
changes file is shared. However it is still possible to combine all
sources.

Neat.

** Binary module files contain their sources file in them (if sources
are included).

OK.

** Include standard pre and post load methods for each module.

Good.

** Squeak modules are shareable. Multiple running VMs may use same copy
of modules.

OK.

** Modules are self-describing; should be possible to have the VM print
a description of a module if it is passed the module name as an argument
(squeak.exe -describe module name).

Good.

** Note a simple "make module" may do nothing more than start up, load
modules and save the fully loaded image. This represents a full
distribution. Option to combine sources from the various modules into a
single sources file?

OK.

** Need a central WebDAV based repository of modules & their versions.
Searchable using the self-describing capabilities of modules.

Great. Maybe this could be coupled into a module name registry.

** Should allow for signing of modules via same mechanism as signing of
Projects. Are projects subclasses of modules?

Signing sounds like a good idea.

** Should be able to validate that loaded Modules are still as they were
when loaded (need to hash only on classes, methods and byte-codes?).

????
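
If the idea is to hash a module's contents at load time and compare
later, a Python sketch could look like this. The function
module_fingerprint, the choice of SHA-256, and the input layout are all
assumptions on my part.

```python
import hashlib


def module_fingerprint(methods):
    """Hash a module's classes, selectors and byte-codes in a stable order.

    `methods` maps (class_name, selector) to the compiled byte-codes.
    """
    digest = hashlib.sha256()
    for (class_name, selector), bytecodes in sorted(methods.items()):
        digest.update(class_name.encode("utf-8") + b"\0")
        digest.update(selector.encode("utf-8") + b"\0")
        # Length prefix keeps adjacent entries from running together.
        digest.update(len(bytecodes).to_bytes(4, "big"))
        digest.update(bytecodes)
    return digest.hexdigest()
```

Recording the fingerprint when a module is loaded and recomputing it
later would show whether any method in it has been changed since.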

** Be able to browse differences between loaded module and source of
module, be that from the web or wherever.

Yes!

V) Modules have a standard startUp method
=========================================
** Modules define a method called run. This is the startup method of a
module.

Good.

** VM takes module description as a parameter and this can include a
path. This path is the module directory

OK. I think paths may take a bit more work to accommodate complex
installations...

** With WebDAV support this can be an Internet directory so the module
and its requisite modules may be anywhere on the internet.

OK. Again, might want both local and remote paths.

VI) Potential image partitionings
=================================
** Image should be partitioned into functional units not by groups of
like classes.

Please look at http://c2.com/ppr/envy/ for ideas on thinking about
Smalltalk in terms of layers and sections (effectively modules).  The
basic issue there is that classes are nice modules in a way (in that
sense the original Smalltalk-80 was very modular, more so than many
contemporary procedural systems), but classes also need to work together
in larger units (layers and sections) which themselves are effectively
modules, and all of these can have multiple versions that need to be
matched up to build a sensible image.

** Should consider breaking some of the larger functional areas into two
modules allowing choice in how complete a system someone wants.
Especially the Kernel and graphics modules.

I think the base Squeak system should be more highly factored than this
eventually... Lots of modules with versions so individual parts can be
patched. Again look at the above -- the lowest layer can have sections
that still need each other but are minimally independent and
individually replaceable.

** I imagine that some standardized support for communication between
modules may be required. Check out Channels as defined by Inferno
(successor to Plan9). For example Kernel should put all events into a
Channel that will have 0 or more consumers unknown to it. Ned Konz has
mentioned other areas where this would be required.

Don't know much about this. Sounds cool though. Starting to get at a
notion of firewalls between modules...
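
As I understand the proposal, the producer puts events somewhere without
knowing who consumes them. A very small Python sketch of that shape
follows. It is a plain callback list and not Inferno's actual channel
semantics; the class name Channel and its methods are hypothetical.

```python
class Channel:
    """Hypothetical channel: the producer does not know its consumers."""

    def __init__(self):
        self._consumers = []

    def subscribe(self, consumer):
        self._consumers.append(consumer)

    def put(self, event):
        # Deliver to zero or more consumers; return how many received it.
        consumers = list(self._consumers)
        for consumer in consumers:
            consumer(event)
        return len(consumers)
```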

** The base module should contain a functioning Smalltalk image with
event dispatch, Segment loading capability and standard I/O support.
Should not contain much else, not even a compiler. Everything else
should be loadable on top of this. Given the advertised segment loading
speeds there may not even be a need to create a single image?

Like the direction this is going. I think it may take some fine tuning
in the end to see what makes sense -- for example might want network
primitives but not console I/O for better remote tool support.

** Some modules can act as image builders: simply start up, load imports,
initialize and save as a single image with a single sources file, no
changes.

Yes, yes, yes!

VII) The plan
=============
** Implement the basic mechanism as described in I.

Good!

** Put all of Squeak into a single module and then experiment with
adding additional modules.

Good way to start. I assume by adding modules you mean refactoring the
base?

** Partition Squeak into multiple modules but import and export all
symbols to all modules. Play with additional modules.

Good. Nice incremental strategy.

** As above but play with restricting the exports of some of the squeak
modules.

OK.

** Implement some more of above especially the ModuleExtensions and
binary loading

OK.

** Get additional feedback and decide if it was all worth it and should
become part of Squeak.

Sounds good.

VIII) The vision && Dream
=========================
** Squeak as a base module and a collection of other modules.

Yes!

** Able to dynamically load all modules required to run a module onto
the base module.

Yes!

** Version requirements are taken care of.

Yes!

** Works for everything from a command line Smalltalk script interpreter
to a full Squeak environment.

Yes!

** Really dreaming in Technicolor I can imagine a base image & VM acting
more like an OS with multiple programs running on them. 

Yes!

And going further, use the same system to allow Python, Java, Scheme and
so forth modules to be added to Squeak!

IXX) Status
===========
** Currently I'm very early in all this. I laid out my goals and a rough
feature set as above.  

Good job so far. Requirements are often the best start.

** I've defined some of the core parts of I) and can replace
Smalltalk with a module (big deal).

Does this mean actual code? Great!

IX) Some rationale for decisions
================================
** Why not Environments? For several reasons. 
** 1) I don't believe the model of inheriting name space along a single
arbitrarily deep chain is the correct approach. Inheritance is both too
complex and insufficiently powerful. The need to resort to compile in
"special references" is an indication of lack of expressive power. What
I mean by too complex is the inheritance chain itself. Especially if it
becomes long.
** 2) I felt more than namespace partitioning was required (loading
unloading, extensions, versions etc.).  
** Others??

I think the two may need to work together.

X) Ideas considered but rejected
================================
** Ability to import only some subset of an import's exports.

How about one at a time like Python, Java? Why?

** Ability to import a module giving it a different name.

Why?

** Ability to import a symbol under a different name. 

Why?

** Ability to import a module giving all symbols a prefix.

OK.

** Ability to shadow imports.

Please explain.

** Ability to inherit imports (multi level import)

OK.

** A more general 'module path' mechanism. This is the road to hell; at
least I don't know how to do this correctly and so I deferred it. 

OK, but maybe this is needed? Or at least, look in all subdirectories of
this directory?

XI) Relationship to existing modular Squeak work
================================================
** I've looked at environments

OK.

** I'm aware of the image segmentation work and wish to incorporate
this. 

OK. 

** I don't understand how isolated projects work.

OK.

** In general I haven't made up my mind about how Projects fit into all
this. 

Good question.

** I don't know how these image modules will relate to VM modules
(plugins)

Good question.