[Modules] C and Smalltalk modules

Anthony Hannan ajh18 at cornell.edu
Mon Nov 12 20:59:48 UTC 2001


Hello all,

	After reviewing the VM plugins and keeping up with the discussion on
modules, I have come up with some different points of view for modules
that I would like to share with you.  In my analysis, I first looked at
how C modules work and how they can be improved, and then extended this
to Smalltalk modules.  In this approach, I am attempting to build a
modular structure that contains C modules at the base and Smalltalk
object modules on top of them.

C Modules

	Currently, the module interface of a C object file (*.o file in Unix)
is its symbol table, containing the names of functions and variables
that it makes available to other modules (exports) and the names of
functions and variables that it uses from other modules (imports).  All
names in all modules are in one global name space, and any module can
define named functions/variables to be imported.  If more than one
module defines the same name, the linker will choose the one it sees
first (at least when the definitions come from libraries; duplicate
definitions among directly linked object files are normally reported as
an error).

Adding Imported Modules

	A C module does not know who will fulfill its imported names, i.e. it
has no sub-modules or imported modules.  The imported modules are
specified in the makefile.  This is flexible, but in reality the module
programmer usually knows which modules s/he will be importing.  It would
be cleaner if this importing knowledge were kept with the module instead
of spreading it to the makefile.  We can avoid losing this freedom by
still allowing any dependent module to override a named definition of a
module it imports.  For example, if MyModule imports the StandardMath
module that defines factorial, MyModule or any module that uses MyModule
may redefine factorial, causing even StandardMath functions that use
factorial to use the new definition.  The graph of imported modules
eliminates the need for a makefile, and a particular "program"
configuration becomes just a single module (including, of course, its
ancestry of imported modules).
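
	As a rough illustration of how an import graph could stand in for
the makefile, here is a sketch in Python (not part of the proposal).
The IMPORTS table, the LibC module, and link_set are made-up names; the
only idea taken from the text is that each module records its own
imports, so the set of modules to link can be computed from the root.

```python
# Hypothetical model: every module records its own imports, so the set of
# modules to link is computed from the graph instead of listed in a makefile.
IMPORTS = {
    "MyModule": ["StandardMath"],
    "StandardMath": ["LibC"],
    "LibC": [],
}


def link_set(root, imports):
    """Return root plus every module reachable through imports, root first."""
    ordered, seen, stack = [], set(), [root]
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        ordered.append(name)
        # Reversed so that imports are visited in their declared order.
        stack.extend(reversed(imports[name]))
    return ordered


print(link_set("MyModule", IMPORTS))
```

	Under this model a "program" is just the root name handed to
link_set; nothing outside the modules lists what gets linked.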

Single Relative Name Space and Overriding Names

	All names in the entire ancestry of imported modules are visible to a
particular module, creating a single name space.  However, two
independent modules can define the same name, and they can even be
imported into the same third module as long as the third module does not
use the duplicated name directly.  For example, an AlternativeMath
module may define its own factorial function and MyModule may import
both the AlternativeMath module and StandardMath module.  There is only
an ambiguity if MyModule calls factorial directly (which one is it
calling?).  But it is ok for it to call it indirectly via a function in
AlternativeMath or StandardMath; the appropriate factorial function will
be used depending on which module is calling it (like Self Language's
sender path tie breaker rule).  Also, MyModule may redefine factorial
overriding the factorial functions in both AlternativeMath and
StandardMath modules, so functions in either module that use factorial
will now use the new factorial function.
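
	The lookup rule above can be sketched roughly as follows (Python, for
illustration only).  The Module class, resolve, and the idea of passing
an explicit sender path are assumptions made for the example, not a
proposed implementation.

```python
class AmbiguousName(Exception):
    """Raised when a name is inherited from two independent modules."""


class Module:
    def __init__(self, name, imports=(), defs=None):
        self.name = name
        self.imports = list(imports)
        self.defs = dict(defs or {})

    def inherited(self, symbol):
        """Nearest definitions of symbol found in the imported ancestry."""
        found = set()
        for parent in self.imports:
            if symbol in parent.defs:
                found.add(parent)
            else:
                found |= parent.inherited(symbol)
        return found


def resolve(symbol, sender_path):
    """Pick the module whose definition of symbol a call should use.

    sender_path lists the modules the call came through, from the
    "program" module down to the module that contains the call.
    """
    # An override in a dependent module wins, outermost module first.
    for module in sender_path:
        if symbol in module.defs:
            return module
    candidates = sender_path[-1].inherited(symbol)
    if len(candidates) > 1:
        names = sorted(m.name for m in candidates)
        raise AmbiguousName(f"{symbol} is defined in {names}")
    if not candidates:
        raise KeyError(symbol)
    return candidates.pop()


standard = Module("StandardMath", defs={"factorial": "standard version"})
alternative = Module("AlternativeMath", defs={"factorial": "alternative version"})
my_module = Module("MyModule", imports=[standard, alternative])
```

	With these definitions, resolve("factorial", [my_module, standard])
answers StandardMath, the same call through AlternativeMath answers
AlternativeMath, and resolve("factorial", [my_module]) raises
AmbiguousName until MyModule defines factorial itself.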

Adding Super References and Resolving Ambiguities

	When redefining functions/objects we sometimes want to use the old
definition in our new definition.  To do this we reference the old
definition by prefixing the name with the module name where it resides. 
For example, we can call the old factorial function from inside our new
factorial function by writing "StandardMath.factorial".  This prefixing
technique can also be used to resolve ambiguities in a name space (see
above).

Smalltalk Modules

	A Smalltalk module is structured the same as the enhanced C module
described above.  It will name and define a set of objects, and these
objects can reference and/or override named objects in imported modules.
For example, an HTTPModule may import StringModule and override String
to add an asUrl method to it.  All dependent and ancestor modules of
HTTPModule will see the new class when referencing String, while other
modules (programs) will not.

Self-like Module Behavior

	A new class of objects called Module will hold Smalltalk modules and C
modules.  A module contains a named set of objects (a dictionary) and a
set of imported modules.  A module behaves like a Self Language object;
its named objects are directly accessible by sending names as messages
to it, and if the accessed object is a method the method will execute. 
Also, as in Self, if the name is not found locally, its parents
(imported modules) will be searched, and so on.  This Self-like behavior
is equivalent to C behavior after the modules are linked, except that in
C you can only send messages to yourself or your imported modules (i.e.
all functions are in the same name space).  So in Slang, we will limit
calls to 'self' only (as we do today).  Even with this limitation we
will have more flexibility than we currently do because of the multiple
inheritance (multiple imported modules).  (Another difference between
Self and C is the sender path tie breaker behavior.  We can implement
this in C using mangled names and/or private (static) functions.)
	Smalltalk modules will have the same Self-like behavior, but most of
their named objects will probably be classes and other globals rather
than methods.  A new pseudo-variable named 'environment' will return the
current "program" module of the current process.  The compiler will
look up globals in the defining module which will always be an ancestor
of the current module (environment).  When a message is sent to a Module
by name instead of via 'self' or 'environment', the named module becomes
the new environment (new name space) for the current process until the
call returns.
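
	To make the Self-like behavior concrete, here is a minimal Python
sketch.  The Module class, its depth-first parent search, and the
example slot names are assumptions for illustration; the proposal does
not fix a search order.

```python
import types


class Module:
    """A named dictionary of objects plus a list of imported (parent) modules."""

    def __init__(self, name, imports=(), **named):
        self._name = name
        self._imports = list(imports)
        self._named = dict(named)

    def _find(self, key):
        if key in self._named:
            return self._named[key]
        # Not found locally: search the parents (assumed depth-first, in order).
        for parent in self._imports:
            try:
                return parent._find(key)
            except KeyError:
                pass
        raise KeyError(key)

    def __getattr__(self, key):
        # Only reached when ordinary attribute lookup fails.
        if key.startswith("_"):
            raise AttributeError(key)
        try:
            value = self._find(key)
        except KeyError:
            raise AttributeError(key) from None
        # Sending the name of a method slot runs it; other slots answer
        # the object they hold.
        return value() if isinstance(value, types.FunctionType) else value


kernel = Module("Kernel", greeting="hello")
math_module = Module("StandardMath", [kernel], answer=lambda: 42)
program = Module("MyModule", [math_module])

print(program.answer, program.greeting)
```

	Here program has no slots of its own, so both names are found by
walking up through its imports.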

Loading

	We want to be able to run any C or Smalltalk module and have it
automatically load what it needs.  The executable 'squeak' will take a C
or Smalltalk module as its first argument (no image file anymore) and a
C or Smalltalk expression as its second argument to be executed once
loaded.  The expression may be omitted in which case main() will be
called if the first arg is a C module or 'Processor activeProcess' will
be continued if the first arg is a Smalltalk module.
	C modules are stored as regular C object files (.o files).  To make
them work with 'squeak', a second .i file that lists imported modules
must accompany every .o file (with the same base name).  'squeak' will
have an additional command line option specifying the path of
directories to search for modules.  If the option is not specified then
the environment variable SQUEAKMODULES will be used as the search path. 
If the given module is a C module, 'squeak' will find all imported
modules, link together an executable, and then execute it.
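
	A sketch of the lookup this implies, in Python for illustration.
The .i file format (one module name per line) and the use of the
platform's path separator in SQUEAKMODULES are assumptions; the text
above does not specify either.

```python
import os


def search_path(option=None, environ=os.environ):
    """Directories to search: the command-line option, else SQUEAKMODULES."""
    raw = option if option is not None else environ.get("SQUEAKMODULES", "")
    return [d for d in raw.split(os.pathsep) if d]


def find_module(name, dirs, suffix=".o"):
    """Return the first file called name + suffix found along dirs."""
    for directory in dirs:
        candidate = os.path.join(directory, name + suffix)
        if os.path.isfile(candidate):
            return candidate
    raise FileNotFoundError(name + suffix)


def read_imports(object_file):
    """Read the companion .i file (assumed: one imported module per line)."""
    base, _ = os.path.splitext(object_file)
    with open(base + ".i") as f:
        return [line.strip() for line in f if line.strip()]
```

	Starting from the module named on the command line, repeating
find_module and read_imports for each imported name yields the full set
of object files to link.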
	Smalltalk module files will have the .s suffix and are stored in image
segment format.  The named objects will be the roots, and references to
named objects in imported modules will be the out pointers.
	'squeak' starts a .s module by starting the C module called
SmalltalkModuleLoader.o with 'loadThenExecute( <.s file>, <smalltalk
expression>)' as its initial expression.  This will initialize
ObjectMemory, load the Kernel Smalltalk module, load the specified .s
file, create a new Smalltalk process for the specified smalltalk
expression or fetch the last activeProcess, and finally call
interpret().  Out pointers to objects in modules not loaded yet will
point to proxies that will load the module on demand (like current
ImageSegmentRootStubs); if it's a C module, it will load like a DLL.  The
Kernel Smalltalk module will consist of all the objects that hang off of
the specialObjectsArray minus the Smalltalk dictionary.

Changing

	Like a transaction, changes are made only to the current module
(environment).  Any changes made to objects in imported modules are
treated as overrides by the current module.  This is accomplished by
catching stores to original objects (old objects excluding recently
tenured objects) and remembering the slot's old value before replacing it
with its new value.  For example, adding a morph to World (assuming that
the named object, World, resides in an imported/ancestor module of the
current module) will add the World object as a key in the current
module's overrideDictionary, with the changed slot number and old value
as its value.  When the process changes modules, or a new process starts
running under a different module, the current module swaps out its new
values and puts back the old values; then the new module swaps in its
new values.
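
	The swap can be sketched as follows (Python, illustration only).
Keying the overrides by object identity and slot name, and using one
swap operation for both directions, are simplifications chosen for the
example; the names Module, store, swap and switch are hypothetical.

```python
from types import SimpleNamespace


class Module:
    def __init__(self, name):
        self.name = name
        # (object identity, slot name) -> [object, slot name, saved value]
        self.overrides = {}

    def store(self, obj, slot, new_value):
        """Store into an imported object, remembering the old value once."""
        key = (id(obj), slot)
        if key not in self.overrides:
            self.overrides[key] = [obj, slot, getattr(obj, slot)]
        setattr(obj, slot, new_value)

    def swap(self):
        """Exchange each overridden slot with its saved value.

        While the module is current the saved values are the old ones;
        after swapping out they are the module's own new values.
        """
        for entry in self.overrides.values():
            obj, slot, saved = entry
            entry[2] = getattr(obj, slot)
            setattr(obj, slot, saved)


def switch(old_module, new_module):
    """Leave old_module (restoring old values) and enter new_module."""
    old_module.swap()
    new_module.swap()


world = SimpleNamespace(submorphs=("morph1", "morph2"))
project = Module("MyProject")
other = Module("OtherProject")
project.store(world, "submorphs", world.submorphs + ("newMorph",))
```

	After switch(project, other) the World is back to its two original
submorphs; switching back to project restores the added morph.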

Saving

	Saving the current module (environment) will save its local named
objects plus its overrides to ancestor modules.  It will store the
latter as associations of FieldPaths to values for every overridden slot.
A FieldPath is a sequence of fields to follow from a global to an
object, in this case, specifying the path to the changed slot.  Also,
any other pointers from objects in the current module to objects in
imported modules will be converted to FieldPaths before saving.  So for
example, saving a current module that added a morph to World will save
the association, #(World submorphs) -> {newMorph. #(World submorphs 1).
#(World submorphs 2) ...}, in its list of overrides.  (The new submorphs
array has references to original morphs in the old submorphs array).
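
	A rough Python sketch of FieldPaths, for illustration.  Representing
a path as a tuple, using 1-based integer fields for indexed slots, and
the helper names follow and encode are all assumptions.

```python
from types import SimpleNamespace


def follow(named_objects, path):
    """Follow a FieldPath such as ("World", "submorphs", 2) from a global."""
    obj = named_objects[path[0]]
    for field in path[1:]:
        if isinstance(field, int):
            obj = obj[field - 1]  # indexed slot, 1-based as in Smalltalk
        else:
            obj = getattr(obj, field)  # named slot
    return obj


def encode(new_items, old_items, path):
    """Replace references to original elements with FieldPaths into them."""
    encoded = []
    for item in new_items:
        index = next(
            (i for i, old in enumerate(old_items, start=1) if old is item),
            None,
        )
        encoded.append(item if index is None else path + (index,))
    return encoded


morph1, morph2 = object(), object()
world = SimpleNamespace(submorphs=[morph1, morph2])
override = encode(
    ["newMorph", morph1, morph2], world.submorphs, ("World", "submorphs")
)
```

	Here override holds the new morph itself followed by the paths
("World", "submorphs", 1) and ("World", "submorphs", 2), mirroring the
association shown above.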

Committing/Migrating

	Rather than saving changes to the current module, changes can be
migrated to its imported modules, making the current module behave like
a transaction.  The current module's override dictionary can be handed
down to one of its imported modules (or split up among them), and its
new named objects can be handed down as well.

Versions

	Version numbers can be added to module names, as in Module.4.12.1;
its filename would be Module.4.12.1.s or Module.4.12.1.o.  When the
version number is not specified, the module (file) with the highest
two-part version number is used.  In other words, "Module" would
retrieve Module.4.12 but not Module.4.12.1.  The third part is reserved
for work-in-progress versions.
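
	The selection rule can be sketched like this (Python, illustration
only; resolve_version is a made-up helper, and comparing version parts
numerically rather than as text is an assumption).

```python
import re


def resolve_version(module, filenames, suffix=".s"):
    """Pick the file with the highest two-part version for module.

    Three-part (work-in-progress) versions are never chosen implicitly.
    """
    pattern = re.compile(
        re.escape(module) + r"\.(\d+)\.(\d+)" + re.escape(suffix)
    )
    versions = []
    for filename in filenames:
        match = pattern.fullmatch(filename)
        if match:
            version = (int(match.group(1)), int(match.group(2)))
            versions.append((version, filename))
    if not versions:
        raise FileNotFoundError(module + suffix)
    return max(versions, key=lambda pair: pair[0])[1]


files = [
    "Module.4.9.s",
    "Module.4.12.s",
    "Module.4.12.1.s",
    "Module.3.20.s",
    "Module.4.12.o",
]
print(resolve_version("Module", files))
```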

Simulation, Compilation and Binding

	A C module can either be in real or simulated mode.  When a C function
is called from Smalltalk and the C module is in real mode, the compiler
generates a special bytecode that will call the native C function.  If
the C module is changed to simulated mode, callers of its functions are
re-compiled to do a normal Smalltalk call, and the C module behaves like
a Smalltalk module.
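
	One way to picture the two modes is the following Python sketch.
CModule, CallSite and the two function tables are invented for the
example; the "native" entry is simply a built-in function standing in
for compiled C code, and rebinding a call site stands in for
recompiling the caller.

```python
import math


def simulated_factorial(n):
    """Stand-in for the Slang/Smalltalk version of a C function."""
    return 1 if n <= 1 else n * simulated_factorial(n - 1)


class CModule:
    def __init__(self, native, simulated):
        self.native = native        # name -> native function
        self.simulated = simulated  # name -> simulated function
        self.mode = "real"
        self.callers = []

    def bind(self, name):
        table = self.native if self.mode == "real" else self.simulated
        return table[name]

    def set_mode(self, mode):
        """Change mode and rebind ("recompile") every known caller."""
        self.mode = mode
        for caller in self.callers:
            caller.recompile()


class CallSite:
    """A call bound at compile time to whichever function the mode selects."""

    def __init__(self, module, name):
        self.module, self.name = module, name
        module.callers.append(self)
        self.recompile()

    def recompile(self):
        self.target = self.module.bind(self.name)

    def __call__(self, *args):
        return self.target(*args)


math_module = CModule(
    native={"factorial": math.factorial},
    simulated={"factorial": simulated_factorial},
)
site = CallSite(math_module, "factorial")
```

	Calling math_module.set_mode("simulated") rebinds site to the
simulated function without the call site itself changing.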
	Because overrides are changed at their original locations (see Changing
above), we can bind at compile time.  For example, a reference to World
will hold the World's association in the compiled method, and a
reference to a C function will hold the native function address in the
compiled method (of course this address will be converted to a FieldPath
when saving it to disk).
	Any C object file can be a Squeak C module, as long as it's
self-contained like a library or has an associated .i file to find other
linked files needed.  To make it run in simulation mode we have to have
an associated .l (Slang) file or .c file (and a C to Slang translator).

Conclusion

	With modules structured as layers of objects we automatically get
transaction behavior, changeset behavior, name spaces, and
component/project behavior, in addition to loadable code libraries. 
Objects, not classes and methods, are the modules' elements, enabling
modules to be components/projects as well as source code groupings.  The
Self behavior gives modules Environment behavior but with multiple
inheritance.  And the close resemblance between C and Smalltalk modules
will allow easier VM maintenance, and, in fact, blurs the lines between
VM and image, making Squeak one big modular system written in two
different languages: C/Slang and Smalltalk.
	Of course, this framework is not much different than Henrik's (we're
all talking about modules), but the advantages I see with this one are:
	Only one kind of module (no Delta modules); 
	Only one kind of imports (no sub modules and parameter modules); 
	All objects are modularized, not just classes/methods; 
	VM code is included in same modular framework (no separate plugin
framework).

I know I was a bit terse with my descriptions (trying to keep this email
size reasonable).  Please respond if you would like more explanation.

Anthony



