Modules instead of Behaviors (was: Generalized Object Modules Design)

Tue Mar 5 21:03:10 UTC 2002

First let me say (I should have said this earlier) that I commend Henrik
for the work he has done on modules.  Modules are a hard problem because
the requirements are fuzzy and complex.  In contrast, block closures were
much easier because they had a simple requirement (make blocks independent
from the call stack).  That is why you see a large debate around modules
that you do not see around block closures.  But I want to thank Henrik
for getting the ball rolling and getting something implemented!  With that
said, here is what I think a module should look like (this design is
better thought out than my previous ones):

	A module is a group of related methods from many classes that together
fulfill a single piece of functionality.  A module is equivalent to a
method category like 'printing' which may cross many classes.  All methods
in 'printing' from all classes would end up in a single module called
'Printing'.
	Note that I do not address loading or maintaining changes to modules in
this design; I only address the structure of a module itself after it is
loaded.  Changesets/transactions/layers are an orthogonal issue that
requires its own design.  Whatever that design turns out to be, it will
be used to load and version module objects.  But I think it is important
to keep these issues separate.  Modules are not repositories; they are
just another meta-object like Class or Behavior.  In fact, I am proposing
that Module replace Behavior.
	This design is similar to my first proposal under the "Behavior vs
Modules" thread (which Dave Simmons said was similar to how SmallScript
is implemented).  The main difference is that instead of selectors being
grouped by protocol they are grouped by module which is larger.  A
module is equivalent to the union of all protocols that participate
together to implement the functionality that the module represents.  For
example, the 'Printing' module would contain the Object-printing
selectors like #printOn: & #printString, but it would also contain the
Stream-printing selectors like #print:.
	For every selector in a module there is a SelectorBehavior, a new class
of object that holds every method in the image that implements the same
selector.  It is basically an inside-out version of a MethodDictionary.
A SelectorBehavior holds methods keyed by class for a single selector.
For example, the #printOn: behavior would contain all printOn: methods
keyed by class.
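
To make the inside-out dictionary concrete, here is a minimal sketch.  I use
Python only as neutral pseudocode; the class name follows the text, but the
method names (add_method, implementors) and the sample methods are made up
for illustration and are not part of the proposal.

```python
class SelectorBehavior:
    """Hypothetical model: every method in the image for ONE selector,
    keyed by the class it is defined on."""

    def __init__(self, selector):
        self.selector = selector
        self.methods = {}  # class name -> method (any callable here)

    def add_method(self, class_name, method):
        self.methods[class_name] = method

    def implementors(self):
        # Classes that define a method for this selector, sorted by name.
        return sorted(self.methods)


# All printOn: methods live in one object, regardless of class.
print_on = SelectorBehavior("printOn:")
print_on.add_method("Object", lambda rcvr, stream: stream.append("an Object"))
print_on.add_method("Point", lambda rcvr, stream: stream.append("a Point"))

# A module is then (in part) a dictionary of selector -> SelectorBehavior.
printing_module = {"printOn:": print_on}
```

Compare this with a MethodDictionary, which is held by one class and keyed
by selector; here the roles of the two keys are swapped.
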
	Grouping methods by selector instead of by class allows us to
distribute a class's methods across modules, making modules the primary
behavior object.  A module encapsulates behavior better than a class
because behavior usually crosses classes.  A class really just provides
automatic dispatching to special methods for a selector but these
special methods are usually related and belong together in the same
module.
	Secondly, grouping methods by selector allows for faster method lookup.
Only one method dictionary has to be searched instead of several (a CPU
cache effect).  And if there is only one method for the selector and that
method is defined on Object (i.e. it is the default method) then we don't
have to do a hash lookup at all; we just execute the default method.  This
case can even be inlined.
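
A rough sketch of that lookup, again in Python as pseudocode.  The hierarchy
table, the function name and the strings standing in for compiled methods
are all assumptions for illustration; a real VM would keep a flag for the
default-only case rather than test the table as I do here.

```python
# Hypothetical class hierarchy: class name -> superclass name (None at root).
SUPERCLASS = {"Object": None, "Point": "Object", "Point3D": "Point"}


def lookup(methods_by_class, class_name):
    """Find the method for class_name inside ONE selector's table.

    methods_by_class maps class name -> method for a single selector.
    """
    # Fast path: the only method is the default on Object, so the
    # receiver's class does not matter.
    if len(methods_by_class) == 1 and "Object" in methods_by_class:
        return methods_by_class["Object"]
    # Otherwise walk up the superclass chain, probing the same table
    # each time instead of a different dictionary per class.
    while class_name is not None:
        if class_name in methods_by_class:
            return methods_by_class[class_name]
        class_name = SUPERCLASS[class_name]
    raise LookupError("doesNotUnderstand")


# Strings stand in for compiled methods.
print_string = {"Object": "Object>>printString"}
print_on = {"Object": "Object>>printOn:", "Point": "Point>>printOn:"}

found_default = lookup(print_string, "Point3D")
found_inherited = lookup(print_on, "Point3D")
```
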
	To continue the theme that a class is distributed across modules, the
instance variables of a class are defined solely by primitive methods in
modules instead of by the class.  If a module wants to add an instance
variable to a class it just adds a var primitive method to the accessor
selector under the desired class.  If you want to keep the var private use
'pvt' as the prefix of the selector.  For example, the var primitive
methods to get and set the private instance variable 'x' would look like:

 pvtX
	"Get x"
	<var: 'x'>

 pvtX: obj
	"Set x"
	<var: 'x'>

(Note, we use the same primitive construct for getter and setter since
the method header distinguishes them.)  Associating a var primitive
method with a class has the effect of adding a slot for that var in the
instance structure of the class.  Modules can add/remove instance
variables from a class independently, without worrying about what other
vars are defined for that class by other modules.  The only restriction
is that two modules cannot define the same var name for the same class.
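
Here is a small sketch of the bookkeeping this implies, in Python as
pseudocode.  ClassLayout, its methods and the module name 'Geometry' are
hypothetical; the only rule taken from the design is that two modules may
not define the same var name for the same class.

```python
class ClassLayout:
    """Hypothetical model of one class's instance structure."""

    def __init__(self, name):
        self.name = name
        self.owner_of = {}  # var name -> module that defined it

    def add_var(self, module, var):
        # Called once per var primitive method; getter and setter from the
        # same module both name the same var, so repeats are harmless.
        owner = self.owner_of.get(var)
        if owner is not None and owner != module:
            raise ValueError(
                f"{self.name}: var {var!r} already defined by {owner}")
        self.owner_of[var] = module

    def remove_module(self, module):
        # Unloading a module drops only the slots that module added.
        self.owner_of = {v: m for v, m in self.owner_of.items()
                         if m != module}

    def slots(self):
        return list(self.owner_of)


point = ClassLayout("Point")
point.add_var("Geometry", "x")      # from pvtX
point.add_var("Geometry", "x")      # from pvtX:
point.add_var("Geometry", "y")
point.add_var("Displaying", "color")
```

How existing instances get reshaped when a slot is added or removed is not
covered by this sketch.
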
	Instance vars are only accessible via var primitive methods, meaning we
can no longer use an instance variable name directly in our code; accessor
messages always have to be used.  This is a small inconvenience when
typing but allows greater polymorphism.  I have not tested the performance
hit of always using accessors, but it should not be too bad since the
accessors are primitive and we should be able to inline them, especially
when 'self' is the receiver.
	Another advantage of always using accessor methods is that explicitly
declaring temps (|...|) would not be needed.  Temps and globals will be
the only receivers in a method (besides self, super, and thisContext)
and the only targets of assignments (:=).  So it will be easy to see
which names are the temps without explicitly writing them out on top,
making the code look cleaner.  Also, this is a plus for the block closure
compiler because some temps are only used within a block and are better
off defined as block temps.  The BC compiler can place temps in the
right scope without worrying about where the user declares them.  Spelling
errors can still be checked by comparing names against assignments instead
of against the declaration list.
	Besides holding a dictionary of selector->SelectorBehavior pairs, a
module holds a dictionary of global->value pairs.  These globals will
be used to hold classes, among other constants.  But since a class does
not define its own methods or instance variables, all that a class
definition will contain is its superclass.  As mentioned in the
"Getting rid of metaclasses" thread, metaclasses and class variables are
not needed.
	A module defines a global and selector namespace containing all names
in its dictionaries mentioned above but no others.  A module does not
import/inherit namespaces from other modules, but it can reference a
selector/global in another module just by using its name directly, or with
a prefix if necessary.  A selector/global has to be prefixed with its
module name only if its name is ambiguous.  For example, any method in
any module can send #print: without prefixing it if only one module in
the image defines #print:.  However, if module A and module B both
define #print: then a method in a module C would have to write "rcvr
A.print: obj" or "rcvr B.print: obj", but a method in A or B does not
have to prefix it if it is calling the #print: in its own module.  In other
words, a message send without a prefix will find the unique selector in
any module, but if more than one module defines it then it will find the
selector in the home module.  If the home module does not define it
either, the compiler will raise an exception, causing a menu to pop up
asking the user to choose the appropriate prefix.  The same procedure
applies to globals.
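
The resolution order can be sketched like this (Python as pseudocode).  The
function name, the AmbiguousName exception and the table of definers are
made up for illustration; in the real thing the exception would bring up
the menu.

```python
class AmbiguousName(Exception):
    """Stands in for the compiler exception that pops up the prefix menu."""


def resolve(selector, home, definers, prefix=None):
    """Return the module whose definition of selector a send binds to.

    definers maps selector -> set of module names defining it in the image.
    home is the module of the method being compiled.
    """
    modules = definers.get(selector, set())
    if prefix is not None:                # explicit "A.print:" form
        if prefix not in modules:
            raise LookupError(f"{prefix} does not define {selector}")
        return prefix
    if not modules:
        raise LookupError(f"{selector} is not defined in the image")
    if len(modules) == 1:                 # unique in the image
        return next(iter(modules))
    if home in modules:                   # ambiguous, but home defines it
        return home
    raise AmbiguousName(f"{selector}: choose one of {sorted(modules)}")


definers = {"print:": {"A", "B"}, "printOn:": {"Printing"}}

unique = resolve("printOn:", "C", definers)             # only one definer
in_home = resolve("print:", "A", definers)              # home module wins
prefixed = resolve("print:", "C", definers, prefix="B")
```

Globals would go through the same function with a table of global names
instead of selectors.
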
	To avoid a single global namespace for modules themselves and to
categorize them nicely, modules will sit in a directory structure.  As
with class categories and classes, directories are not modules and modules
are not directories; modules are always leaves in the directory
structure.  To reference a module by name (either to prefix a
selector/global or to get the module object itself) you specify the path
from a common parent (between the module you are writing your method for
and the target module).  For example, suppose you have the following
directory structure with modules at the leaves:

 Root
	Collections
		Enumerating
	Morphic
		Stepping
		Displaying

To access the 'Enumerating' module from a method in the 'Stepping'
module you would type "Collections.Enumerating".  To access the
'Displaying' module from the same 'Stepping' module you would type
"Displaying".
	Remember, you only have to prefix a selector/global with its module if
more than one module IN THE IMAGE currently defines it.  The compiler
will convert the name to a hard pointer to the actual selector/global in
a particular module.  So even if a new module is loaded that also
defines the same name, already compiled methods are not affected, and
when you view a compiled method the system will automatically add prefixes
to selectors/globals wherever it needs to disambiguate.
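
A tiny sketch of that last point (Python as pseudocode; the names are made
up).  The compiled send keeps a fixed (module, selector) binding, and only
the displayed text changes when another definer is loaded.  For simplicity
this version prefixes whenever the name is ambiguous; a real viewer might
omit the prefix for the home module.

```python
def display(binding, definers):
    """Text shown for a compiled send, given who currently defines it."""
    module, selector = binding
    if len(definers[selector]) > 1:
        return f"{module}.{selector}"   # ambiguous now, so show the prefix
    return selector


definers = {"print:": {"A"}}
compiled_send = ("A", "print:")         # hard binding made at compile time

before = display(compiled_send, definers)
definers["print:"].add("B")             # module B is loaded later
after = display(compiled_send, definers)
```
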

What do you guys think?

Cheers,
Anthony