[ModSqueak] Name space semantics was: Re: Some of my thoughts

Wed Aug 15 18:39:31 UTC 2001

Roel Wuyts wrote:

> 
> B) Another option: every module acts as a factory for all the classes it
> defines, and it holds a number of references to other modules (the ones it
> depends on), and there is a new keyword 'myModule' (with as semantics that
> it refers to the current module) (maybe it's not even needed). For example,
> suppose I implement a module RoelWidgets, and it uses some collections. When
> I use a collection, the code could look like: 'x := (myModule collections
> classFor: #Set) new.'. When configuring my module I can then say that the
> 'collections' reference should point to the Collections module. Regular
> type-checking-like techniques can then be used to ensure that this is
> possible (or we can throw a runtime exception if you reference a nonexistent
> class).

This seems unnecessarily complicated. Cf. 'myModule collections classFor:
#Set'. We are already in the current module; 'collections' is known in the
current module and can therefore be referenced simply as 'collections'. Hence
myModule (which should really be thisModule, cf. the existing special name
thisContext) is redundant, as you indicate. Writing 'classFor: #Set' would be
like writing (Smalltalk at: #Set) new today instead of Set new.

But as a matter of fact, Dan has already thought of many of these things in
his Environments work, and this already has a working implementation in the
current image. The following is a quote of the class comment for
Environment, from the Squeak image. It deserves reading as it is. I have
added headers for clarity:

> Environments are used to provide separate name spaces in Squeak.  Each one
> operates pretty much the same way that the Smalltalk systemDictionary is used
> in a non-partitioned Squeak.
> 
Inheritance:
> Each class has a direct-access environment in which it is compiled.  Its
> environment slot points to an instance of this class, and it is there where
> the bindings of global variables are sought.  The compiler looks up these
> bindings using normal dictionary protocol (at:, etc).  If a binding is not
> found, then the name is looked up in the environment from which that one
> inherits, if any.  In this way a class may be compiled in a context that
> consists of several nested name spaces, and direct reference may be made to
> any of the objects resident in those spaces.
>
Qualified refs:
> Various methods may need to refer to objects that reside in environments that
> are not a part of their direct-access environment.  For these references, a
> simple global reference,
> Thing
> may not be used, and instead the construct,
> Envt Thing
> must be used.  In this case Envt is a global reference to another environment,
> and the global name, Thing, is sent as a message to that environment.
> 
> Obviously, such a foreign reference cannot be resolved unless the environment
> in question implements a method of that name.  This is how environmental
> variables are exported.
> 
Implementation, speed:
> Each environment has its own unique class.  With this structure, each
> environment can have its own instance-specific messages to provide access to
> its exported symbols.  Note that this mechanism provides much faster runtime
> access than the Dictionary at: protocol.  Also note that inheritance provides
> a trivial implementation of nested name scope by the same token.
> 
> In the early stages of installing partitioned environments in Squeak,
> interpreted access will be provided in several ways.  To begin with,
> environments will intercept the doesNotUnderstand: message and, if the message
> begins with a capital letter, it will look up the corresponding name using
> #at:, and return the value if found.  A refinement to this feature will be to
> compile an export method on the spot, so that subsequent accesses to that
> variable run much faster.
>
Assigning to imported globals:
> Note that there is no Environmental access pattern analogous to 'Envt Thing'.
> If an implementor wishes to store into environmental variables, he must do so
> by defining, eg, a SetThingTo: method and using a call to that method in his
> code.  We may choose to only allow one certain pattern of access to be
> compiled in any subclass of Environment to enforce some understandable style
> of coding.
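
The lookup rules in the class comment can be modelled in a few lines. What
follows is a minimal Python sketch, not Squeak code: the class Environment
below, its methods define and lookup, and the use of __getattr__ as a
stand-in for the doesNotUnderstand: fallback are all assumptions made for
illustration. In particular, I assume the fallback consults only the
environment's own dictionary.

```python
class Environment:
    """Toy model of a name space with single inheritance (hypothetical)."""

    def __init__(self, parent=None):
        self.parent = parent      # environment this one inherits from, if any
        self.bindings = {}        # stands in for the dictionary protocol (at:, at:put:)

    def define(self, key, value):
        self.bindings[key] = value

    def lookup(self, key):
        """Direct-access reference: search here, then up the inheritance chain."""
        env = self
        while env is not None:
            if key in env.bindings:
                return env.bindings[key]
            env = env.parent
        raise KeyError(key)

    def __getattr__(self, key):
        """Qualified reference 'Envt Thing', modelled on the interim fallback."""
        # Python calls this only when ordinary attribute lookup fails,
        # much as doesNotUnderstand: is sent only for unimplemented messages.
        if key[:1].isupper():
            bindings = self.__dict__.get("bindings", {})
            if key in bindings:
                return bindings[key]
        raise AttributeError(key)


kernel = Environment()
kernel.define("Set", set)
widgets = Environment(parent=kernel)
widgets.define("Button", "a Button class")

inherited = widgets.lookup("Set")   # plain 'Set' compiled inside widgets
qualified = widgets.Button          # 'Widgets Button' written elsewhere
```

Note that the sketch has no way to assign through a qualified reference
either, which mirrors the last paragraph of the class comment.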

My comments:

1. Note that this scheme allows 2 kinds of foreign reference: inherited and
explicitly qualified. (Inherited is called imported in VW, and Allen W-B
uses a bunch of names for these things.)

Example: X is defined by module B.
inherited: You can write 'X' in module A, since it inherits module B.
qualified: You write 'B X' in module A to access it.

Note that Wirth allowed both kinds in Modula-2 but later removed the
inherited kind from Oberon. The second kind is my favorite too, but the
other is needed for backward compatibility with regular-style Squeak and
Smalltalk. So I would have liked to have only one for brevity and elegance,
but I think we'll need both in practice.

<Andrew: My 'root' proposal was that the 'root' namespace would inherit all
names from all modules. However, I also see the point in having the tools
resolve refs instead.>

2. Single inheritance is not very powerful. Specifically, it is too
coarse-grained for what we will need. I know that it will not be enough to
modularize Squeak as we will want (eg. a minimal core, headless, etc.),
unless we decide to rewrite the system with qualified global refs except for
the single inheritance path. And in that case I think we should just as well
not have any inheritance at all.

I propose instead a Self-style solution, where each module has an ordered
list of modules that should be available from within it. If a module should
be inherited/imported, then it is simply marked as such. Cf. Self marking
slots as parents. The ordering handles conflicts.
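
A minimal sketch of this lookup, in Python rather than Smalltalk so that it
can be run as is. The class Module and the methods use, resolve and
resolve_qualified are hypothetical names, and I assume here that parent
lookup is not transitive and that a module's own definitions take precedence
over anything it inherits.

```python
class Module:
    """Toy module with an ordered list of used modules (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.definitions = {}
        self.uses = []            # ordered list of (module, is_parent) pairs

    def define(self, key, value):
        self.definitions[key] = value

    def use(self, module, parent=False):
        """Make module available; parent=True also imports its names."""
        self.uses.append((module, parent))

    def resolve(self, key):
        """Unqualified reference: own names first, then parents in list order."""
        if key in self.definitions:
            return self.definitions[key]
        for module, is_parent in self.uses:
            if is_parent and key in module.definitions:
                return module.definitions[key]   # first parent in the list wins
        raise NameError(key)

    def resolve_qualified(self, module_name, key):
        """Qualified reference: any used module, parent or not."""
        for module, _ in self.uses:
            if module.name == module_name:
                return module.definitions[key]
        raise NameError(module_name)


b, c, d = Module("B"), Module("C"), Module("D")
b.define("Set", "B's Set")
c.define("Set", "C's Set")
d.define("Thing", "D's Thing")

a = Module("A")
a.use(b, parent=True)
a.use(c, parent=True)
a.use(d)                          # available, but only via qualified refs
```

Here a.resolve("Set") yields B's definition because B comes first in the
list, while D's Thing is reachable only as a.resolve_qualified("D", "Thing").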

(I have thought a whole lot about this issue, more so than it may seem. The
Self-style scheme has several advantages, not all of which I'll mention now.
An important one is that it provides a better way of thinking about the
composition of a system than does inheritance.)

3. Regarding the implementation of foreign references.

Convenience:

I think it is better to use eg. export lists or directives like
#exportAllNames, and have a module look up names implicitly. This is more
convenient for the programmer. Also cf. what follows.

Semantic ambiguity of the two kinds of foreign reference:

Smalltalk usually early-binds all variable references, incl. globals. In the
above solution (export globals via methods in Environment subclasses),
implicit/inherited references are early-bound (at compile time), while
qualified references are late-bound (at runtime).

This results in a semantic ambiguity which I believe we need to avoid. I
think qualified refs should simply also be early-bound.

Thus, the compiler sends these messages at compile time, after e.g. simply
looking for Capitalized unary message names, and then stores the resulting
literal in the CompiledMethod, as with globals today. This preserves the
current, unified semantics of always early-binding variable references.
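
To make the idea concrete, here is a toy model in Python, under loud
assumptions: a "method" is just a list of tokens, a module is a plain
dictionary from names to bindings, and Binding and compile_refs are
hypothetical names. It shows only the point that the qualified reference is
resolved once, at compile time, and that the resulting binding is stored as
a literal.

```python
class Binding:
    """A (key, value) pair, standing in for the literal a method holds for a global."""

    def __init__(self, key, value):
        self.key = key
        self.value = value


def compile_refs(tokens, modules):
    """Replace each 'Module Name' token pair by the Binding found right now."""
    literals = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        if tok in modules and nxt is not None and nxt[:1].isupper():
            # Capitalized unary name after a module name: plain dictionary
            # lookup, done once. An unknown name raises KeyError here, i.e.
            # at compile time rather than at run time.
            literals.append(modules[tok][nxt])
            i += 2
        else:
            literals.append(tok)
            i += 1
    return literals


x = Binding("X", 1)
modules = {"B": {"X": x}}
compiled = compile_refs(["B", "X", "printString"], modules)

# Removing the name afterwards does not affect the compiled method.
del modules["B"]["X"]
```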

Speed:

Note that such early-binding also resolves all concerns about access speed,
because the reference is bound once during compilation. I think simple
dictionary lookups will be fine; they will then be the same method used for
resolving names within a name space, which is a Good Thing. Speed is
actually why I originally thought of early-binding; only then did I discover
the semantic ambiguity.

In this way, Environments don't have to be classes, and having multiple
'inherited' namespaces no longer causes speed penalties or the need for VM
extensions.

Hmmm...
Although I generally dislike the Module.X syntax, like Roel does, I can now
see one advantage. This dot syntax no longer looks like a message send, but
like a new kind of naming/reference (which in fact is what we are
introducing). Indeed, we avoid the exception of resolving foreign refs by
messages, and of some messages being sent early.

This comes down to a trade-off: which do we dislike more, foreign ref
messages sent early (an odd bird indeed), or extending the syntax with the
dot for a new kind of name binding?

In light of these issues, on balance I lean toward the dot solution. It
seems to cause some nasty syntactic ambiguity, but we could just require
end-of-statement periods to be followed by whitespace (ugh). A scanner hack
I made spotted only 2 such missing spaces in the whole current sources (ie.
this is not a problem).
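
For illustration, a check of this kind can be approximated with a regular
expression. This is a rough Python sketch and not the scanner hack itself:
the function name missing_space_after_period is made up, and the handling of
strings, comments and character literals is simplified.

```python
import re

# Spans in which a period is not a statement separator.
_SKIP = re.compile(
    r"\$."                 # character literal such as $. or $'
    r"|'(?:[^']|'')*'"     # string literal; '' is an escaped quote
    r'|"[^"]*"',           # comment
    re.DOTALL,
)


def missing_space_after_period(source):
    """Return (line, column) of each period directly followed by a letter."""
    # Blank out the skipped spans but keep newlines, so positions stay valid.
    blanked = _SKIP.sub(lambda m: re.sub(r"[^\n]", " ", m.group()), source)
    hits = []
    for m in re.finditer(r"\.(?=[A-Za-z])", blanked):
        line = blanked.count("\n", 0, m.start()) + 1
        column = m.start() - blanked.rfind("\n", 0, m.start())
        hits.append((line, column))
    return hits


sample = "x := 3.14.\ny foo.Transcript show: 'a.b'. \"c.d\"\nz := $.."
found = missing_space_after_period(sample)
```

Periods inside numbers such as 3.14 are not reported, since they are
followed by a digit rather than a letter.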

Henrik