Brainstorm

Mon Mar 7 14:01:44 UTC 2005

Ok, here is my over-and-over-revised-brainstorm I mentioned. Yes, it is
long but I think it has some thoughts and concepts worth thinking about.
And remember, this is brainstorming and I have edited it so much there
may be inconsistencies. :)

regards, Göran
---------------- 

A brainstormed model based on:
	- Dan's idea of a module being held in a single reference.
	- Some memories from Modules3.3
	- My Namespace implementation

Sitting on the bus going home (funny, I think Namespaces was born that
way too) I was wondering a bit on how this idea of
holding-a-module-in-a-single-place could work. This is just an attempt
at playing with that thought - don't consider it a proposal. I also mix
the concepts a bit, but humour me. :)

Logical reasoning

1. A module is a separately deployable unit of objects, typically
including classes. I think this definition is a good one.

2. Given the above, and given that a module is maintained by one or
several developers as a *unit*, all names (classes typically) it defines
can be considered to be in the same namespace. In other words there is
no danger of getting a name conflict within the module itself. So
a module could be a namespace on its own, or in other words - there is
no need to have a module defining names in *multiple* namespaces. At
least I have a hard time seeing that need, sure, some scenario could
probably be cobbled up - but hey, seriously? (interested to hear about
that)

3. But can a namespace be populated by more than one module then? For
simplicity, let's say it can't. :) Then we have a one-to-one relation
between namespace and module, but for reasons to unfold I am not
suggesting an IS A relation just yet, because I think the *lifecycles* of
these two beasts are different. :)

4. If we can only hold the module in a single reference, this means all
external references to classes (or other "globals" in the module) must
be late bound (looked up during execution), we can't bind them at
compile time. Practical? I don't think so. For example, what about class
A in module Z that subclasses class B in module W? Anyway, further down
I am proposing something "in between".

5. If we get away with holding a module in a single reference, then
using ImageSegment for a module should work just fine, which would be
slick indeed. Loading ImageSegments is brutally fast and saving them is
pretty nippy too.

Let's say we have a global called LoadedModules. As in Modules3.3, we
take a two-step approach, first you load a module into the image, and
that should always work no matter what! That little fact opens up much
better tool support btw.

Then you "activate" it as a second step which can be more tricky.
Modules3.3 boiled down to performing a batch become operation IIRC
(quite slick if you ask me) and that may very well be the route to go, I
think I will dust off Henrik's code to have a look. Imagine that the
load is simply an ImageSegment load, so it already contains fully
compiled classes and other objects. 

LoadedModules is a Dictionary. A Module has an instvar originalName (I
think it should), but when we load the module we hang it in
LoadedModules under a name we choose (inspired a bit by Craig that names
shouldn't really be "in" the object), typically we choose the
originalName, *if* that name isn't already taken. So *loading* a Module
will *always work*, the only thing that can possibly happen is that the
originalName is already taken - but then we just pick another. Now, a
loaded Module is just hanging there - it doesn't do much more. :)
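
The "loading always works" rule above can be sketched as a workspace
snippet. This is only an illustration: loadedModules stands in for the
proposed LoadedModules global, a module is faked as a plain Dictionary
with an #originalName entry, and the numbering scheme used to pick
another name is an assumption, not part of the brainstorm.

```smalltalk
"Sketch only. loadedModules stands in for the proposed LoadedModules global.
 A module is faked as a Dictionary that carries its #originalName."
| loadedModules load moduleA moduleB nameA nameB |
loadedModules := Dictionary new.

"Hypothetical load step: it never fails and answers the name actually used."
load := [:module | | base name suffix |
	base := module at: #originalName.
	name := base.
	suffix := 1.
	"If the name is taken, try Network2, Network3 and so on (assumed scheme)."
	[loadedModules includesKey: name] whileTrue: [
		suffix := suffix + 1.
		name := (base asString , suffix printString) asSymbol].
	loadedModules at: name put: module.
	name].

moduleA := Dictionary new.
moduleA at: #originalName put: #Network.
moduleB := Dictionary new.
moduleB at: #originalName put: #Network.

nameA := load value: moduleA.	"the originalName is free, so it is used"
nameB := load value: moduleB.	"the originalName is taken, so another is picked"
```

Note that the load name lives only as a key in the Dictionary, not in
the module, which keeps its originalName untouched.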

Ideally we would now after the ImageSegment has been loaded be holding
an instance of Module in a single reference.

But... what about outpointers from the Module? Today all classes have an
instvar called #environment and it points to the SystemDictionary
instance. This is one of the outpointers when creating the ImageSegment.
If you create a tiny ImageSegment of class "Baba" (and its metaclass)
inheriting from Object - you get *at least* these outpointers:
	Object (the superclass of Baba), nil, #Baba, Smalltalk, Object class
	(the superclass of Baba class) and Metaclass (the class of Baba class).

And when we load the segment back in these things are typically hooked
back up with the "correct" objects in the image. nil and #Baba are
probably quite all right. :) But what about the other guys? We have 3
classes and Smalltalk. The environment instvar feels like something we
could have nil'ed before exporting; then when loading a Module we can set
it to something suitable. This would mean that a loaded Module (ignoring
Symbols and nil and other low level objects) would only have a direct
reference to Smalltalk from inside, if we ignore the references to other
classes.

Say we have another global called Namespaces which is also a Dictionary.
You activate a Module by plugging it "in a namespace", and you never
remove it from LoadedModules. So loading a Module gives us one reference
and activating it gives yet another, but we also hook into objects
inside the Module in various places. So an activated Module would have
two references to itself and a number of references to objects contained
inside it. But when deactivated those references are turned into
something else (on both sides) and one handle on the Module (Namespaces)
is removed. Then the module again is held only in LoadedModules, and to
unload it you would simply do:

	LoadedModules removeKey: #MyModule.
	Smalltalk garbageCollect. ":)"

The key in the Dictionary called Namespaces is the name we designated to
the module as its "activated name". Again, typically the name used when
loading it (which in turn typically is the originalName), but for
various reasons we may choose another one. For example, if I reimplement
the networking module I could call it "Network" (originalName) which is
the same as the old Module, you could load it into your image, but you
would have to call it NewNetwork if you have the old Network already in
there. And then you can either decide to activate NewNetwork in
parallel with Network, thus plugging the module in a new namespace
called NewNetwork, or you could simply plug it in the existing Network
namespace thus effectively replacing the old module with a different
implementation.
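
As a rough workspace sketch of the Network/NewNetwork scenario: here
loadedModules and namespaces stand in for the two proposed globals, the
modules are faked as Strings, and a namespace slot holds the module
directly (a simplification of the Namespace instances discussed further
down).

```smalltalk
"Sketch only. Both Dictionaries stand in for the proposed globals."
| loadedModules namespaces oldNetwork newNetwork |
loadedModules := Dictionary new.
namespaces := Dictionary new.
oldNetwork := 'old networking module'.
newNetwork := 'reimplemented networking module'.

"Both are loaded. The second one had to pick another load name."
loadedModules at: #Network put: oldNetwork.
loadedModules at: #NewNetwork put: newNetwork.

"Activating adds a second handle under the activated name."
namespaces at: #Network put: (loadedModules at: #Network).

"Option 1: activate the new module in parallel, in a namespace of its own."
namespaces at: #NewNetwork put: (loadedModules at: #NewNetwork).

"Option 2, instead of keeping both: plug the new module into the existing
 Network namespace, replacing the old implementation."
namespaces removeKey: #NewNetwork.
namespaces at: #Network put: (loadedModules at: #NewNetwork).

"The old module is now held only by loadedModules. Removing that last
 handle leaves it to the garbage collector."
loadedModules removeKey: #Network.
```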

So given the one-to-one relation of Module<->Namespace we could have a
Namespace instance having an instvar called #module but also hold the
associations (bindings). But why this extra indirection? Why aren't
Module and Namespace the same thing? Because we want to be able to
create the Namespace instance and even populate it with bindings,
without yet having the module hooked in there or even loaded! And the
need for that is because the other activated Modules have references to
objects in the not-yet-loaded-nor-activated Module. So a Namespace
instance gets a similar role as Undeclared has. We can load and activate
a Module that indeed has a class A that subclasses class B from Module
X, without having loaded or activated module X! class A should then have
a yet "unbound" reference to the superclass, meaning it should try to
refer to a binding in Namespace X, and since there is none there -
create it. Then later, when Module X is activated, we can plug in the
correct value in that binding - class B.
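
A small sketch of that "create the binding now, fill it in later" idea,
under loud assumptions: namespaceX is a plain Dictionary playing the
role of Namespace X, the bindings are ordinary Associations, and the
unbound marker object is made up for the illustration.

```smalltalk
"Sketch only. namespaceX plays the role of the Namespace for Module X."
| namespaceX unbound bindingFor superclassRef |
namespaceX := Dictionary new.
unbound := Object new.	"made-up marker for a binding with no value yet"

"Hypothetical lookup: reuse the binding if present, else create it unbound."
bindingFor := [:name |
	namespaceX at: name ifAbsent: [
		namespaceX at: name put: (Association key: name value: unbound)]].

"The module with class A is activated first. A refers to its superclass B
 through a binding in Namespace X, which gets created on demand."
superclassRef := bindingFor value: #B.

"Later Module X is activated and plugs class B into the existing binding."
(bindingFor value: #B) value: 'class B'.
```

Since class A holds on to the binding and not to the value, it sees
class B as soon as Module X fills it in.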

Let us try it. This means that Namespaces (a Dictionary) should in
effect hold Namespace instances which in turn, each one, refer to a
Module or to a single instance we can call UndefinedModule (similar to
UndefinedObject).

This all is based on keeping the SystemDictionary Smalltalk as is, which
IMHO is a good idea - I would like a new Modules approach to live
side-by-side, just like Monticello does with the other tools - it makes
it so much easier to introduce and thus we don't repeat the Modules3.3
debacle.

Ok, how will textual references to the module and more importantly to
its objects (classes) work?

Most of us don't really know what a textual reference to a global ends
up *being* when compiled. I am not even sure I am grokking it in full,
but I *think* I do. :)

Each global object today is hanging in an instance of Association (or a
subclass thereof) which hangs in the SystemDictionary (Smalltalk). The
association has the name (an instance of Symbol) in the instvar #key and
the #value instvar refers to the object itself. When a textual reference
to a global is compiled the CompiledMethod instance will end up
having in it a reference to this Association instance. This means
CompiledMethods do not refer to the global object directly, but via the
Association.

	PointerFinder pointersTo: AFreshNewSubclassOfObject

...will show you 4 hits. The Metaclass of the new class, the Array that
holds subclasses in Object, the Association in Smalltalk and the
ClassOrganizer which is held by the class itself. For each subclass you
create for this new class you get one more hit - since the subclass
refers to its superclass. But you do not get more hits when referencing
your class in methods!
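
The indirection through the Association can be imitated in a workspace.
This is a sketch, not how the Compiler does it: globals stands in for
Smalltalk, the two Arrays stand in for the literal frames of two
CompiledMethods, and the Strings stand in for classes.

```smalltalk
"Sketch only. globals stands in for the SystemDictionary Smalltalk."
| globals binding methodOneLiterals methodTwoLiterals |
globals := Dictionary new.
binding := Association key: #Baba value: 'first version of class Baba'.
globals add: binding.

"Two methods mentioning Baba both keep the same Association as a literal."
methodOneLiterals := Array with: binding.
methodTwoLiterals := Array with: binding.

"Swapping the value in the Association is seen through every method at once,
 and neither method ever pointed at the class itself."
binding value: 'second version of class Baba'.
```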

I am not sure if these are all the kinds of references we have to a class -
but if they are then perhaps we can introduce a "dynamic binding" which is
resolved fully when both Modules are activated, but is not resolved
fully when only one Module (either side of the equation) is activated.

A "well formed" Module would then not have any direct references to
anything outside it when loaded, instead it contains instances of
DynamicBinding, that in turn contains enough information to resolve
themselves when activated. When we activate a Module we plug it into a
Namespace instance, we take all bindings (defined names) in the Module
and hang them into the Namespace instance - but if there are already one
or more bindings there we simply put the value into them. For all
DynamicBindings in the Module we resolve them but we also somehow need
to be able to turn them back into DynamicBindings (?). A DynamicBinding
typically has two Symbols - the namespace and the name to bind. So we
just look up the namespace, and if there is a binding in there we use
it, otherwise we create one.
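
The two-Symbol lookup could look something like the following sketch.
Everything here is hypothetical: a DynamicBinding is reduced to an Array
of its two Symbols, namespaces stands in for the Namespaces global, each
namespace is a plain Dictionary of Associations, and the unbound marker
is invented for the example.

```smalltalk
"Sketch only. namespaces stands in for the proposed Namespaces global."
| namespaces unbound dynamicBinding resolve resolved |
namespaces := Dictionary new.
unbound := Object new.	"made-up marker for a binding with no value yet"

"A DynamicBinding reduced to its two Symbols: namespace and name."
dynamicBinding := Array with: #Network with: #Socket.

"Hypothetical resolution: find or create the namespace, then the binding."
resolve := [:dyn | | space |
	space := namespaces at: dyn first ifAbsent: [
		namespaces at: dyn first put: Dictionary new].
	space at: dyn last ifAbsent: [
		space at: dyn last put: (Association key: dyn last value: unbound)]].

resolved := resolve value: dynamicBinding.
```

Resolving the same DynamicBinding again answers the very same
Association, so every referrer ends up sharing one binding per name.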

So how is such a DynamicBinding created then? Well, if we use a variant
of my Namespace implementation then all globals in the source would be
of the form "ANameSpace::AName".

In practice the above scheme would mean that:

	Object   "This is compiled and resolved just like before, it refers to
an object in Smalltalk, which is not a Module. Remember - side by side."

	Network::	"This is a reference to a Namespace, and those are held in
the global called Namespaces. So the Compiler looks in there to resolve
it."

	Network::Socket	"This is a reference to a named object in a namespace.
When compiling this the Compiler first finds (or creates on demand) the
Namespace instance, then looks in there. If there is a binding we use
it, otherwise we create one that is somehow marked -unbound-. (nil
cannot be used, because that is of course a valid value. :))"
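
To make the three cases concrete, here is a sketch of how a reference
as written in source could be told apart. The classify block and the
tags #smalltalk, #namespace and #binding are made up for illustration;
a real Compiler would of course work on parsed tokens, not Strings.

```smalltalk
"Sketch only. Tells apart the three forms a global reference can take."
| classify |
classify := [:source | | colon |
	colon := source indexOf: $:.
	colon = 0
		ifTrue: ["Plain name, resolved in Smalltalk just like before."
			Array with: #smalltalk with: source asSymbol]
		ifFalse: [
			colon + 1 = source size
				ifTrue: ["Trailing ::, a reference to the Namespace itself."
					Array
						with: #namespace
						with: (source copyFrom: 1 to: colon - 1) asSymbol]
				ifFalse: ["Namespace::Name, a binding inside a Namespace."
					Array
						with: #binding
						with: (source copyFrom: 1 to: colon - 1) asSymbol
						with: (source copyFrom: colon + 2 to: source size) asSymbol]]].
```

For example, "classify value: 'Network::Socket'" answers an Array
holding #binding, #Network and #Socket.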

Summary

We define a one-to-one relation between Module and Namespace (I think
that is fair) but we do not unify the two concepts - because we want to
be able to create Namespace instances "ahead" and then later plug in a
Module inside it and let the Module populate it with bindings and
reusing the unbound bindings that are already hanging there. So a
Namespace acts almost like a ValueHolder for a Module. And the bindings
in there are ValueHolders for objects from the module.

We differentiate between loading a Module and activating it in a
Namespace.

Given the above you should be able to load and activate modules with
classes having references to classes/objects in Modules that aren't even
loaded yet. If we totally disregard the "class extension problem" this
means Modules can be loaded and activated in *any order*.

Well, those were my thoughts. :)