Modules

Fri Feb 25 06:27:55 UTC 2005

Hi folks,

Here's that separate post on the Monticello redesign I promised. 
Couldn't manage it yesterday. Dan, once it's sunk in a bit, I'll rework 
this into a summary for the modules team. (Note: MC stands for 
Monticello. MC1 is the version currently on SqueakMap; MC2 is the 
experimental version, which I expect to replace MC1 eventually.)

As before, one of the key features of Monticello is that it captures 
enough development history to allow divergent branches of development 
to be safely and easily merged. The new thing in MC2 is that it keeps 
separate version history for each program element (classes, methods, 
instance variables etc) rather than for entire packages.

This gives us a lot more flexibility about how to group elements for 
versioning. Where MC1 is tightly bound to packages with sharp 
boundaries between them, MC2 is happy to work with just about any group 
of elements a developer decides he's interested in. I've been using the 
term "tag" for this - conceptually, program elements such as classes, 
methods, instance variables, globals and so on are annotated with tags, 
and when you take a snapshot of the tag, all the elements with the tag 
are included in it. This will allow us to do some handy stuff:

- Package-oriented versioning, similar to Store or MC1. This works 
quite nicely for well-contained applications.

- Task-oriented versioning, similar to ChangeSets, but versionable and 
mergeable. I could post a change set to the list, others could take it 
in different directions and I'd be able to safely merge the results 
back into a single change set.

- Robust update streams. We could automatically detect conflicts 
between updates in the stream or between an update and local changes, 
and easily resolve them.

- Maintain the kernel. It should be possible to do "tricky" things in 
MC2, like changing the Compiler, or refactoring Association. They'll 
still be tricky, but at least we won't have to resort to hand-edited 
fileIns.
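
As a rough illustration of the tag idea described above, here is a minimal Python sketch. It is not MC2 code: the Element class, the snapshot function and the tag names are all hypothetical, and the per-element version is reduced to a simple counter. It only shows that a snapshot is "whatever carries the tag right now", so one element can take part in both a package-style and a task-style snapshot.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    """A program element (class, method, instance variable, ...). Hypothetical."""
    name: str
    version: int = 1                         # each element has its own version
    tags: set = field(default_factory=set)   # tags annotating this element


def snapshot(elements, tag):
    """Return {element name: version} for every element carrying the tag."""
    return {e.name: e.version for e in elements if tag in e.tags}


elements = [
    Element("Compiler>>compile:", version=3, tags={"kernel", "fix-parser"}),
    Element("Association", version=2, tags={"kernel"}),
    Element("MyApp>>run", version=1, tags={"my-app"}),
]

# Package-style grouping: everything tagged "kernel".
kernel_snapshot = snapshot(elements, "kernel")

# Task-style grouping: only the elements touched by one piece of work.
task_snapshot = snapshot(elements, "fix-parser")
```

Because the version lives on the element rather than on the group, the two snapshots agree on the version of the shared method.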

Avi and I have been working on this on and off for a couple of months 
now. Although we've got the basic versioning engine in place, we're 
still a ways from having a useable app. I'll be demoing it at Smalltalk 
Solutions, so I'm committed to getting it working nicely by late June.

I'll respond to (an abridged version of) Florin's introduction, since 
he brings up a lot of important issues.

On Feb 23, 2005, at 1:47 AM, Florin Mateoc wrote:

> What I like about VW parcels/packages: they encapsulate in a robust
> (from a loading perspective) form an independent piece of
> functionality. Very Smalltalkish in the sense that they allow partial
> loading: if it contains methods of a class that is missing, no
> problem, the parcel holds onto the uninstalled methods, and when/if
> the missing class is loaded, it installs the methods. This could be
> extended even to missing superclasses, which would make them
> practically load-order independent.

Yes, I like this too. MC2 takes the same approach (including handling 
superclasses).
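
To make the load-order independence concrete, here is a small Python sketch of a loader that parks methods whose class is missing, and classes whose superclass is missing, then installs them once the prerequisite shows up. This is an assumption about how such a loader could work, not a description of the parcel or MC2 implementation; the Loader class and its method names are made up for the example.

```python
from collections import defaultdict


class Loader:
    """Hypothetical loader that tolerates any load order."""

    def __init__(self):
        self.classes = {"Object": {}}               # class name -> {selector: source}
        self.pending_methods = defaultdict(list)    # class name -> [(selector, source)]
        self.pending_classes = defaultdict(list)    # superclass name -> [class names]

    def load_method(self, class_name, selector, source):
        if class_name in self.classes:
            self.classes[class_name][selector] = source
        else:
            # Class not loaded yet: hold onto the method instead of failing.
            self.pending_methods[class_name].append((selector, source))

    def load_class(self, name, superclass):
        if superclass not in self.classes:
            # Superclass not loaded yet: hold onto the class definition.
            self.pending_classes[superclass].append(name)
            return
        self.classes[name] = {}
        # Install any methods that were waiting for this class.
        for selector, source in self.pending_methods.pop(name, []):
            self.classes[name][selector] = source
        # Install any subclasses that were waiting for this class.
        for subclass in self.pending_classes.pop(name, []):
            self.load_class(subclass, name)


loader = Loader()
loader.load_method("Foo", "bar", "bar ^42")   # Foo is missing: method is parked
loader.load_class("Foo", "Base")              # Base is missing: Foo is parked
loader.load_class("Base", "Object")           # installs Base, then Foo, then Foo>>bar
```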

> They also have the notion of
> (stackable) overrides, so if in your package you change a method from
> a different package, you can easily browse both the override and the
> overridden code, but most importantly you can safely unload your
> package and things are restored properly.

We've supported overrides in MC1 for a while now, and AFAICT, they're 
more trouble than they're worth. That may be partly because PackageInfo 
makes it ugly to implement, but I think there are semantic issues as 
well.

Overrides imply a fixed load (and unload) order, and more subtly, a 
version-specific dependency. The overridden method has to keep working 
from the other package's point of view, and that gets really difficult 
when we've got a stack of overrides. In that case, we've got the 
expectations of 3 or more packages to satisfy with one method. When we 
violate package encapsulation that way, we create a really tight 
version dependency between the packages, which suggests that maybe they 
shouldn't be separate packages at all. They can't be developed and 
deployed separately.

I don't know what the best way to handle it is, but I'm inclined towards 
just considering it a versioning problem. If the same method is 
defined in two packages, we've got two implementations to reconcile, 
right? With overrides, you resolve it according to the order in which the 
packages were loaded. If we've got the versioning history as in MC2, we 
can use that information to make a better decision. If one 
implementation supersedes the other, use that one. If not, you've got a 
conflict and you let the user resolve it. Instead of choosing the 
implementation loaded most recently, we choose the one that was written 
most recently.
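
One way to read "supersedes" is ancestry in the version history: implementation A supersedes B if B is among A's ancestors. The Python sketch below works under that assumption. The function names and the parents mapping are hypothetical, not MC2's data model.

```python
def ancestors(version, parents):
    """All versions reachable from `version` by following parent links."""
    seen, stack = set(), list(parents.get(version, ()))
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(parents.get(v, ()))
    return seen


def reconcile(a, b, parents):
    """Pick the version that supersedes the other, or None on a conflict."""
    if a == b or b in ancestors(a, parents):
        return a
    if a in ancestors(b, parents):
        return b
    return None  # neither descends from the other: let the user resolve it


# Hypothetical history of one method: v1 -> v2 -> v3, plus a branch v1 -> v2b.
parents = {"v2": ["v1"], "v3": ["v2"], "v2b": ["v1"]}

winner_same_line = reconcile("v1", "v3", parents)    # v3 descends from v1
winner_branches = reconcile("v3", "v2b", parents)    # divergent branches
```

Note that the result does not depend on the order of the arguments, which is the point: load order no longer decides which implementation wins.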

> One thing that you don't have with load-order independence is
> dependency information. Your parcel/package may load but you have no
> clue if it will run. Of course, one could manually check Undeclared,
> look for unimplemented but sent selectors, etc, but I think we could
> offer more tools support for dependency management. Envy does not allow
> out-of order loads, and the applications' (Envy's packages)
> prerequisites information is enforced only for superclass-subclasses
> and class-extensions relationships (it does not allow you to subclass
> or extend in an application where the original class is not visible
> (through an explicit dependency declaration)). IMHO dependency
> information is useful, but it should not stop your code-writing
> workflow, nor should it stop you from loading partially. A potential
> solution (for also having dependency information) would be to compute
> (as extensively as reasonably possible) and store the dependencies at
> freezing/versioning time. Since this is a best effort solution, I
> don't think this should require that packages that you are dependent
> on to be also frozen/versioned. Dependency information could be stored
> as "version x" if the dependency is a version, or "version x+" if it's
> been modified since the last time it was versioned (as x). The base
> image is obviously versioned as well.

Agreed. It's important to make the distinction between syntactic and 
semantic dependencies. Envy takes the hard line so as to guarantee both 
semantic and syntactic compatibility: if package A was developed 
against package B, we know that keeping their versions synchronized 
will ensure that they work together in deployment just as well as they 
did in development. But if we settle for only syntactic compatibility, 
we can keep a lot more of the traditional Smalltalk best-effort, 
work-with-what-we've-got approach that leaves the user/developer in 
control.

I'd like to see a package system that can, as Florin suggests above, 
detect syntactic dependencies between packages but doesn't try to 
guarantee semantic compatibility. That's a job for SqueakMap and 
Package Universes. We might even want to regard the recorded 
dependencies as hints about how to resolve syntactic dependencies at 
load time. For example, say we're loading a method that has a reference 
to Foo. If there happens to be a class called Foo, great. If there 
isn't, we look at the package dependencies to figure out where to go 
looking for Foo.
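
The Foo example could look something like the following Python sketch. Everything here is an assumption made for illustration: the image and repository are plain dictionaries, and loading a package just means copying its definitions into the image. A name that still can't be found is left undeclared rather than treated as an error, in keeping with the best-effort approach.

```python
def resolve(name, image, dependencies, repository):
    """Find a definition for `name`, loading a recorded dependency if needed."""
    if name in image:
        return image[name]                  # already there, great
    for package in dependencies:            # recorded dependencies act as hints
        definitions = repository.get(package, {})
        if name in definitions:
            image.update(definitions)       # load the package that provides it
            return image[name]
    return None                             # still undeclared; loading goes on


# Hypothetical state: an image without Foo, and a repository that has it.
image = {"Object": "class Object"}
repository = {"FooPackage": {"Foo": "class Foo", "FooHelper": "class FooHelper"}}
dependencies = ["BarPackage", "FooPackage"]

foo = resolve("Foo", image, dependencies, repository)
missing = resolve("Baz", image, dependencies, repository)
```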

> It is perhaps obvious by now that I consider versioning an important
> feature of any organizational structure. A version is a shareable,
> immutable snapshot, at a finer granularity than the whole image. For
> VW the granularity is the package, for Envy it is the class. If one
> ignores the explicit and named aspects of versioning, Envy's
> versioning granularity is actually at the method level. Each Envy
> method "edition" is an immutable, timestamped snapshot of the method.
> They are created automatically at each "accept", which is what makes
> them impractical for remote servers. This is the main argument in
> Store for their much coarser granularity (less chat), but I think this
> is the wrong approach: latency can be addressed for example with
> background processing, and frequency can be decreased by making method
> versioning a separate, explicit user action.

Yes. This is the conclusion that Avi and I came to with Monticello. 
Fine-grained versioning is enormously powerful, and even Envy doesn't 
take much advantage of it. By requiring an explicit step to save a new 
version, supporting many repository types and making it easy to move 
versions between repositories, I think we can make fine-grained 
versioning work quite well.

> All methods have to be versioned when the class
> (extension) is versioned, and this can be made automatically, just
> like classes have to be versioned when the package is versioned. In
> addition, the versions of methods that belong to a class version can
> be marked as special when browsing the method version history, just
> like class versions that belong to a package version can be marked
> specially. Sorry for the perhaps too low-level details, I just wanted
> to write things down. And Dan did ask us what we wanted to see in such
> a system :)

I don't see why this is necessary. Is there some semantic effect you're 
after here, or do classes and packages just provide convenient ways to 
group program elements together for a snapshot?

> Now, method versions are not interesting just for themselves. A
> different kind of code organization is a patch, or a unit of work
> which happens after the packaging structure has been defined, and is
> perhaps cross-cutting through many different packages. This is a
> changeset in a non-packaged, non-versioned, and limited-collaboration
> universe. In a versioned, packaged world, the changesets themselves
> should be versioned entities, and be composed of versioned
> sub-entities. It is especially for changesets that method-level
> versioning comes in handy, because here the finer, method-level
> granularity is needed. If you are forced to create new (entire) class
> versions for inclusion in a versioned patch, this not only adds noise,
> but it creates a much higher incidence of merging conflicts.

Absolutely, although I think with very fine-grained version history, 
spurious conflicts aren't really a problem. I've never used Envy, but I 
understand that it doesn't do merges well. Florin, is that true in your 
experience? We put a lot of effort into Monticello's merge 
capabilities, and they're generally pretty good. MC2 should be an 
improvement.

> As far as namespaces go, the problem to be solved seems much easier,
> and I think everybody agrees that a heavy-weight solution like in VW
> is inappropriate. I profoundly disliked in VW the fact that they
> namespaced the base image in a lot of small, meaningless namespaces
> (although there was no name conflict to be solved), just as a display
> of what could be done with them. I disliked the fact that namespaces
> were made into as first-class objects as classes, to my mind without
> the same conceptual justification. I also profoundly disliked that
> they now had both namespaces (for the rest of us) and namescopes (for
> the compiler) as two very similar (and with very similar
> responsibilities) class hierarchies, but yet distinct.
> I think that the name lookup rules for the compiler should be the same
> as the ones for our code, and I think that the base image should
> contain no namespace other than Smalltalk. There is indeed potential
> for name conflicts when independently developed packages are put
> together in the same image. But if we are only trying to solve this
> issue, and we don't mix it with categorizations (which should be done
> by packages), I would think that a simple rule like "all external
> applications' classes and packages should live in their own (only one)
> external namespace" should be sufficient. This could easily happen
> automatically, with a prompt for an image-wide development namespace
> at the first class creation (like the initials prompt for the
> changeset). Each corporation would have their own namespace and they
> would do all of their development in it. And a few privileged among
> us like Dan (and maintainers of what is accepted as part of the base)
> would always just type "Smalltalk", and that would be it :)

Agreed. I've already mentioned that I'd like to see a decoupling 
between the structure of the program elements in the image and the 
compiler's binding of names to objects. The Forth strategy sounds right 
to me.

Thanks for the insight, Florin,

Colin