Modules

Sat Feb 26 06:43:57 UTC 2005

Colin Putney wrote:

> Hi folks,
>
> Here's that separate post on the Monticello-redesign I promised. 
> Couldn't manage it yesterday. Dan, once it's sunk in a bit, I'll 
> rework this into a summary for the modules team. (Note: MC stands for 
> Monticello. MC1 is the version currently on SqueakMap, MC2 is the 
> experimental version, which I expect to replace MC1 eventually).
>
> As before, one of the key features of Monticello is that it captures 
> enough development history to allow divergent branches of development 
> to be safely and easily merged. The new thing in MC2 is that it keeps 
> separate version history for each program element (classes, methods, 
> instance variables etc) rather than for entire packages.
>
> This gives us a lot more flexibility about how to group elements for 
> versioning. Where MC1 is tightly bound to packages with sharp 
> boundaries between them, MC2 is happy to work with just about any 
> group of elements a developer decides he's interested in. I've been 
> using the term "tag" for this - conceptually, program elements such as 
> classes, methods, instance variables, globals and so on are annotated 
> with tags, and when you take a snapshot of the tag, all the elements 
> with the tag are included in it. This will allow us to do some handy 
> stuff:
>
> - Package-oriented versioning, similar to Store or MC1. This works 
> quite nicely for well-contained applications.
>
> - Task-oriented versioning, similar to ChangeSets, but versionable and 
> mergeable. I could post a change set to the list, others could take it 
> in different directions and I'd be able to safely merge the results 
> back into a single change set.

I have worked on something similar, but instead of keeping them locally, 
versioning also puts them in the central repository. That way, people 
can browse each other's work in progress (being versioned does not mean 
being releasable), there is no need to post attachments to the list, and 
other people can take them in different directions even before you are 
ready :). More seriously though, this could be a very good way to 
collect patches (changesets are often patches) for release in the update 
stream. One would only need to point to the approved patches in the 
repository.

Concretely, there was a special, unversionable package containing other 
special, unversionable packages, one for each developer (and named after 
them). Each of these task-oriented versionable changesets, once 
versioned (you also have to name it the first time you version it), 
appeared in your own package. The browsers in the image had support for 
adding things to these changesets, either to the current (not yet named) 
one or to a named one.

>
> - Robust update streams. We could automatically detect conflicts 
> between updates in the stream or between an update and local changes, 
> and easily resolve them.
>
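
The tag idea quoted above can be sketched in a few lines. This is only an 
illustration in Python, not MC2 code: the Element class, the snapshot 
function, and the tag names are all hypothetical. It assumes each program 
element carries its own version and a set of tags, and that a snapshot of 
a tag simply collects the elements carrying it.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Element:
    """A program element (class, method, instance variable, ...)."""
    name: str
    version: int  # each element has its own version history
    tags: frozenset = field(default_factory=frozenset)


def snapshot(elements, tag):
    """Return {element name: version} for every element carrying the tag."""
    return {e.name: e.version for e in elements if tag in e.tags}


# Hypothetical image contents; one element may carry several tags.
image = [
    Element("Point", 3, frozenset({"Kernel"})),
    Element("Point>>+", 7, frozenset({"Kernel", "fix-arith"})),
    Element("Morph>>drawOn:", 2, frozenset({"Morphic", "fix-arith"})),
]

package_snapshot = snapshot(image, "Kernel")    # package-oriented grouping
task_snapshot = snapshot(image, "fix-arith")    # task-oriented grouping
```

The same element can appear in both a package-style snapshot and a 
task-style snapshot, which is the flexibility the quoted text describes.
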
<snip>

>> They also have the notion of
>> (stackable) overrides, so if in your package you change a method from
>> a different package, you can easily browse both the override and the
>> overridden code, but most importantly you can safely unload your
>> package and things are restored properly.
>
>
> We've supported overrides in MC1 for a while now, and AFAICT, they're 
> more trouble than they're worth. That may be partly because 
> PackageInfo makes it ugly to implement, but I think there are semantic 
> issues as well.
>
> Overrides imply a fixed load (and unload) order, and more subtly, a 
> version-specific dependency. The overridden method has to keep working 
> from the other package's point of view, and that gets really difficult 
> when we've got a stack of overrides. In that case, we've got the 
> expectations of 3 or more packages to satisfy with one method. When we 
> violate package encapsulation that way, we create a really tight 
> version dependency between the packages, which suggests that maybe 
> they shouldn't be separate packages at all. They can't be developed 
> and deployed separately.
> I don't know what's the best way to handle it, but I'm inclined 
> towards just considering it a versioning problem.  If the same method 
> is defined in two packages, we've got two implementations to 
> reconcile, right? With overrides, you resolved it according to the 
> order that the packages were loaded. If we've got the versioning 
> history as in MC2, we can use that information to make a better 
> decision. If one implementation supersedes the other, use that one. If 
> not, you've got a conflict and you let the user resolve it. Instead of 
> choosing the implementation loaded most recently, we choose the one 
> that was written most recently.
>
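
The resolution rule Colin describes (use the implementation that 
supersedes the other, otherwise report a conflict) could look roughly 
like the sketch below. It is written in Python for illustration and is 
not taken from MC2. The ancestors and resolve functions and the version 
names are hypothetical, and "supersedes" is assumed to mean "has the 
other version among its ancestors".

```python
def ancestors(version, parents):
    """Return every version reachable from `version` via parent links."""
    seen, stack = set(), list(parents.get(version, ()))
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(parents.get(v, ()))
    return seen


def resolve(a, b, parents):
    """Pick the superseding version, or None when the user must decide."""
    if a == b or b in ancestors(a, parents):
        return a
    if a in ancestors(b, parents):
        return b
    return None  # divergent histories: a genuine conflict


# Hypothetical history of one method: v1 -> v2 -> v3, and v1 -> v2b.
parents = {"v2": ["v1"], "v3": ["v2"], "v2b": ["v1"]}

winner = resolve("v3", "v1", parents)      # v3 descends from v1
conflict = resolve("v3", "v2b", parents)   # neither descends from the other
```

Note that the result does not depend on which package was loaded first, 
only on the recorded history.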

But overrides don't exist (at least in Store) as such in packages. They 
are normal extensions (from the point of view of the package holding 
them) that become overrides only when you load that package in an image 
already containing that method. At the same time, they allow you to make 
your package self-sufficient. As you are developing it, no matter what 
else you (or your users) have in the image, you define a particular 
extension consistently with your package, as a "unit of separately 
deployable code". Surely you don't want to depend on another (version 
of another) package that defines that extension. Nor can you (or should 
you) provide for other packages that may or may not be loaded. The most 
you can hope for is that right after loading your package, it should 
work. If a subsequent load changes things, that's fine, but that's as if 
the user manually modified one of your methods. You cannot protect 
against that. Making multiple independent packages work together is a 
much more difficult problem. The only way that I know how to solve that 
is by testing (and fixing), and I don't think that we should worry too 
much about providing an automatic solution. My personal preference would 
be to show in the Transcript that an override (of a method not in the 
base image) has occurred. It does not matter whether they are stacked or 
not: simply the fact that you overrode something from a different 
package makes it probable that the overridden package does not work 
anymore.
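
As a rough sketch of this behaviour (in Python, and not how Store or 
Monticello actually implement it), the toy model below treats an 
extension as an override only at load time, records a Transcript-style 
notice, and restores the previous definition on unload. The Image class, 
its methods, and the package names are hypothetical. It assumes each 
package is loaded once and does not model the "not in the base image" 
exception.

```python
class Image:
    """Toy image: each selector maps to a stack of (package, source) pairs."""

    def __init__(self):
        self.methods = {}      # selector -> list of (package, source)
        self.transcript = []   # stands in for the Smalltalk Transcript

    def load(self, package, definitions):
        for selector, source in definitions.items():
            stack = self.methods.setdefault(selector, [])
            # An extension becomes an override only if another package
            # already defines the same selector in this image.
            if stack and stack[-1][0] != package:
                self.transcript.append(
                    f"{package} overrides {selector} from {stack[-1][0]}")
            stack.append((package, source))

    def unload(self, package):
        for selector in list(self.methods):
            kept = [d for d in self.methods[selector] if d[0] != package]
            if kept:
                self.methods[selector] = kept   # earlier definition is back
            else:
                del self.methods[selector]

    def current(self, selector):
        return self.methods[selector][-1]


image = Image()
image.load("Network", {"String>>asUrl": "network source"})
image.load("MyApp", {"String>>asUrl": "my source", "MyApp>>run": "run source"})
```

After the second load the transcript holds one notice, and unloading 
MyApp makes the Network definition current again.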

<snip>

>> All methods have to be versioned when the class
>> (extension) is versioned, and this can be made automatically, just
>> like classes have to be versioned when the package is versioned. In
>> addition, the versions of methods that belong to a class version can
>> be marked as special when browsing the method version history, just
>> like class versions that belong to a package version can be marked
>> specially. Sorry for the perhaps too low-level details, I just wanted
>> to write things down. And Dan did ask us what we wanted to see in such
>> a system :)
>
>
> I don't see why this is necessary. Is there some semantic effect 
> you're after here, or do classes and packages just provide convenient 
> ways to group program elements together for a snapshot?
>
This is probably just the memory of a frustration with Envy: because it 
stores all these method editions (including every time you put a "halt" 
in a method), the noise level is pretty high, so I always wished that I 
could see at a glance, when looking at the list of editions for a 
method, which editions are "real". But even if we have explicit method 
versioning, so the noise is reduced, the most "real" ones are the ones 
associated with the holder's version, because there is an implicit 
minimal testing expectation for versions. For a method version I would 
expect something like a unit test; for a class, the beginning of some 
functional testing. The expectation is even higher for the package, 
because it usually groups together classes working in tight coupling, so 
the testing done for a package version is more of a functional test, so 
now those methods "really" work. I guess it would be fun to disallow 
versioning if we detect that testing was not performed :) Seriously 
though, it might be interesting if we could somehow link versions to the 
tests performed.

>> Now, method versions are not interesting just for themselves. A
>> different kind of code organization is a patch, or a unit of work
>> which happens after the packaging structure has been defined, and is
>> perhaps cross-cutting through many different packages. This is a
>> changeset in a non-packaged, non-versioned, and limited-collaboration
>> universe. In a versioned, packaged world, the changesets themselves
>> should be versioned entities, and be composed of versioned
>> sub-entities. It is especially for changesets that method-level
>> versioning comes in handy, because here the finer, method-level
>> granularity is needed. If you are forced to create new (entire) class
>> versions for inclusion in a versioned patch, this not only adds noise,
>> but it creates a much higher incidence of merging conflicts.
>
>
> Absolutely, although I think with very fine-grained version history, 
> spurious conflicts aren't really a problem. I've never used Envy, but 
> I understand that it doesn't do merges well. Florin, is that true in 
> your experience? We put a lot of effort into Monticello's merge 
> capabilities, and they're generally pretty good. MC2 should be an 
> improvement.

As Jon has already mentioned, if you follow their prescribed workflow, 
with strict class ownership, the class (extension) owner is a 
serialization point: only the owner can release it into the package, 
and therefore (s)he has to review it and merge it. Envy versions know 
their ancestor, so you can easily tell what the other developer changed 
compared to a previously released version. But there was no three-way 
merge browser, and on occasion it would have made life easier.

The problem is that in very small teams you don't need such a workflow, 
since everybody owns everything, and in very large teams it does not scale. 
At a previous employer, I was part of such a very large team, with 
multiple locations, across the Atlantic. We had some tools to support a 
changeset model (Envy does not have one), but the changesets were stored 
as blobs in Envy, so they were unbrowsable, and they were mutable, so 
you could not tell what ended up being released, either in the 
development image or even in production, because developers would 
continue to work in the same changeset. Because they were mutable, they 
also didn't have any meaningful timestamp. I worked on an improved 
generation of tools, and I made them browsable and versionable. 
Obviously, I was also using them, and I found them very convenient. The 
one thing that I did not get to do was to make them method-level. Their 
granularity was at the class (extension) version level, and it was this 
aspect that made conflicts more frequent than necessary. When you 
version a changeset, you have to version its contents as well, so you 
would version the (whole) class (extension). If another developer was 
working on the same class extension, (s)he would also have to version 
it. Then, when trying to release both changesets, they would appear to 
clash even though they may have fixed different bugs in different 
methods. Mind you, this is not as terrible as it sounds, because it 
would mean an additional pair of eyes would have to check things, and 
even if methods don't conflict directly they may conflict through 
side effects. I for one am a bit queasy about automatic merging. Even 
when entire classes don't seem to clash they can affect each other 
(most obviously when one is a superclass of a class from the other side 
of the merge).
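
The granularity point can be illustrated with a small Python sketch. 
This is a hypothetical model, not the Envy-based tooling described here: 
a changeset is assumed to be a set of (class, selector) pairs, and the 
conflicts function and the example class names are invented for 
illustration.

```python
def conflicts(changeset_a, changeset_b, granularity):
    """Return the units touched by both changesets at the given granularity."""
    def units(changeset):
        if granularity == "class":
            return {cls for cls, _selector in changeset}
        return set(changeset)  # (class, selector) pairs
    return units(changeset_a) & units(changeset_b)


# Two hypothetical bug fixes touching different methods of the same class.
fix_1 = {("Account", "deposit:"), ("Ledger", "post:")}
fix_2 = {("Account", "withdraw:")}

class_level = conflicts(fix_1, fix_2, "class")    # both version Account
method_level = conflicts(fix_1, fix_2, "method")  # no method in common
```

At class granularity the two fixes appear to clash on Account, while at 
method granularity nothing is reported. The second result is also the 
case in which interactions through side effects would go unflagged, so 
it does not replace review.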

To support our workflow and the class ownership model, I also added 
state like "submitted for approval", "rejected", "approved"... with 
notifications through Envy and email that you needed to do something 
about it. I had not mentioned this part, because I don't think such a 
workflow would work in a loosely-coupled community like Squeak, although 
maybe some form of ownership might not be a bad idea.

<snip>

> Thanks for the insight Florin,
>
> Colin

Thank you for your insightful comments and for your work on MC,

Florin