Modules

Sun Feb 27 02:13:35 UTC 2005

Hi folks,

I'm moving this thread over to the Monticello mailing list. No need to 
clutter squeak-dev with versioning discussions. If you want to sign up, 
the list is here:

http://mail.wiresong.ca/mailman/listinfo/monticello

On Feb 26, 2005, at 1:43 AM, Florin Mateoc replied to me thusly:

>> - Task-oriented versioning, similar to ChangeSets, but versionable 
>> and mergeable. I could post a change set to the list, others could 
>> take it in different directions and I'd be able to safely merge the 
>> results back into a single change set.
>
> I have worked on something similar, but instead of keeping them 
> locally, versioning means putting them in the central repository as 
> well. That way, people can browse even each other's work in progress 
> (even if it's versioned, it does not mean it's releasable), no need to 
> post attachments to the list, other people can take them in different 
> directions even before you are ready :). More seriously though, this 
> could be a very good way to collect patches (changesets are often 
> patches) for release in the update stream. One would only need to 
> point to the approved patches in the repository.

Sure, repositories are good. My point was more that you can share 
versioned code in whatever way is convenient. Actually, with Monticello 
you would mail to the list by saving the version to an SMTP repository. 
There are also repositories that use HTTP, FTP, local directories, 
object databases and even SqueakMap. I work out of a public HTTP 
repository, so you can peek at my work any time you like:

http://monticello.wiresong.ca/

[snip]

>> We've supported overrides in MC1 for a while now, and AFAICT, they're 
>> more trouble than they're worth. That may be partly because 
>> PackageInfo makes it ugly to implement, but I think there are 
>> semantic issues as well.
>>
>> Overrides imply a fixed load (and unload) order, and more subtly, a 
>> version-specific dependency. The overridden method has to keep 
>> working from the other package's point of view, and that gets really 
>> difficult when we've got a stack of overrides. In that case, we've 
>> got the expectations of 3 or more packages to satisfy with one 
>> method. When we violate package encapsulation that way, we create a 
>> really tight version dependency between the packages, which suggests 
>> that maybe they shouldn't be separate packages at all. They can't be 
>> developed and deployed separately.
>> I don't know what's the best way to handle it, but I'm inclined 
>> towards just considering it a versioning problem.  If the same method 
>> is defined in two packages, we've got two implementations to 
>> reconcile, right? With overrides, you resolved it according to the 
>> order that the packages were loaded. If we've got the versioning 
>> history as in MC2, we can use that information to make a better 
>> decision. If one implementation supersedes the other, use that one. 
>> If not, you've got a conflict and you let the user resolve it. 
>> Instead of choosing the implementation loaded most recently, we 
>> choose the one that was written most recently.
>>
>
> But overrides don't exist (at least in Store) as such in packages. 
> They are normal extensions (from the point of view of the package 
> holding them) that become overrides only when you load that package in 
> an image already containing that method. At the same time, they allow 
> you to make your package self-sufficient. As you are developing it, no 
> matter what else you (or your users) have in the image, you define a 
> particular extension consistently with your package, as a "unit of 
> separately deployable code". Surely you don't want to depend on 
> another (version of another) package that defines that extension. You 
> also can (should) not provide for other packages that may or may not 
> be loaded.

Yup, I agree that extensions are good. They allow us to do good OO 
design - putting methods on the classes where they belong - while still 
developing and maintaining a package as a single entity. I also 
acknowledge that if you allow extensions, you run the risk that two 
packages will define the same method. Ok, so the question is what do we 
do when that happens?

>  The most you can hope for is that right after loading your package, 
> it should work. If a subsequent load changes things, that's fine, but 
> that's as if the user manually modified one of your methods. You 
> cannot protect against that.

Not quite. A user manually modifying a method does so explicitly, and 
presumably with full knowledge of the implications of doing so - how 
the method works, what packages call it and what they expect from it. 
You can't protect against the programmer making a mistake when 
modifying a method, and shouldn't attempt to.

But when loading one package incidentally modifies a method also 
defined in another package, it happens without the user's knowledge. We 
therefore don't get the benefit of assuming that the user knows best. 
When this happens, we need to alert the user, as you mention below.

>  Making multiple independent packages work together is a much more 
> difficult problem. The only way that I know how to solve that is by 
> testing (and fixing), and I don't think that we should worry too much 
> about providing an automatic solution. My personal preference would be 
> to show in the Transcript that an override (of a method not in the 
> base image) has occurred. It does not matter if they are stacked or 
> not, simply the fact that you overrode something from a different 
> package makes it probable that the overridden package does not work 
> anymore.

I think we can do better than log the override to the Transcript. (Ok, 
perhaps with ENVY or Store you can't do any better, but luckily we're 
writing our own versioning system!) Consider the situation assuming 
that both packages are maintained with MC2:

We have two versions of a method, both with complete version history. 
Because we have the version history, it doesn't really matter that the 
two versions come from different packages, it's exactly the same as 
merging two versions of the same package. So instead of one version 
overriding the other, we do a merge. By comparing the method histories 
we can decide if one version supersedes the other. That would mean that 
it's an updated version of the other, which means we can rely on the 
user's wisdom again. If the user changed the method from one of the 
versions we have to the other one, he must know what he's doing. 
Therefore we use whichever version the user has already chosen.

If neither version of the method supersedes the other, we have a 
conflict, and we ask the user to resolve it. In his infinite wisdom, 
he'll give us a new version of the method that will work for both 
packages. Or, if his wisdom is less than infinite, at least he knows 
about the conflict and can choose which package to break.
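
To make the rule concrete, here is a rough sketch in Python (not 
Smalltalk, and not MC2's actual code). MethodVersion, ancestry and 
merge_method are names I made up for illustration, and the sketch 
assumes every version records the versions it was edited from:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MethodVersion:
    """Hypothetical model: one version of a method plus its direct parents."""
    version_id: str
    source: str
    parents: tuple = ()

    def ancestry(self) -> set:
        """Ids of every version this one was derived from, transitively."""
        seen = set()
        stack = list(self.parents)
        while stack:
            version = stack.pop()
            if version.version_id not in seen:
                seen.add(version.version_id)
                stack.extend(version.parents)
        return seen


def merge_method(a: MethodVersion, b: MethodVersion):
    """Return the version that supersedes the other, or None on a conflict."""
    if a.version_id == b.version_id:
        return a
    if a.version_id in b.ancestry():
        return b  # b was edited from a, so b supersedes a
    if b.version_id in a.ancestry():
        return a  # a was edited from b, so a supersedes b
    return None   # neither supersedes the other: ask the user


# Two packages each edited the same base method independently.
base = MethodVersion("v1", "printOn: aStream ...")
from_pkg_a = MethodVersion("v2", "printOn: aStream ... A", (base,))
from_pkg_b = MethodVersion("v3", "printOn: aStream ... B", (base,))

# The user's reconciled version lists both as parents, so it supersedes both.
resolved = MethodVersion("v4", "printOn: aStream ... A+B", (from_pkg_a, from_pkg_b))
```

In this sketch merge_method(base, from_pkg_a) picks from_pkg_a, 
merge_method(from_pkg_a, from_pkg_b) reports a conflict by returning 
None, and once the user has produced resolved, merging it against 
either package's version picks resolved without asking again.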

Once the merge is complete, the user has effectively reconciled the 
conflict between the packages, and the new method can be incorporated 
into one or both of the packages. Thereafter, loading won't produce a 
conflict and won't require the user's attention.

> <snip>
>
>>> All methods have to be versioned when the class
>>> (extension) is versioned, and this can be made automatically, just
>>> like classes have to be versioned when the package is versioned. In
>>> addition, the versions of methods that belong to a class version can
>>> be marked as special when browsing the method version history, just
>>> like class versions that belong to a package version can be marked
>>> specially. Sorry for the perhaps too low-level details, I just wanted
>>> to write things down. And Dan did ask us what we wanted to see in such
>>> a system :)
>>
>>
>> I don't see why this is necessary. Is there some semantic effect 
>> you're after here, or do classes and packages just provide convenient 
>> ways to group program elements together for a snapshot?
>>
> This is probably just the memory of a frustration with Envy: because 
> it stores all these method editions (including every time you put a 
> "halt" in a method), the noise level is pretty high, so I always 
> wished that I could see at a glance, when looking at the list of 
> editions for a method, which editions are "real". But even if we have 
> explicit method versioning, so the noise is reduced, the most "real" 
> ones are the ones associated with the holder's version, because there 
> is an implicit minimal testing expectation for versions. For the method 
> version I would expect something like a unit-test, for a class, the 
> beginning of some functional testing. The expectation is even higher 
> for the package, because it usually groups together classes working in 
> tight coupling, so the testing done for a package version is more of a 
> functional test, so now those methods "really" work. I guess it would 
> be fun to disallow versioning if we detect that testing was not 
> performed :) Seriously though, it might be interesting if we could 
> link somehow versions to the tests performed.

Ok, I see. You just want to define a group of program elements that 
should be versioned together. With Monticello you do this explicitly, 
so there's a lot less noise. Everything is a "real" version, and they 
correspond to a bunch of other "real" versions that were current at the 
same time.

[snip]

>> Absolutely, although I think with very fine-grained version history, 
>> spurious conflicts aren't really a problem. I've never used Envy, but 
>> I understand that it doesn't do merges well. Florin, is that true in 
>> your experience? We put a lot of effort into Monticello's merge 
>> capabilities, and they're generally pretty good. MC2 should be an 
>> improvement.
>
> As Jon has already mentioned, if you follow their prescribed workflow, 
> with strict class ownership, the class (extension) owner is a 
> serialization point, only the owner can release it into the package, 
> therefore (s)he has to review it and merge it. Envy versions know 
> their ancestor, so you can easily tell what the other developer 
> changed compared to a previously released version. But there was no 
> three-way merge browser, and on occasion it would have made life 
> easier.
> The problem is that in very small teams you don't need such a 
> workflow, everybody owns everything, and in very large teams, it does 
> not scale. At a previous employer, I was part of such a very large 
> team, with multiple locations, across the Atlantic. We had some tools 
> to support a changeset model (Envy does not have one), but the 
> changesets were stored as blobs in Envy, so they were unbrowsable, and 
> they were mutable, so you could not tell what ended up being released, 
> either in the development image or even in production, because 
> developers would continue to work in the same changeset. Because they 
> were mutable, they also didn't have any meaningful timestamp. I worked 
> on an improved generation of tools, and I made them browsable and 
> versionable. Obviously, I was also using them, and I found them very 
> convenient. The one thing that I did not get to do was to make them 
> method-level, their granularity was at the class (extension) version 
> level, and it was this aspect that made conflicts more frequent than 
> necessary: when you version one changeset, you have to version its 
> contents as well, so you would version the (whole) class (extension); 
> if another developer is working on the same class extension, (s)he 
> would also have to version it, then when trying to release both 
> changesets, they would appear to clash even though they may have fixed 
> different bugs in different methods. Mind you, this is not as terrible 
> as it sounds, because it would mean an additional pair of eyes would 
> have to check things, and even if methods don't conflict directly they 
> may conflict through side-effects. I for one am a bit queasy about 
> automatic merging. Even when entire classes don't seem to clash they 
> can affect each other (obviously if one is a superclass of a class 
> from the other side of the merge).
> To support our workflow and the class ownership model, I also added 
> state like "submitted for approval", "rejected", "approved"... with 
> notifications through Envy and email that you needed to do something 
> about it. I had not mentioned this part, because I don't think such a 
> workflow would work in a loosely-coupled community like Squeak, 
> although maybe some form of ownership might not be a bad idea.

With Monticello we've tried to support the 
open-source-distributed-development workflow as much as possible. This 
means lots of optimistic concurrent development, and no reliance on 
central repositories or sources of authority. That in turn means lots of 
automatic merges, partial merges, and repeated merges, because 
branching happens a lot.

Perhaps surprisingly, we've found that merges aren't that hard to do 
well. The key is to have contextual information for the different 
versions we're dealing with. There are two dimensions to context - 
temporal and spatial. The temporal context of a method is its 
ancestry, the series of other versions of the method that were 
modified to produce it. We can consider those versions to be superseded 
by the current one - evidently somebody had a reason to change the 
method and the sum of those reasons has resulted in this version.

The spatial context captures the method's relationship with other 
elements of the code. When a developer snapshots an entire package, 
he's essentially synchronizing the ancestries of all the elements in 
it. We then know that all those synchronized versions "belong together" 
in some way, even if that version of the package doesn't work.

If we have those two dimensions of context, we can do merges correctly: 
automatically applying changes that don't conflict, and correctly 
detecting genuine conflicts and presenting them to the user for 
resolution.
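
Here is a rough sketch of how the two kinds of context could combine 
in a package-level merge. It's Python rather than Smalltalk and not 
Monticello's actual implementation; merge_snapshots, the parents table 
and the version ids are all invented for illustration. A snapshot maps 
each method to the version that was current when the package was saved, 
and parents records each version's direct ancestors. To keep it short, 
the sketch treats a method missing from one side as an addition and 
doesn't handle removals.

```python
def ancestors(version, parents):
    """All versions that `version` was derived from, transitively."""
    seen = set()
    stack = list(parents.get(version, ()))
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(parents.get(v, ()))
    return seen


def merge_snapshots(ours, theirs, parents):
    """Merge two package snapshots (method name -> version id).

    Returns (merged, conflicts): changes that don't conflict are applied
    automatically, and genuine conflicts are listed for the user.
    """
    merged, conflicts = {}, []
    for name in sorted(set(ours) | set(theirs)):
        a, b = ours.get(name), theirs.get(name)
        if a is None or b is None or a == b:
            # Only one side defines it, or both agree: nothing to decide.
            merged[name] = a if a is not None else b
        elif a in ancestors(b, parents):
            merged[name] = b  # their version supersedes ours
        elif b in ancestors(a, parents):
            merged[name] = a  # our version supersedes theirs
        else:
            conflicts.append(name)  # diverged: needs the user
    return merged, conflicts


# Temporal context: which version each version was edited from.
parents = {
    "size.2": {"size.1"},
    "name.2a": {"name.1"},
    "name.2b": {"name.1"},
}
# Spatial context: the versions that were current together in each snapshot.
ours = {"Foo>>size": "size.1", "Foo>>name": "name.2a"}
theirs = {"Foo>>size": "size.2", "Foo>>name": "name.2b", "Foo>>isEmpty": "isEmpty.1"}

merged, conflicts = merge_snapshots(ours, theirs, parents)
```

With these made-up inputs, Foo>>size is updated to size.2 and 
Foo>>isEmpty is added without bothering anyone, while Foo>>name ends up 
in conflicts because name.2a and name.2b both descend from name.1 and 
neither descends from the other.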

Colin