Modules

Sun Feb 27 04:20:27 UTC 2005

Colin Putney wrote:

> Hi folks,
>
> I'm moving this thread over to the Monticello mailing list. No need to 
> clutter squeak-dev with versioning discussions. If you want to sign up 
> the list is here:
>
> http://mail.wiresong.ca/mailman/listinfo/monticello

Thanks for the link, I have just subscribed, and I am cc-ing it, but, as 
one of the things that we seem to discuss is whether to have overrides or 
not in our modules, I think it is still relevant to the general modules 
discussion.

<snip>

>>  Making multiple independent packages work together is a much more 
>> difficult problem. The only way that I know how to solve that is by 
>> testing (and fixing), and I don't think that we should worry too much 
>> about providing an automatic solution. My personal preference would 
>> be to show in the Transcript that an override (of a method not in the 
>> base image) has occurred. It does not matter if they are stacked or 
>> not, simply the fact that you overrode something from a different 
>> package makes it probable that the overridden package does not work 
>> anymore.
>
>
> I think we can do better than log the override to the Transcript. (Ok, 
> perhaps with ENVY or Store you can't do any better, but luckily we're 
> writing our own versioning system!) Consider the situation assuming 
> that both packages are maintained with MC2:
>
> We have two versions of a method, both with complete version history. 
> Because we have the version history, it doesn't really matter that the 
> two versions come from different packages, it's exactly the same as 
> merging two versions of the same package. So instead of one version 
> overriding the other, we do a merge. By comparing the method histories 
> we can decide if one version supersedes the other. That would mean 
> that it's an updated version of the other, which means we can rely on 
> the user's wisdom again. If the user changed the method from one of 
> the versions we have to the other one, he must know what he's doing. 
> Therefore we use whichever version the user has already chosen.

I am sorry, but this is simply not true. A developer may choose, in a 
newer version of a class, to ignore some unrelated development, and 
stick to an older protocol, by including some older versions for some of 
the methods. This is not a made-up example, I have encountered the 
situation quite often. You can easily have, as a simplistic example, 
PackageA>ClassB>methodC(version1),methodD(version2) and 
PackageE>ClassB>methodC(version2),methodD(version1). The automatic 
resolution will do the wrong thing, and it won't even inform the user.
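
To make the example concrete, here is a minimal sketch in Python (not 
Monticello code). MethodVersion, supersedes and resolve are names 
invented for this illustration, and the "newer version wins" rule is 
only my reading of the proposal quoted above:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MethodVersion:
    """Hypothetical record: one version of a method, linked to its parent."""
    selector: str
    number: int
    parent: Optional["MethodVersion"] = None

    def ancestors(self):
        # Walk the ancestry chain from the immediate parent upwards.
        node = self.parent
        while node is not None:
            yield node
            node = node.parent


def supersedes(a, b):
    """True if b appears somewhere in the ancestry of a."""
    return any(ancestor == b for ancestor in a.ancestors())


def resolve(a, b):
    """Pick a winner by ancestry alone; None means a conflict for the user."""
    if a == b or supersedes(a, b):
        return a
    if supersedes(b, a):
        return b
    return None


c1 = MethodVersion("methodC", 1)
c2 = MethodVersion("methodC", 2, parent=c1)
d1 = MethodVersion("methodD", 1)
d2 = MethodVersion("methodD", 2, parent=d1)

# Each package deliberately pairs one newer method with one older method.
package_a = {"methodC": c1, "methodD": d2}
package_e = {"methodC": c2, "methodD": d1}

merged = {sel: resolve(package_a[sel], package_e[sel]) for sel in package_a}
loaded = {sel: version.number for sel, version in merged.items()}
print(loaded)  # methodC and methodD both end up at version 2
```

No conflict is reported, yet the loaded pair (methodC version 2 with 
methodD version 2) is a combination that neither package was ever 
developed or tested with.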

Making independently developed packages work together means 
(intelligent) work, and if there's any overlap, the chances of solving 
the issues automatically are, I believe, very slim, and versioning does 
not help. Even if all the common methods in one of the packages are 
newer versions (and descendants) of the same methods in the other 
packages, it still doesn't mean that they are made to work with the 
older package, it may simply mean that the newer package is supposed to 
work with a newer version of the older package. I think the only 
situation where you can say that there is no conflict is when the common 
methods are all identical, and for this you don't need versions. This is 
why, to my mind, overrides have nothing to do with versioning, they are 
simply a different kind of extension.

> If neither version of the method supersedes the other, we have a 
> conflict, and we ask the user to resolve it. In his infinite wisdom, 
> he'll give us a new version of the method that will work for both 
> packages. Or, if his wisdom is less than infinite, at least he knows 
> about the conflict and can choose which package to break.
>
> Once the merge is complete, the user has effectively reconciled the 
> conflict between the packages, and the new method can be incorporated 
> into one or both of the packages. Thereafter, loading won't produce a 
> conflict or require the user's attention.
>
>
>> <snip>
>>
>>>> All methods have to be versioned when the class
>>>> (extension) is versioned, and this can be made automatically, just
>>>> like classes have to be versioned when the package is versioned. In
>>>> addition, the versions of methods that belong to a class version can
>>>> be marked as special when browsing the method version history, just
>>>> like class versions that belong to a package version can be marked
>>>> specially. Sorry for the perhaps too low-level details, I just wanted
>>>> to write things down. And Dan did ask us what we wanted to see in such
>>>> a system :)
>>>
>>>
>>>
>>> I don't see why this is necessary. Is there some semantic effect 
>>> you're after here, or do classes and packages just provide 
>>> convenient ways to group program elements together for a snapshot?
>>>
>> This is probably just the memory of a frustration with Envy: because 
>> it stores all these method editions (including every time you put a 
>> "halt" in a method), the noise level is pretty high, so I always 
>> wished that I could see at a glance, when looking at the list of 
>> editions for a method, which editions are "real". But even if we have 
>> explicit method versioning, so the noise is reduced, the most "real" 
>> ones are the ones associated with the holder's version, because there 
>> is an implicit minimal testing expectation for versions. For the 
>> method version I would expect something like a unit-test, for a 
>> class, the beginning of some functional testing. The expectation is 
>> even higher for the package, because it usually groups together 
>> classes working in tight coupling, so the testing done for a package 
>> version is more of a functional test, so now those methods "really" 
>> work. I guess it would be fun to disallow versioning if we detect 
>> that testing was not performed :) Seriously though, it might be 
>> interesting if we could link somehow versions to the tests performed.
>
>
> Ok, I see. You just want to define a group of program elements that 
> should be versioned together. With Monticello you do this explicitly, 
> so there's a lot less noise. Everything is a "real" version, and they 
> correspond to a bunch of other "real" versions that were current at 
> the same time.

Even if you do it explicitly, not all versioning happens at the same time.
I develop a method, it looks good, I test it a little (workspace, 
unit-test, whatever), I am happy with it and I want to keep it. I 
version it (separately, because this is what method-level granularity 
means).
I work some more on the class, I refactor the code a little, break it up 
into multiple methods, I test it, I am happy with it, I version the class.
I work some more on similar classes, collaborating classes, refactor, 
test, I am happy, I version the package.
The method may have gone through several iterations (all versions) that 
are not noise: I have explicitly created all the versions, but they 
represent different stages in the evolution, different testing levels, 
and different confidence levels. If there is only one version of the 
method, because I versioned everything (all the containers) at once, 
then yes, they all go together. If not, the versions of this method that 
correspond to (are included in) class versions are slightly "better", 
the versions of the method that correspond with the class versions that 
are included in the package versions are even more so. And the version 
that is part of the production image is simply great :)
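
A rough sketch of how such levels could be derived, in Python. The 
Confidence enum, the confidence function and the dictionary layout are 
all hypothetical, chosen only for this illustration; neither Envy nor 
Monticello stores things this way as far as I know:

```python
from enum import IntEnum


class Confidence(IntEnum):
    """Hypothetical ranking: a higher value implies more testing was expected."""
    METHOD_ONLY = 1          # versioned on its own
    IN_CLASS_VERSION = 2     # included in at least one class version
    IN_PACKAGE_VERSION = 3   # that class version is part of a package version


def confidence(method_version, class_versions, package_versions):
    # class_versions: class version name -> set of method versions it includes
    # package_versions: package version name -> set of class version names
    holders = {name for name, methods in class_versions.items()
               if method_version in methods}
    if not holders:
        return Confidence.METHOD_ONLY
    for classes in package_versions.values():
        if holders & classes:
            return Confidence.IN_PACKAGE_VERSION
    return Confidence.IN_CLASS_VERSION


class_versions = {
    "ClassB 1.0": {"methodC.2"},
    "ClassB 1.1": {"methodC.3"},
}
package_versions = {"PackageA 1.0": {"ClassB 1.1"}}

levels = {v: confidence(v, class_versions, package_versions)
          for v in ("methodC.1", "methodC.2", "methodC.3")}
for version, level in levels.items():
    print(version, level.name)
```

A method history browser could then mark each version according to its 
level, instead of showing them all as equal.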

>
> [snip]
>
>>> Absolutely, although I think with very fine-grained version history, 
>>> spurious conflicts aren't really a problem. I've never used Envy, 
>>> but I understand that it doesn't do merges well. Florin, is that 
>>> true in your experience? We put a lot of effort into Monticello's 
>>> merge capabilities, and they're generally pretty good. MC2 should be 
>>> an improvement.
>>
>>
>> As Jon has already mentioned, if you follow their prescribed 
>> workflow, with strict class ownership, the class (extension) owner is 
>> a serialization point, only the owner can release it into the 
>> package, therefore (s)he has to review it and merge it. Envy versions 
>> know their ancestor, so you can easily tell what the other developer 
>> changed compared to a previously released version. But there was no 
>> three-way merge browser, and on occasion it would have made life easier.
>> The problem is that in very small teams you don't need such a 
>> workflow, everybody owns everything, and in very large teams, it does 
>> not scale. At a previous employer, I was part of such a very large 
>> team, with multiple locations, across the Atlantic. We had some tools 
>> to support a changeset model (Envy does not have one), but the 
>> changesets were stored as blobs in Envy, so they were unbrowsable, 
>> and they were mutable, so you could not tell what ended up being 
>> released, either in the development image or even in production, 
>> because developers would continue to work in the same changeset. 
>> Because they were mutable, they also didn't have any meaningful 
>> timestamp. I worked on an improved generation of tools, and I made 
>> them browsable and versionable. Obviously, I was also using them, and 
>> I found them very convenient. The one thing that I did not get to do 
>> was to make them method-level, their granularity was at the class 
>> (extension) version level, and it was this aspect that made conflicts 
>> more frequent than necessary: when you version one changeset, you 
>> have to version its contents as well, so you would version the 
>> (whole) class (extension); if another developer is working on the 
>> same class extension, (s)he would also have to version it, then when 
>> trying to release both changesets, they would appear to clash even 
>> though they may have fixed different bugs in different methods. Mind 
>> you, this is not as terrible as it sounds, because it would mean an 
>> additional pair of eyes would have to check things, and even if 
>> methods don't conflict directly they may conflict through 
>> side-effects. I for one am a bit queasy about automatic merging. Even 
>> when entire classes don't seem to clash they can affect each other 
>> (obviously if one is a superclass of a class from the other side of 
>> the merge).
>> To support our workflow and the class ownership model, I also added 
>> state like "submitted for approval", "rejected", "approved"... with 
>> notifications through Envy and email that you needed to do something 
>> about it. I had not mentioned this part, because I don't think such a 
>> workflow would work in a loosely-coupled community like Squeak, 
>> although maybe some form of ownership might not be a bad idea.
>
>
> With Monticello we've tried to support the 
> open-source-distributed-development workflow as much as possible. This 
> means lots of optimistic concurrent development, and no reliance on 
> central repositories or sources of authority. This means lots of 
> automatic merges, partial merges, and repeated merges, because 
> branching happens a lot.
>
> Perhaps surprisingly, we've found that merges aren't that hard to do 
> well. The key is to have contextual information for the different 
> versions we're dealing with. There are two dimensions to context - 
> temporal and spatial. The temporal context of a method is its 
> ancestry, the series of other versions of the method that had been 
> modified to produce it. We can consider those versions to be 
> superseded by the current one - evidently somebody had a reason to 
> change the method and the sum of those reasons has resulted in this 
> version.

I have done this as well in the tools that I've developed in Envy 
(relying on ancestry to determine "real" conflicts). It probably works 
for the majority of situations, but when it fails it introduces subtle 
and hard-to-find bugs.

> The spatial context captures the method's relationship with other 
> elements of the code. When a developer snapshots an entire package, 
> he's essentially synchronizing the ancestries of all the elements in 
> it. We then know that all those synchronized versions "belong 
> together" in some way, even if that version of the package doesn't work.
>
> If we have those two dimensions of context, we can do merges 
> correctly: automatically applying changes that don't conflict, and 
> correctly detecting genuine conflicts and presenting them to the user 
> for resolution.
>

Hopefully, most of the time. :)
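
For reference, the kind of merge being described can be sketched as a 
per-method three-way merge against a common ancestor. This is a generic 
illustration in Python, not the MC2 algorithm; three_way_merge and the 
selector-to-version dictionaries are assumptions made for the example:

```python
def three_way_merge(base, ours, theirs):
    """Merge two snapshots (selector -> version id) against their common base.

    Returns (merged, conflicts). A change made on one side only is applied
    automatically; a method changed differently on both sides is reported.
    A missing selector means the method does not exist in that snapshot.
    """
    merged, conflicts = {}, []
    for selector in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(selector), ours.get(selector), theirs.get(selector)
        if o == t:        # both sides agree (or both removed the method)
            result = o
        elif o == b:      # only their side changed it
            result = t
        elif t == b:      # only our side changed it
            result = o
        else:             # both sides changed it, in different ways
            conflicts.append(selector)
            continue
        if result is not None:
            merged[selector] = result
    return merged, conflicts


base = {"foo": "v1", "bar": "v1", "baz": "v1"}
ours = {"foo": "v2", "bar": "v1", "baz": "v2a"}
theirs = {"foo": "v1", "bar": "v2", "baz": "v2b"}

merged, conflicts = three_way_merge(base, ours, theirs)
print(merged)
print(conflicts)
```

Only baz is reported as a conflict. foo and bar merge cleanly, although 
their new versions have never been run together, which is the case I 
worry about.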

> Colin
>
>
>