Hi, Dan,
I would love to be part of your team. And I am interested in working on all the (not for long) future technologies.
About modules and packages and other related code organization structures:
I think I should start by mentioning the extent and limitations of what I consider my experience relevant to this subject: I have had extensive, as well as low-level, experience with Envy; I have built additional organizational structures and workflow tools on top of it; I have ported code from Envy to Store and have worked with Store; and I have worked with several Java IDEs and SCMs. I have also worked on creating a minimal VW image, an experiment similar to Craig's Spoon.
And now, my view of the world, in (not so) short:
What I like about VW parcels/packages: they encapsulate, in a form that is robust from a loading perspective, an independent piece of functionality. They are very Smalltalkish in the sense that they allow partial loading: if a parcel contains methods of a class that is missing, no problem, the parcel holds onto the uninstalled methods, and when/if the missing class is loaded, it installs them. This could be extended even to missing superclasses, which would make parcels practically load-order independent. They also have the notion of (stackable) overrides, so if in your package you change a method from a different package, you can easily browse both the override and the overridden code, but most importantly you can safely unload your package and things are restored properly.

One thing that you don't have with load-order independence is dependency information. Your parcel/package may load, but you have no clue whether it will run. Of course, one could manually check Undeclared, look for unimplemented but sent selectors, etc., but I think we could offer more tool support for dependency management. Envy does not allow out-of-order loads, and the prerequisite information of its applications (Envy's packages) is enforced only for superclass-subclass and class-extension relationships (it does not allow you to subclass or extend in an application where the original class is not visible through an explicit dependency declaration). IMHO dependency information is useful, but it should not stop your code-writing workflow, nor should it stop you from loading partially. A potential solution (for also having dependency information) would be to compute (as extensively as reasonably possible) and store the dependencies at freezing/versioning time. Since this is a best-effort solution, I don't think it should require that the packages you depend on also be frozen/versioned.
Dependency information could be stored as "version x" if the dependency is a version, or "version x+" if it has been modified since the last time it was versioned (as x). The base image is obviously versioned as well.

It is perhaps obvious by now that I consider versioning an important feature of any organizational structure. A version is a shareable, immutable snapshot, at a finer granularity than the whole image. For VW the granularity is the package; for Envy it is the class. If one ignores the explicit and named aspects of versioning, Envy's versioning granularity is actually at the method level. Each Envy method "edition" is an immutable, timestamped snapshot of the method. Editions are created automatically at each "accept", which is what makes them impractical for remote servers. This is the main argument in Store for its much coarser granularity (less chat), but I think this is the wrong approach: latency can be addressed, for example, with background processing, and frequency can be decreased by making method versioning a separate, explicit user action. I think that, in general, the finer the granularity the better. If one recognizes that not every accept deserves a method version (obviously one cannot have tested a method at accept time), and that each change is logged locally anyway, we can grant methods their own named versions, which may be explicitly pushed to the repository, and which can later come in very handy when browsing the history. All methods have to be versioned when the class (extension) is versioned, and this can be done automatically, just as classes have to be versioned when the package is versioned. In addition, the versions of methods that belong to a class version can be marked as special when browsing the method version history, just as class versions that belong to a package version can be marked specially. Sorry for the perhaps too low-level details, I just wanted to write things down.
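The "version x" / "version x+" stamping scheme could be modeled very roughly like this (a Python sketch of the idea only; the Package class and its fields are made up for illustration, not any real Store/Envy/parcel API):

```python
# Sketch of best-effort dependency stamps computed at freezing time:
# "x" when the depended-on package is a clean version x, "x+" when it
# has been modified since it was versioned as x. All names hypothetical.

class Package:
    def __init__(self, name, version, modified=False):
        self.name = name
        self.version = version
        self.modified = modified  # changed since last versioned?

def dependency_stamp(pkg):
    """Stamp recorded by a depending package when it is frozen/versioned."""
    return f"{pkg.version}+" if pkg.modified else str(pkg.version)

base = Package("Base", 7)                      # versioned, untouched
net = Package("Network", 3, modified=True)     # modified since version 3
assert dependency_stamp(base) == "7"
assert dependency_stamp(net) == "3+"
```

Note that, as Florin says, the stamp is computed without forcing the depended-on package to be versioned first; the "+" simply records that the exact state was not a frozen version.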
And Dan did ask us what we wanted to see in such a system :) Now, method versions are not interesting just in themselves. A different kind of code organization is a patch: a unit of work that happens after the packaging structure has been defined, and perhaps cuts across many different packages. This is a changeset in a non-packaged, non-versioned, limited-collaboration universe. In a versioned, packaged world, the changesets themselves should be versioned entities, composed of versioned sub-entities. It is especially for changesets that method-level versioning comes in handy, because here the finer, method-level granularity is needed. If you are forced to create new versions of entire classes for inclusion in a versioned patch, it not only adds noise, it also creates a much higher incidence of merge conflicts.
As far as namespaces go, the problem to be solved seems much easier, and I think everybody agrees that a heavyweight solution like the one in VW is inappropriate. I profoundly disliked the fact that VW carved the base image into a lot of small, meaningless namespaces (although there was no name conflict to be solved), just as a display of what could be done with them. I disliked the fact that namespaces were made objects as first-class as classes, to my mind without the same conceptual justification. I also profoundly disliked that VW ended up with both namespaces (for the rest of us) and namescopes (for the compiler): two very similar class hierarchies, with very similar responsibilities, yet distinct. I think that the name lookup rules for the compiler should be the same as the ones for our code, and I think that the base image should contain no namespace other than Smalltalk. There is indeed potential for name conflicts when independently developed packages are put together in the same image. But if we are only trying to solve this issue, and we don't mix it with categorization (which should be done by packages), I would think that a simple rule like "all external applications' classes and packages should live in their own (only one) external namespace" should be sufficient. This could easily happen automatically, with a prompt for an image-wide development namespace at the first class creation (like the initials prompt for the changeset). Each corporation would have its own namespace and would do all of its development in it. And a few privileged among us like Dan (and maintainers of what is accepted as part of the base) would always just type "Smalltalk", and that would be it :)
About image construction versus image stripping, I think we should be able to do both. The image is a very powerful instrument for development and I want to keep using it. By using it I will necessarily dirty it, but I don't care as long as it is still a work in progress. When I think I am done, I want to be able to extract the application from the image, and I want tools to help me with that. Once extracted, applications can be used to build clean images. Please note that, while packages help with keeping things organized, they are not a solution for code rot, and a long-lived application will be touched by many hands, not all of them informed or competent or careful. There will always be a case for stripping. There is a case and a market for application extraction in Java, which does not have images at all, so stripping is not so much about the image itself. Building from scratch is useful, but it only eliminates unpackaged things; it does not deal with packaged dead code.
Cheers,
Florin
Florin Mateoc florin.mateoc@gmail.com wrote...
I would love to be part of your team. And I am interested in working on all the (not for long) future technologies.
And we would love to have you. Welcome aboard!
About modules and packages and other related code organization structures: ... And now, my view of the world, in (not so) short: ...
Thank you for this great exposition. I am still digesting it. Your experience will be valuable as we map out the project and, later, put it to the test.
- Dan
Hi folks,
Here's that separate post on the Monticello-redesign I promised. Couldn't manage it yesterday. Dan, once it's sunk in a bit, I'll rework this into a summary for the modules team. (Note: MC stands for Monticello. MC1 is the version currently on SqueakMap, MC2 is the experimental version, which I expect to replace MC1 eventually).
As before, one of the key features of Monticello is that it captures enough development history to allow divergent branches of development to be safely and easily merged. The new thing in MC2 is that it keeps separate version history for each program element (classes, methods, instance variables etc) rather than for entire packages.
This gives us a lot more flexibility about how to group elements for versioning. Where MC1 is tightly bound to packages with sharp boundaries between them, MC2 is happy to work with just about any group of elements a developer decides he's interested in. I've been using the term "tag" for this - conceptually, program elements such as classes, methods, instance variables, globals and so on are annotated with tags, and when you take a snapshot of the tag, all the elements with the tag are included in it. This will allow us to do some handy stuff:
- Package-oriented versioning, similar to Store or MC1. This works quite nicely for well-contained applications.
- Task-oriented versioning, similar to ChangeSets, but versionable and mergeable. I could post a change set to the list, others could take it in different directions and I'd be able to safely merge the results back into a single change set.
- Robust update streams. We could automatically detect conflicts between updates in the stream or between an update and local changes, and easily resolve them.
- Maintain the kernel. It should be possible to do "tricky" things in MC2, like changing the Compiler, or refactoring Association. They'll still be tricky, but at least we won't have to resort to hand-edited fileIns.
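The tag mechanism Colin describes could be sketched roughly as follows (a Python model of the concept only; MC2 itself is Smalltalk, and the Element/snapshot names here are illustrative, not MC2's actual API):

```python
# Model of MC2-style tagging: program elements (classes, methods,
# instance variables...) carry tags, and a snapshot of a tag gathers
# every element bearing it. One element may carry several tags, which
# is what allows both package-oriented and task-oriented grouping.

class Element:
    def __init__(self, name, tags):
        self.name = name
        self.tags = set(tags)

def snapshot(elements, tag):
    """All elements annotated with the given tag."""
    return {e.name for e in elements if tag in e.tags}

elements = [
    Element("OrderedCollection>>do:", {"Collections"}),
    Element("Socket>>connectTo:", {"Network", "bugfix-123"}),
    Element("HTMLParser>>parse:", {"HTML", "bugfix-123"}),
]
# Package-oriented grouping (Store/MC1-like):
assert snapshot(elements, "Network") == {"Socket>>connectTo:"}
# Task-oriented grouping, cutting across packages (changeset-like):
assert snapshot(elements, "bugfix-123") == {"Socket>>connectTo:",
                                            "HTMLParser>>parse:"}
```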
Avi and I have been working on this on and off for a couple of months now. Although we've got the basic versioning engine in place, we're still a ways from having a useable app. I'll be demoing it at Smalltalk Solutions, so I'm committed to getting it working nicely by late June.
I'll respond to (an abridged version of) Florin's introduction, since he brings up a lot of important issues.
On Feb 23, 2005, at 1:47 AM, Florin Mateoc wrote:
What I like about VW parcels/packages: they encapsulate in a robust (from a loading perspective) form an independent piece of functionality. Very Smalltalkish in the sense that they allow partial loading: if it contains methods of a class that is missing, no problem, the parcel holds onto the uninstalled methods, and when/if the missing class is loaded, it installs the methods. This could be extended even to missing superclasses, which would make them practically load-order independent.
Yes, I like this too. MC2 takes the same approach (including handling superclasses).
They also have the notion of (stackable) overrides, so if in your package you change a method from a different package, you can easily browse both the override and the overridden code, but most importantly you can safely unload your package and things are restored properly.
We've supported overrides in MC1 for a while now, and AFAICT, they're more trouble than they're worth. That may be partly because PackageInfo makes it ugly to implement, but I think there are semantic issues as well.
Overrides imply a fixed load (and unload) order, and more subtly, a version-specific dependency. The overridden method has to keep working from the other package's point of view, and that gets really difficult when we've got a stack of overrides. In that case, we've got the expectations of 3 or more packages to satisfy with one method. When we violate package encapsulation that way, we create a really tight version dependency between the packages, which suggests that maybe they shouldn't be separate packages at all. They can't be developed and deployed separately.
I don't know the best way to handle it, but I'm inclined towards just considering it a versioning problem. If the same method is defined in two packages, we've got two implementations to reconcile, right? With overrides, you resolve it according to the order in which the packages were loaded. If we've got the versioning history as in MC2, we can use that information to make a better decision. If one implementation supersedes the other, use that one. If not, you've got a conflict and you let the user resolve it. Instead of choosing the implementation loaded most recently, we choose the one that was written most recently.
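Colin's "supersedes" check boils down to an ancestry query over the version history. A minimal sketch, assuming a simple parent map per method version (the data model here is hypothetical, not MC2's):

```python
# Resolve two definitions of the same method using version ancestry
# rather than load order: if one version descends from the other, it
# supersedes it; otherwise the branches diverged and the user decides.

def ancestors(version, parents):
    """All transitive ancestors of a version, given {version: [parents]}."""
    seen, stack = set(), list(parents.get(version, ()))
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(parents.get(v, ()))
    return seen

def resolve(v1, v2, parents):
    if v1 in ancestors(v2, parents):
        return v2                # v2 supersedes v1
    if v2 in ancestors(v1, parents):
        return v1                # v1 supersedes v2
    return "conflict"            # divergent: let the user resolve it

# m1 -> m2 -> m3 on one branch, m1 -> mX on another
parents = {"m2": ["m1"], "m3": ["m2"], "mX": ["m1"]}
assert resolve("m1", "m3", parents) == "m3"        # linear history
assert resolve("m3", "mX", parents) == "conflict"  # divergent branches
```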
One thing that you don't have with load-order independence is dependency information. Your parcel/package may load but you have no clue if it will run. Of course, one could manually check Undeclared, look for unimplemented but sent selectors, etc., but I think we could offer more tool support for dependency management. Envy does not allow out-of-order loads, and the applications' (Envy's packages) prerequisite information is enforced only for superclass-subclass and class-extension relationships (it does not allow you to subclass or extend in an application where the original class is not visible (through an explicit dependency declaration)). IMHO dependency information is useful, but it should not stop your code-writing workflow, nor should it stop you from loading partially. A potential solution (for also having dependency information) would be to compute (as extensively as reasonably possible) and store the dependencies at freezing/versioning time. Since this is a best-effort solution, I don't think this should require that packages that you are dependent on also be frozen/versioned. Dependency information could be stored as "version x" if the dependency is a version, or "version x+" if it's been modified since the last time it was versioned (as x). The base image is obviously versioned as well.
Agreed. It's important to make the distinction between syntactic and semantic dependencies. Envy takes a hard line so as to guarantee both semantic and syntactic compatibility: if package A was developed against package B, we know that keeping their versions synchronized will ensure that they work together in deployment just as well as they did in development. But if we settle for syntactic compatibility only, we can get a lot more of the traditional Smalltalk best-effort approach to giving the user/developer control.
I'd like to see a package system that can, as Florin suggests above, detect syntactic dependencies between packages but doesn't try to guarantee semantic compatibility. That's a job for SqueakMap and Package Universes. We might even want to regard the recorded dependencies as hints about how to resolve syntactic dependencies at load time. For example, say we're loading a method that has a reference to Foo. If there happens to be a class called Foo, great. If there isn't, we look at the package dependencies to figure out where to go looking for Foo.
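Colin's "dependencies as hints" idea could be sketched like this (a Python model with made-up data structures, purely to pin down the lookup order he describes):

```python
# Hypothetical load-time name resolution: look in the image first,
# then fall back to the recorded package dependencies as hints about
# where an unresolved name (like Foo) might be found.

def resolve_name(name, image, dependencies, repositories):
    """image: {class name: defining package};
    dependencies: package names recorded at freeze time;
    repositories: {package name: set of class names it provides}."""
    if name in image:
        return image[name]          # already present, done
    for dep in dependencies:
        if name in repositories.get(dep, set()):
            return dep              # hint: go fetch it from this package
    return None                     # genuinely undeclared

image = {"Array": "Kernel"}
repos = {"Network-HTML": {"HTMLParser", "Foo"}}

assert resolve_name("Array", image, ["Network-HTML"], repos) == "Kernel"
assert resolve_name("Foo", image, ["Network-HTML"], repos) == "Network-HTML"
assert resolve_name("Bar", image, ["Network-HTML"], repos) is None
```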
It is perhaps obvious by now that I consider versioning an important feature of any organizational structure. A version is a shareable, immutable snapshot, at a finer granularity than the whole image. For VW the granularity is the package; for Envy it is the class. If one ignores the explicit and named aspects of versioning, Envy's versioning granularity is actually at the method level. Each Envy method "edition" is an immutable, timestamped snapshot of the method. They are created automatically at each "accept", which is what makes them impractical for remote servers. This is the main argument in Store for its much coarser granularity (less chat), but I think this is the wrong approach: latency can be addressed for example with background processing, and frequency can be decreased by making method versioning a separate, explicit user action.
Yes. This is the conclusion that Avi and I came to with Monticello. Fine-grained versioning is enormously powerful, and even Envy doesn't take much advantage of it. By requiring an explicit step to save a new version, supporting many repository types and making it easy to move versions between repositories, I think we can make fine-grained versioning work quite well.
All methods have to be versioned when the class (extension) is versioned, and this can be done automatically, just as classes have to be versioned when the package is versioned. In addition, the versions of methods that belong to a class version can be marked as special when browsing the method version history, just as class versions that belong to a package version can be marked specially. Sorry for the perhaps too low-level details, I just wanted to write things down. And Dan did ask us what we wanted to see in such a system :)
I don't see why this is necessary. Is there some semantic effect you're after here, or do classes and packages just provide convenient ways to group program elements together for a snapshot?
Now, method versions are not interesting just in themselves. A different kind of code organization is a patch: a unit of work that happens after the packaging structure has been defined, and perhaps cuts across many different packages. This is a changeset in a non-packaged, non-versioned, limited-collaboration universe. In a versioned, packaged world, the changesets themselves should be versioned entities, composed of versioned sub-entities. It is especially for changesets that method-level versioning comes in handy, because here the finer, method-level granularity is needed. If you are forced to create new versions of entire classes for inclusion in a versioned patch, it not only adds noise, it also creates a much higher incidence of merge conflicts.
Absolutely, although I think with very fine-grained version history, spurious conflicts aren't really a problem. I've never used Envy, but I understand that it doesn't do merges well. Florin, is that true in your experience? We put a lot of effort into Monticello's merge capabilities, and they're generally pretty good. MC2 should be an improvement.
As far as namespaces go, the problem to be solved seems much easier, and I think everybody agrees that a heavyweight solution like the one in VW is inappropriate. I profoundly disliked the fact that VW carved the base image into a lot of small, meaningless namespaces (although there was no name conflict to be solved), just as a display of what could be done with them. I disliked the fact that namespaces were made objects as first-class as classes, to my mind without the same conceptual justification. I also profoundly disliked that VW ended up with both namespaces (for the rest of us) and namescopes (for the compiler): two very similar class hierarchies, with very similar responsibilities, yet distinct. I think that the name lookup rules for the compiler should be the same as the ones for our code, and I think that the base image should contain no namespace other than Smalltalk. There is indeed potential for name conflicts when independently developed packages are put together in the same image. But if we are only trying to solve this issue, and we don't mix it with categorization (which should be done by packages), I would think that a simple rule like "all external applications' classes and packages should live in their own (only one) external namespace" should be sufficient. This could easily happen automatically, with a prompt for an image-wide development namespace at the first class creation (like the initials prompt for the changeset). Each corporation would have its own namespace and would do all of its development in it. And a few privileged among us like Dan (and maintainers of what is accepted as part of the base) would always just type "Smalltalk", and that would be it :)
Agreed. I've already mentioned that I'd like to see a decoupling between the structure of the program elements in the image and the compiler's binding of names to objects. The Forth strategy sounds right to me.
Thanks for the insight Florin,
Colin
hi colin
As far as namespaces go, the problem to be solved seems much easier, and I think everybody agrees that a heavyweight solution like the one in VW is inappropriate. I profoundly disliked the fact that VW carved the base image into a lot of small, meaningless namespaces (although there was no name conflict to be solved), just as a display of what could be done with them. I disliked the fact that namespaces were made objects as first-class as classes, to my mind without the same conceptual justification. I also profoundly disliked that VW ended up with both namespaces (for the rest of us) and namescopes (for the compiler): two very similar class hierarchies, with very similar responsibilities, yet distinct. I think that the name lookup rules for the compiler should be the same as the ones for our code, and I think that the base image should contain no namespace other than Smalltalk. There is indeed potential for name conflicts when independently developed packages are put together in the same image. But if we are only trying to solve this issue, and we don't mix it with categorization (which should be done by packages), I would think that a simple rule like "all external applications' classes and packages should live in their own (only one) external namespace" should be sufficient. This could easily happen automatically, with a prompt for an image-wide development namespace at the first class creation (like the initials prompt for the changeset). Each corporation would have its own namespace and would do all of its development in it. And a few privileged among us like Dan (and maintainers of what is accepted as part of the base) would always just type "Smalltalk", and that would be it :)
Agreed. I've already mentioned that I'd like to see a decoupling between the structure of the program elements in the image and the compiler's binding of names to objects. The Forth strategy sounds right to me.
I'm not so sure, in fact. If you look at VW, the worst part to me is class-level namespace import.
But now, after the experience with classboxes (basically a classbox is a changeset that lives inside a namespace and whose extensions are scoped: i.e., only visible to the code inside the classbox), I'm not sure that namespaces are worth it. What I like about the namespace part of classboxes (not the fact that class extensions are scoped, which I like but is another point) is that as a developer, when I'm coding my method, I do not have to think about the namespaces around me. When a name is not found, the browser pops up and asks me: I found two Array classes in the system. I pick one, an import statement is added to the namespace, and that is it. Afterwards I can see Array as normal. The first paper on classboxes is http://www.iam.unibe.ch/~scg/Archive/Papers/Berg03aClassboxes.pdf But again, I'm not sure that we need them.
Because how is Forth different from having classpath problems afterwards?
Stef
On Feb 25, 2005, at 4:03 AM, stéphane ducasse wrote:
Agreed. I've already mentioned that I'd like to see a decoupling between the structure of the program elements in the image and the compiler's binding of names to objects. The Forth strategy sounds right to me.
I'm not so sure, in fact. If you look at VW, the worst part to me is class-level namespace import.
I agree that class-level namespace import is a pain. But I think to focus on that is to miss the forest for the trees. The problem is not that the imports are on classes rather than modules, but that they are properties of the code rather than its environment. (And I don't mean the environments we have in Squeak already, I mean the context in which the code finds itself when it is loaded into an image.) Let me put it another way: an import is fundamentally a relationship between the module and something outside itself. If you define that relationship within the module, be it per-module, per-class, or per-method (e.g. with fully-qualified names), you're hardcoding within the module the structure that its environment must have when it is loaded.
I'd like to see a system where the relationship a module has to its environment is defined when it's loaded into that environment, by whoever is doing the loading - ultimately, the user. When loading a module, we'd provide the loader with a context, which would supply a way of resolving names (#bindingOf:, basically) and a way to bind new objects into the namespace being constructed (#at:put:). The Compiler and ClassBuilder would use the context to link up the new module with its environment.
Now, that loading context could encapsulate whatever kind of policy you want. We could have an EnvironmentContext that provides exactly the same behaviour we have in Squeak today. You could have another context that held a Forth-style (I guess) list of namespaces with a search order. You could have a SandboxContext that made it impossible to reference dangerous code, or an Atomic context that used a sandbox temporarily, and then linked the new module into the image atomically. You could even have a ClasspathContext if you were into that sort of thing.
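The policy-per-context idea above can be rendered as a toy model (Python standing in for Smalltalk, with #bindingOf: and #at:put: mapped to plain methods; all class names here are illustrative, not actual Squeak code):

```python
# Toy loading contexts: the loader, not the module, decides how names
# are resolved when a module is linked into the image.

class EnvironmentContext:
    """Resolves against a single global namespace, like Squeak today."""
    def __init__(self, globals_):
        self.globals = globals_
    def binding_of(self, name):
        return self.globals.get(name)
    def at_put(self, name, obj):
        self.globals[name] = obj

class SearchOrderContext:
    """Forth-style: an ordered list of namespaces, first hit wins."""
    def __init__(self, namespaces):
        self.namespaces = namespaces
    def binding_of(self, name):
        for ns in self.namespaces:
            if name in ns:
                return ns[name]
        return None
    def at_put(self, name, obj):
        self.namespaces[0][name] = obj  # new bindings go in the front namespace

smalltalk = {"Array": "<kernel Array>"}
mine = {"Array": "<my Array>"}
ctx = SearchOrderContext([mine, smalltalk])
assert ctx.binding_of("Array") == "<my Array>"   # my namespace shadows the base
assert ctx.binding_of("Socket") is None          # not found anywhere
```

A SandboxContext or ClasspathContext would be further subclasses with different binding_of/at_put policies; the module being loaded never knows which one it got.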
But now, after the experience with classboxes (basically a classbox is a changeset that lives inside a namespace and whose extensions are scoped: i.e., only visible to the code inside the classbox), I'm not sure that namespaces are worth it. What I like about the namespace part of classboxes (not the fact that class extensions are scoped, which I like but is another point) is that as a developer, when I'm coding my method, I do not have to think about the namespaces around me. When a name is not found, the browser pops up and asks me: I found two Array classes in the system. I pick one, an import statement is added to the namespace, and that is it. Afterwards I can see Array as normal. The first paper on classboxes is http://www.iam.unibe.ch/~scg/Archive/Papers/Berg03aClassboxes.pdf But again, I'm not sure that we need them.
Yup, I like that experience too. Let's do that.
Colin
On 26 févr. 05, at 5:07, Colin Putney wrote:
On Feb 25, 2005, at 4:03 AM, stéphane ducasse wrote:
Agreed. I've already mentioned that I'd like to see a decoupling between the structure of the program elements in the image and the compiler's binding of names to objects. The Forth strategy sounds right to me.
I'm not so sure, in fact. If you look at VW, the worst part to me is class-level namespace import.
I agree that class-level namespace import is a pain. But I think to focus on that is to miss the forest for the trees. The problem is not that the imports are on classes rather than modules, but that they are properties of the code rather than its environment. (And I don't mean the environments we have in Squeak already, I mean the context in which the code finds itself when it is loaded into an image.) Let me put it another way: an import is fundamentally a relationship between the module and something outside itself. If you define that relationship within the module, be it per-module, per-class, or per-method (e.g. with fully-qualified names), you're hardcoding within the module the structure that its environment must have when it is loaded.
I'd like to see a system where the relationship a module has to its environment is defined when it's loaded into that environment, by whoever is doing the loading - ultimately, the user. When loading a module, we'd provide the loader with a context, which would supply a way of resolving names (#bindingOf:, basically) and a way to bind new objects into the namespace being constructed (#at:put:). The Compiler and ClassBuilder would use the context to link up the new module with its environment.
I understand, and this is not incompatible with imports and with the fact that, as programmers, we do not have to be aware of multiple namespaces inside the method body.
Now, that loading context could encapsulate whatever kind of policy you want. We could have an EnvironmentContext that provides exactly the same behaviour we have in Squeak today. You could have another context that held a Forth-style (I guess) list of namespaces with a search order. You could have a SandboxContext that made it impossible to reference dangerous code, or an Atomic context that used a sandbox temporarily, and then linked the new module into the image atomically. You could even have a ClasspathContext if you were into that sort of thing.
But now, after the experience with classboxes (basically a classbox is a changeset that lives inside a namespace and whose extensions are scoped: i.e., only visible to the code inside the classbox), I'm not sure that namespaces are worth it. What I like about the namespace part of classboxes (not the fact that class extensions are scoped, which I like but is another point) is that as a developer, when I'm coding my method, I do not have to think about the namespaces around me. When a name is not found, the browser pops up and asks me: I found two Array classes in the system. I pick one, an import statement is added to the namespace, and that is it. Afterwards I can see Array as normal. The first paper on classboxes is http://www.iam.unibe.ch/~scg/Archive/Papers/Berg03aClassboxes.pdf But again, I'm not sure that we need them.
Yup, I like that experience too. Let's do that.
Colin
I'd like to see a system where the relationship a module has to its environment is defined when it's loaded into that environment, by whomever is doing the loading - ultimately, the user.
Exactly. This is also the assertion made by Findler and Flatt (http://www.iam.unibe.ch/~scg/cgi-bin/oobib.cgi?query=Find98a) in their paper "Modular object-oriented programming with units and mixins". In their model of units (a kind of module), the import is stated __externally__ to the definition of the unit itself. A unit just says "I need a class named Socket", rather than "I use the class Socket provided by the unit Network".
Concretely, in Squeak, this could be done using a kind of configuration that gathers all the relationships between the various packages.
Cheers, Alexandre
On Fri, 25 Feb 2005 01:27:55 -0500, Colin Putney cputney@wiresong.ca wrote:
Absolutely, although I think with very fine-grained version history, spurious conflicts aren't really a problem. I've never used Envy, but I understand that it doesn't do merges well. Florin, is that true in your experience? We put a lot of effort into Monticello's merge capabilities, and they're generally pretty good. MC2 should be an improvement.
Envy doesn't handle merging very well. It is good at detecting conflicts, but to accomplish a merge you have to go and make new versions of classes, load the appropriate method editions, and then re-version. The Store merge tool is enormously more useful and faster to use. I haven't played with the Monticello merge tool yet, as my Squeak projects tend to be single-developer efforts.
Later, Jon
-------------------------------------------------------------- Jon Hylands Jon@huv.com http://www.huv.com/jon
Project: Micro Seeker (Micro Autonomous Underwater Vehicle) http://www.huv.com
The module system should ideally be built with some thought of how the external packages will be maintained. We surely want to have stable packages around, but there is no such thing as a "stable package" sitting by itself. A package can only be stable with respect to a set of other packages that it might be loaded along with. Thus, we will want to have repositories, and repositories will work best if the package format has predicted their needs.
One particular issue to think about is how dependencies work. They should:
1. Exist. Without dependencies, loading packages from a repository requires a lot of manual work.
2. Refer to a broad enough set of packages that new package versions can be posted without requiring a cascade of other updates to be posted. Without this property, images can get stuck in gridlock, unable to upgrade anything.
3. Refer to a specific enough set of packages that any one of them is likely to work. That is, the dependencies should be tight enough to be meaningful.
4. Not require loading too many entire packages in order to figure out the dependencies. For example, automatic dependencies based on loading the packages to see what classes they refer to make it tricky for a repository browser to know what the dependencies are -- because it hasn't loaded the packages yet.
#3 does not seem to be an issue in practice, despite all the talk it gets. It is easy to get some tolerable level of reliability (e.g., you don't load a WWW browser without an HTML parser), and it seems similarly impractical to get really solid guarantees (this HTML parser will definitely work). The achievable area in the middle is large and almost impossible to miss.
The other items, by contrast, take some care.
As an existence proof, Debian has shown that simple name-based dependencies are enough to meet all four requirements. So all we really need are dependencies like "depends on some package named Network-HTML". If someone has a better idea, that is fine too, but note that there is at least one solution here.
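Lex's name-based dependencies are simple enough to sketch. Below is an illustrative sketch in Python rather than Squeak -- the repository format and the `install_order` helper are hypothetical, not part of any real tool. Resolving a load order from name-based dependencies is just a topological sort over package names:

```python
# Illustrative sketch (not Squeak code): resolving simple name-based
# dependencies, Debian-style. The repository maps each package name to the
# names of the packages it depends on; an install order is a topological sort.

def install_order(repository, wanted):
    """Return an install order covering 'wanted' and all its dependencies."""
    order, visiting, done = [], set(), set()

    def visit(name):
        if name in done:
            return
        if name in visiting:
            raise ValueError("dependency cycle at %s" % name)
        if name not in repository:
            raise KeyError("no package named %s in repository" % name)
        visiting.add(name)
        for dep in repository[name]:
            visit(dep)                 # dependencies load first
        visiting.discard(name)
        done.add(name)
        order.append(name)

    for name in wanted:
        visit(name)
    return order

# A browser wanting "WWW-Browser" pulls in its HTML parser automatically:
repo = {
    "Network-HTML": [],
    "Network-HTTP": [],
    "WWW-Browser": ["Network-HTML", "Network-HTTP"],
}
print(install_order(repo, ["WWW-Browser"]))
# -> ['Network-HTML', 'Network-HTTP', 'WWW-Browser']
```

Note that nothing here needs to load a package body to compute the order, which is exactly requirement #4 above.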
Finally, the related works section should include Debian's module system. The mechanics of loading and unloading such packages are completely different from what we need, but some of their extra information should map over. In particular, they have worked out: dependencies, install/uninstall code, and configuration. Additionally, they have an organization similar to our own, with hundreds of independent developers scattered around the world.
-Lex
On Feb 25, 2005, at 7:20 PM, Lex Spoon wrote:
The module system should ideally be built with some thought of how the external packages will be maintained. We surely want to have stable packages around, but there is no such thing as a "stable package" sitting by itself. A package can only be stable with respect to a set of other packages that it might be loaded along with. Thus, we will want to have repositories, and repositories will work best if the package format has predicted their needs.
One particular issue to think about is how dependencies work.
I'm replying to Lex here, but only because he touches on an issue that I've seen come up in many messages on this thread. The above post, and Lex's ensuing discussion of dependencies is important, I think, but not part of the core issue of modularization of the image. The same can be said of my own area of particular interest - versioning. As I see it, there are three topics here, which are all related, but largely separate and should be addressed by separate mechanisms:
1. Modules. This is about the organization of executable code in the image into modules: loading and unloading modules, isolating modules from each other, allowing them to communicate, and reflecting upon and modifying modules, their organization and their structure. This is where the E/Islands/Namespaces/security stuff fits in as well. We've been using PackageInfo as a very, very simple stand-in for a real modules system.
2. Versioning. However we decide to slice up the image, we need to be able to keep track of how the different modules change over time, produce and reconcile different versions of modules, and share them between developers distributed over time and space. In the early days we used ChangeSets and the update stream for this, but since the introduction of PackageInfo for modularization, we've been able to use Monticello for versioning.
3. Organization and Distribution. Given ways of organizing modules within an image, and versions of modules over time, we still need a way to organize them socially. We need a catalog of the available modules, what other modules they depend on, who maintains them, where to obtain them. We need technical support for the Squeak community and the marketplace of modules. In the early days this was squeak-dev and the Swiki, but lately it's grown to include SqueakMap, SqueakPeople, SqueakSource and PackageUniverses, BFAV and the Mantis bug database.
I think Dan's modules team should focus on #1. PackageInfo is pretty primitive, and we really need something much better. At the same time, we're doing ok on items 2 and 3. So while we should definitely be aware of and discuss versioning and distribution issues, I'm happy that the opening statement Dan made for the modules team made no mention of either versioning or distribution. Those are jobs for Monticello and SqueakMap, or their successors.
Colin
On Fri, 25 Feb 2005 01:27:55 -0500, "Colin Putney" cputney@wiresong.ca said:
On Feb 23, 2005, at 1:47 AM, Florin Mateoc wrote:
Now, method versions are not interesting just for themselves. A different kind of code organization is a patch, or a unit of work which happens after the packaging structure has been defined, and is perhaps cross-cutting through many different packages. This is a changeset in a non-packaged, non-versioned, and limited-collaboration universe. In a versioned, packaged world, the changesets themselves should be versioned entities, and be composed of versioned sub-entities. It is especially for changesets that method-level versioning comes in handy, because here the finer, method-level granularity is needed. If you are forced to create new (entire) class versions for inclusion in a versioned patch, this not only adds noise, but it creates a much higher incidence of merging conflicts.
Absolutely, although I think with very fine-grained version history, spurious conflicts aren't really a problem. I've never used Envy, but I understand that it doesn't do merges well.
Envy actually has no automatic merging capability whatsoever. Which is odd, since you have fine-grained versioning (down to the method level) in Envy, so it wouldn't have been hard to write an Envy tool which would at least do a 3-way merge on a class and merge the non-conflicting methods for you. You wouldn't need a text-based merge algorithm for that. I think the Envy designers just never thought to add merge capability.
- Doug
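The three-way merge tool Doug describes is straightforward to picture. Below is an illustrative sketch in Python rather than Smalltalk -- the `merge_class` helper and its data model are hypothetical names of mine, not an actual Envy tool. Each class version is modeled as a mapping from selector to method source; only a method changed on both sides, in different ways, counts as a real conflict:

```python
# Illustrative sketch: a method-level three-way merge of a class. 'base' is
# the common ancestor version; 'mine' and 'theirs' are the two divergent
# versions. No text-based merging is needed: methods changed on only one
# side merge cleanly, and only double-sided changes become conflicts.

def merge_class(base, mine, theirs):
    merged, conflicts = {}, []
    for selector in set(base) | set(mine) | set(theirs):
        b = base.get(selector)
        m = mine.get(selector)
        t = theirs.get(selector)
        if m == t:                 # both sides agree (or both removed it)
            if m is not None:
                merged[selector] = m
        elif m == b:               # only 'theirs' changed it
            if t is not None:
                merged[selector] = t
        elif t == b:               # only 'mine' changed it
            if m is not None:
                merged[selector] = m
        else:                      # changed on both sides: a real conflict
            conflicts.append(selector)
    return merged, conflicts
```

Two developers fixing different bugs in different methods of the same class would merge with an empty conflict list, which is exactly the case that class-level versioning turns into a spurious clash.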
On Fri, 25 Feb 2005 19:27:30 -0500, "Doug Way" dway@mailcan.com wrote:
Envy actually has no automatic merging capability whatsoever. Which is odd, since you have fine-grained versioning (down to the method level) in Envy, so it wouldn't have been hard to write an Envy tool which would at least do a 3-way merge on a class and merge the non-conflicting methods for you. You wouldn't need a text-based merge algorithm for that. I think the Envy designers just never thought to add merge capability.
Envy uses an operational model that makes it easier to avoid getting in the situation where you need to do a merge. You can't make changes to a class without creating an edition of it, and that shows up in the repository, so anyone else can see that you are working on it.
It's basically like the difference between optimistic locking (Store/MC) and pessimistic locking (Envy). They never bothered handling it because if you follow the class ownership model, and use the tools the way they are supposed to be used, you almost never need to do merges.
Later, Jon
On Feb 25, 2005, at 9:46 PM, Jon Hylands wrote:
On Fri, 25 Feb 2005 19:27:30 -0500, "Doug Way" dway@mailcan.com wrote:
Envy actually has no automatic merging capability whatsoever. Which is odd, since you have fine-grained versioning (down to the method level) in Envy, so it wouldn't have been hard to write an Envy tool which would at least do a 3-way merge on a class and merge the non-conflicting methods for you. You wouldn't need a text-based merge algorithm for that. I think the Envy designers just never thought to add merge capability.
Envy uses an operational model that makes it easier to avoid getting in the situation where you need to do a merge. You can't make changes to a class without creating an edition of it, and that shows up in the repository, so anyone else can see that you are working on it.
It's basically like the difference between optimistic locking (Store/MC) and pessimistic locking (Envy). They never bothered handling it because if you follow the class ownership model, and use the tools the way they are supposed to be used, you almost never need to do merges.
You're right, I had forgotten that Envy used an (optionally) pessimistic locking system, because the last time I used Envy, our team treated it as an optimistic system and we ignored the code ownership facilities.
I realize now that there are really two different kinds of pessimistic locking... Envy's system with code ownership (a sort of permanent locking), and tools such as SourceSafe/PVCS/etc. which only let you lock a piece of code temporarily while you're actively working on it -- there's no "ownership". Then there are optimistic systems such as CVS and Monticello with no locking.
Although Envy was flexible enough that you could use any of the above models if you really wanted. (But with optimistic locking it really helps to have some sort of merging capability.) Anyway, as someone who's switched back and forth between pessimistic/optimistic systems, I'd say that pessimistic systems are generally inferior. :-) (Especially the code ownership model.)
One nice thing about temporary pessimistic locking is that it tells you when someone else is working on something. It seems that this ought to be possible with an optimistic system, if you were saving the method versions in a central repository as Envy did. Even though a class wasn't "locked", it could tell you that so-and-so has been editing the class.
Anyway, I'm not sure if Dan really needs to follow all of these Versioning-related discussions... :) it seems like a powerful versioning/source code management system could be designed relatively separately? A versioning system only requires that there is a packages/modules system which defines/partitions source code.
- Doug
Anyway, I'm not sure if Dan really needs to follow all of these Versioning-related discussions... :) it seems like a powerful versioning/source code management system could be designed relatively separately? A versioning system only requires that there is a packages/modules system which defines/partitions source code.
I agree. When I was replacing PackageInfo with Package, I didn't have to deal with versioning at all... MC does it pretty well.
Alexandre
Doug Way wrote...
Anyway, I'm not sure if Dan really needs to follow all of these Versioning-related discussions... :) it seems like a powerful versioning/source code management system could be designed relatively separately? A versioning system only requires that there is a packages/modules system which defines/partitions source code.
And Alex Bergel replied...
I agree. When I was replacing PackageInfo with Package, I didn't have to deal with versioning at all... MC does it pretty well.
It's true that many of these issues of coordinating changes between multiple programmers, and keeping versions consistent are not the business of the module system as I see it.
However, it is still useful to have them in the early discussion, as I said before, because they will inform the design by showing what is needed. Ideally we will soon reach a point where we can say, "The module system takes care of X, Y, and Z, so the package management tool only needs to do P, Q, and R, and here's how they interface..."
- Dan
/--------------\
|    NOTICE    |
\--------------/
At midnight today, the Modules discussion will move to the Modules mailing list. Please make sure you have subscribed by that time if you wish to follow the discussions:
modules-subscribe@discuss.squeakfoundation.org
It is my hope to collect most of our context on the New Modules page,
http://minnow.cc.gatech.edu/squeak/5608
If you feel you have made important points that are not yet represented there, please put a link to it on that page.
I hope to start off next week with a discussion of how we pare down the issues, and focus on a coherent design. It will be useful if we can assume that most readers have at least read the information on the swiki by that time.
Thanks - Dan
On Sat, 26 Feb 2005 11:18:05 -0500, Doug Way dway@mailcan.com wrote:
One nice thing about temporary pessimistic locking is that it tells you when someone else is working on something. It seems that this ought to be possible with an optimistic system, if you were saving the method versions in a central repository as Envy did. Even though a class wasn't "locked", it could tell you that so-and-so has been editing the class.
One of the things I really like about Store and MC, and that I hate about Envy, is Envy's *need* to be connected to the central repository while developing code. If you only ever develop code on a desktop machine that you never take anywhere, it wouldn't be as big a deal, but personally I use a laptop as my main development machine (and have used laptops for this for the past 7 years). Being able to pull the network plug and continue working is essential to the style of development I do.
Later, Jon
On Sat, 26 Feb 2005 11:18:05 -0500, Doug Way dway@mailcan.com wrote:
Anyway, I'm not sure if Dan really needs to follow all of these Versioning-related discussions... :) it seems like a powerful versioning/source code management system could be designed relatively separately? A versioning system only requires that there is a packages/modules system which defines/partitions source code.
+1. That's basically all I was trying to say in my earlier post (I knew I hadn't read everything yet...).
Avi
On Fri, 25 Feb 2005 19:27:30 -0500, Doug Way dway@mailcan.com wrote:
Envy actually has no automatic merging capability whatsoever. Which is odd, since you have fine-grained versioning (down to the method level) in Envy, so it wouldn't have been hard to write an Envy tool which would at least do a 3-way merge on a class and merge the non-conflicting methods for you. You wouldn't need a text-based merge algorithm for that. I think the Envy designers just never thought to add merge capability.
What's missing from Envy to have a really nice merge mechanism is the ability to record multiple ancestor versions for a single method or class version. That's what we're doing in MC2.
Avi
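Avi's point about recording multiple ancestors can be pictured with a small sketch. This is Python rather than Smalltalk, and the `Version` model is a hypothetical illustration, not MC2's actual implementation: a merge version records both of its parents, so two branches always share a findable common ancestor to serve as the base of a three-way merge.

```python
# Illustrative sketch: version history as a DAG. A plain edit has one parent;
# a merge records two (or more). Ancestry queries then make "did these two
# versions diverge from a common base?" answerable.

class Version:
    def __init__(self, vid, parents=()):
        self.vid = vid
        self.parents = list(parents)   # more than one parent after a merge

def ancestors(version):
    """The ids of all transitive ancestors of a version, itself included."""
    seen, stack = set(), [version]
    while stack:
        v = stack.pop()
        if v.vid not in seen:
            seen.add(v.vid)
            stack.extend(v.parents)
    return seen

def common_ancestor_ids(a, b):
    return ancestors(a) & ancestors(b)

# Two branches off v1, later merged into v3:
v1 = Version("1")
v2a = Version("2a", [v1])
v2b = Version("2b", [v1])
v3 = Version("3", [v2a, v2b])
```

With only single-ancestor records (as in Envy), v3 would have to forget one of its parents, and a later merge against that forgotten branch would look like a merge of unrelated histories.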
Colin Putney wrote:
Hi folks,
Here's that separate post on the Monticello-redesign I promised. Couldn't manage it yesterday. Dan, once it's sunk in a bit, I'll rework this into a summary for the modules team. (Note: MC stands for Monticello. MC1 is the version currently on SqueakMap, MC2 is the experimental version, which I expect to replace MC1 eventually).
As before, one of the key features of Monticello is that it captures enough development history to allow divergent branches of development to be safely and easily merged. The new thing in MC2 is that it keeps separate version history for each program element (classes, methods, instance variables etc) rather than for entire packages.
This gives us a lot more flexibility about how to group elements for versioning. Where MC1 is tightly bound to packages with sharp boundaries between them, MC2 is happy to work with just about any group of elements a developer decides he's interested in. I've been using the term "tag" for this - conceptually, program elements such as classes, methods, instance variables, globals and so on are annotated with tags, and when you take a snapshot of the tag, all the elements with the tag are included in it. This will allow us to do some handy stuff:
- Package-oriented versioning, similar to Store or MC1. This works
quite nicely for well-contained applications.
- Task-oriented versioning, similar to ChangeSets, but versionable and
mergeable. I could post a change set to the list, others could take it in different directions and I'd be able to safely merge the results back into a single change set.
I have worked on something similar, but instead of keeping them locally, versioning means putting them in the central repository as well. That way, people can even browse each other's work in progress (even if it's versioned, that does not mean it's releasable), there is no need to post attachments to the list, and other people can take them in different directions even before you are ready :). More seriously though, this could be a very good way to collect patches (changesets are often patches) for release in the update stream. One would only need to point to the approved patches in the repository.
Concretely, there was a special, unversionable package, containing other special, unversionable packages, one for each developer (and named after them). All of these task-oriented versionable changesets, when versioned (the first time you version one you also have to name it), appear in your own package. And the browsers in the image had built-in support for adding things to these changesets, either to the current (not yet named) one or to a named one.
- Robust update streams. We could automatically detect conflicts
between updates in the stream or between an update and local changes, and easily resolve them.
<snip>
They also have the notion of (stackable) overrides, so if in your package you change a method from a different package, you can easily browse both the override and the overridden code, but most importantly you can safely unload your package and things are restored properly.
We've supported overrides in MC1 for a while now, and AFAICT, they're more trouble than they're worth. That may be partly because PackageInfo makes it ugly to implement, but I think there are semantic issues as well.
Overrides imply a fixed load (and unload) order, and more subtly, a version-specific dependency. The overridden method has to keep working from the other package's point of view, and that gets really difficult when we've got a stack of overrides. In that case, we've got the expectations of 3 or more packages to satisfy with one method. When we violate package encapsulation that way, we create a really tight version dependency between the packages, which suggests that maybe they shouldn't be separate packages at all. They can't be developed and deployed separately. I don't know the best way to handle it, but I'm inclined towards just considering it a versioning problem. If the same method is defined in two packages, we've got two implementations to reconcile, right? With overrides, it's resolved according to the order the packages were loaded. If we've got the versioning history as in MC2, we can use that information to make a better decision. If one implementation supersedes the other, use that one. If not, you've got a conflict and you let the user resolve it. Instead of choosing the implementation loaded most recently, we choose the one that was written most recently.
But overrides don't exist (at least in Store) as such in packages. They are normal extensions (from the point of view of the package holding them) that become overrides only when you load that package in an image already containing that method. At the same time, they allow you to make your package self-sufficient. As you are developing it, no matter what else you (or your users) have in the image, you define a particular extension consistently with your package, as a "unit of separately deployable code". Surely you don't want to depend on another (version of another) package that defines that extension. You also can (should) not provide for other packages that may or may not be loaded. The most you can hope for is that right after loading your package, it should work. If a subsequent load changes things, that's fine, but that's as if the user manually modified one of your methods. You cannot protect against that. Making multiple independent packages work together is a much more difficult problem. The only way that I know how to solve that is by testing (and fixing), and I don't think that we should worry too much about providing an automatic solution. My personal preference would be to show in the Transcript that an override (of a method not in the base image) has occurred. It does not matter if they are stacked or not; simply the fact that you overrode something from a different package makes it probable that the overridden package does not work anymore.
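The warning Florin suggests is cheap to implement at load time. Here is an illustrative sketch in Python rather than Squeak -- the `load_package` helper and its data model are hypothetical, not Store or Monticello code. The image records which package owns each installed method, and a load reports any method it replaces that a different package owns, the way Squeak might report it on the Transcript:

```python
# Illustrative sketch: warn about cross-package overrides while installing
# a package's methods.

def load_package(image, package_name, methods, log=print):
    """image: dict of (class, selector) -> (owning_package, source).
    methods: dict of (class, selector) -> source to install."""
    for key, source in methods.items():
        if key in image and image[key][0] != package_name:
            log("Warning: %s overrides %s>>%s from package %s"
                % (package_name, key[0], key[1], image[key][0]))
        image[key] = (package_name, source)

# Loading a package that redefines a method owned by another package:
image = {("String", "asUrl"): ("Network", "asUrl ...")}
load_package(image, "MyHTTP", {("String", "asUrl"): "asUrl ^self"})
```

As Florin says, whether the overrides stack doesn't matter here; one warning per cross-package override is enough to flag that the overridden package may no longer work.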
<snip>
All methods have to be versioned when the class (extension) is versioned, and this can be done automatically, just like classes have to be versioned when the package is versioned. In addition, the versions of methods that belong to a class version can be marked as special when browsing the method version history, just like class versions that belong to a package version can be marked specially. Sorry for the perhaps too low-level details, I just wanted to write things down. And Dan did ask us what we wanted to see in such a system :)
I don't see why this is necessary. Is there some semantic effect you're after here, or do classes and packages just provide convenient ways to group program elements together for a snapshot?
This is probably just the memory of a frustration with Envy: because it stores all these method editions (including every time you put a "halt" in a method), the noise level is pretty high, so I always wished that I could see at a glance, when looking at the list of editions for a method, which editions are "real". But even if we have explicit method versioning, so the noise is reduced, the most "real" ones are those associated with the holder's version, because there is an implicit minimal testing expectation for versions. For a method version I would expect something like a unit test; for a class, the beginning of some functional testing. The expectation is even higher for a package, because it usually groups together classes working in tight coupling, so the testing done for a package version is more of a functional test, so now those methods "really" work. I guess it would be fun to disallow versioning if we detect that testing was not performed :) Seriously though, it might be interesting if we could somehow link versions to the tests performed.
Now, method versions are not interesting just for themselves. A different kind of code organization is a patch, or a unit of work which happens after the packaging structure has been defined, and is perhaps cross-cutting through many different packages. This is a changeset in a non-packaged, non-versioned, and limited-collaboration universe. In a versioned, packaged world, the changesets themselves should be versioned entities, and be composed of versioned sub-entities. It is especially for changesets that method-level versioning comes in handy, because here the finer, method-level granularity is needed. If you are forced to create new (entire) class versions for inclusion in a versioned patch, this not only adds noise, but it creates a much higher incidence of merging conflicts.
Absolutely, although I think with very fine-grained version history, spurious conflicts aren't really a problem. I've never used Envy, but I understand that it doesn't do merges well. Florin, is that true in your experience? We put a lot of effort into Monticello's merge capabilities, and they're generally pretty good. MC2 should be an improvement.
As Jon has already mentioned, if you follow their prescribed workflow, with strict class ownership, the class (extension) owner is a serialization point: only the owner can release it into the package, therefore (s)he has to review it and merge it. Envy versions know their ancestor, so you can easily tell what the other developer changed compared to a previously released version. But there was no three-way merge browser, and on occasion it would have made life easier. The problem is that in very small teams you don't need such a workflow, everybody owns everything, and in very large teams, it does not scale. At a previous employer, I was part of such a very large team, with multiple locations, across the Atlantic. We had some tools to support a changeset model (Envy does not have one), but the changesets were stored as blobs in Envy, so they were unbrowsable, and they were mutable, so you could not tell what ended up being released, either in the development image or even in production, because developers would continue to work in the same changeset. Because they were mutable, they also didn't have any meaningful timestamp. I worked on an improved generation of tools, and I made them browsable and versionable. Obviously, I was also using them, and I found them very convenient. The one thing that I did not get to do was to make them method-level; their granularity was at the class (extension) version level, and it was this aspect that made conflicts more frequent than necessary: when you version one changeset, you have to version its contents as well, so you would version the (whole) class (extension); if another developer is working on the same class extension, (s)he would also have to version it, and then when trying to release both changesets, they would appear to clash even though they may have fixed different bugs in different methods.
Mind you, this is not as terrible as it sounds, because it means an additional pair of eyes has to check things, and even if methods don't conflict directly they may conflict through side-effects. I for one am a bit queasy about automatic merging. Even when entire classes don't seem to clash, they can affect each other (obviously, if one is a superclass of a class from the other side of the merge). To support our workflow and the class ownership model, I also added states like "submitted for approval", "rejected", "approved"... with notifications through Envy and email that you needed to do something about it. I hadn't mentioned this part before, because I don't think such a workflow would work in a loosely-coupled community like Squeak, although maybe some form of ownership might not be a bad idea.
<snip>
Thanks for the insight Florin,
Colin
Thank you for your insightful comments and for your work on MC,
Florin
Hi folks,
I'm moving this thread over to the Monticello mailing list. No need to clutter squeak-dev with versioning discussions. If you want to sign up the list is here:
http://mail.wiresong.ca/mailman/listinfo/monticello
On Feb 26, 2005, at 1:43 AM, Florin Mateoc replied to me thusly:
- Task-oriented versioning, similar to ChangeSets, but versionable
and mergeable. I could post a change set to the list, others could take it in different directions and I'd be able to safely merge the results back into a single change set.
I have worked on something similar, but instead of keeping them locally, versioning means putting them in the central repository as well. That way, people can even browse each other's work in progress (even if it's versioned, that does not mean it's releasable), there is no need to post attachments to the list, and other people can take them in different directions even before you are ready :). More seriously though, this could be a very good way to collect patches (changesets are often patches) for release in the update stream. One would only need to point to the approved patches in the repository.
Sure, repositories are good. My point was more that you can share versioned code in whatever way is convenient. Actually, with Monticello you would mail to the list by saving the version to an SMTP repository. There are also repositories that use HTTP, FTP, local directories, object databases and even SqueakMap. I work out of a public HTTP repository, so you can peek at my work any time you like:
http://monticello.wiresong.ca/
[snip]
We've supported overrides in MC1 for a while now, and AFAICT, they're more trouble than they're worth. That may be partly because PackageInfo makes it ugly to implement, but I think there are semantic issues as well.
Overrides imply a fixed load (and unload) order, and more subtly, a version-specific dependency. The overridden method has to keep working from the other package's point of view, and that gets really difficult when we've got a stack of overrides. In that case, we've got the expectations of 3 or more packages to satisfy with one method. When we violate package encapsulation that way, we create a really tight version dependency between the packages, which suggests that maybe they shouldn't be separate packages at all. They can't be developed and deployed separately. I don't know the best way to handle it, but I'm inclined towards just considering it a versioning problem. If the same method is defined in two packages, we've got two implementations to reconcile, right? With overrides, it's resolved according to the order the packages were loaded. If we've got the versioning history as in MC2, we can use that information to make a better decision. If one implementation supersedes the other, use that one. If not, you've got a conflict and you let the user resolve it. Instead of choosing the implementation loaded most recently, we choose the one that was written most recently.
But overrides don't exist (at least in Store) as such in packages. They are normal extensions (from the point of view of the package holding them) that become overrides only when you load that package in an image already containing that method. At the same time, they allow you to make your package self-sufficient. As you are developing it, no matter what else you (or your users) have in the image, you define a particular extension consistently with your package, as a "unit of separately deployable code". Surely you don't want to depend on another (version of another) package that defines that extension. You also can (should) not provide for other packages that may or may not be loaded.
Yup, I agree that extensions are good. They allow us to do good OO design - putting methods on the classes where they belong - while still developing and maintaining a package as a single entity. I also acknowledge that if you allow extensions, you run the risk that two packages will define the same method. Ok, so the question is what do we do when that happens?
The most you can hope for is that right after loading your package, it should work. If a subsequent load changes things, that's fine, but that's as if the user manually modified one of your methods. You cannot protect against that.
Not quite. A user manually modifying a method does so explicitly, and presumably with full knowledge of the implications of doing so - how the method works, what packages call it and what they expect from it. You can't protect against the programmer making a mistake when modifying a method, and shouldn't attempt to.
But when loading another package incidentally modifies a method also defined in another package, it happens without the user's knowledge. We therefore don't get the benefit of assuming that the user knows best. When this happens, we need to alert the user, as you mention below.
Making multiple independent packages work together is a much more difficult problem. The only way that I know how to solve that is by testing (and fixing), and I don't think that we should worry too much about providing an automatic solution. My personal preference would be to show in the Transcript that an override (of a method not in the base image) has occurred. It does not matter if they are stacked or not; simply the fact that you overrode something from a different package makes it probable that the overridden package does not work anymore.
I think we can do better than log the override to the Transcript. (Ok, perhaps with ENVY or Store you can't do any better, but luckily we're writing our own versioning system!) Consider the situation assuming that both packages are maintained with MC2:
We have two versions of a method, both with complete version history. Because we have the version history, it doesn't really matter that the two versions come from different packages; it's exactly the same as merging two versions of the same package. So instead of one version overriding the other, we do a merge. By comparing the method histories we can decide if one version supersedes the other. That would mean that it's an updated version of the other, which means we can rely on the user's wisdom again. If the user changed the method from one of the versions we have to the other one, he must know what he's doing. Therefore we use whichever version the user has already chosen.
If neither version of the method supersedes the other, we have a conflict, and we ask the user to resolve it. In his infinite wisdom, he'll give us a new version of the method that will work for both packages. Or, if his wisdom is less than infinite, at least he knows about the conflict and can choose which package to break.
Once the merge is complete, the user has effectively reconciled the conflict between the packages, and the new method can be incorporated into one or both of the packages. Thereafter, loading won't produce a conflict and won't require the user's attention.
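The supersession rule described above can be sketched in a few lines of Python. The version-record shape (an id plus the set of all ancestor ids) is invented for illustration; it is not MC2's actual data model.

```python
# Hypothetical sketch of the supersession rule described above.
# A "version" is modeled as an id plus the set of all ancestor ids;
# this shape is invented for illustration, not MC2's real model.

def resolve(a, b):
    """Pick between two versions of the same method, or signal a conflict.

    Returns the surviving version, or None when neither supersedes the
    other and the user must merge by hand.
    """
    if a["id"] == b["id"]:
        return a      # identical versions: no override at all
    if a["id"] in b["ancestors"]:
        return b      # b was derived from a: trust the user's newer choice
    if b["id"] in a["ancestors"]:
        return a      # a was derived from b
    return None       # divergent histories: a genuine conflict

v1 = {"id": "version1", "ancestors": {"version0"}}
v2 = {"id": "version2", "ancestors": {"version0", "version1"}}
print(resolve(v1, v2)["id"])  # prints "version2": the derived version wins
```

The interesting property is the final branch: only when neither history contains the other do we fall back to asking the user.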
<snip>
All methods have to be versioned when the class (extension) is versioned, and this can be done automatically, just like classes have to be versioned when the package is versioned. In addition, the versions of methods that belong to a class version can be marked as special when browsing the method version history, just like class versions that belong to a package version can be marked specially. Sorry for the perhaps too-low-level details, I just wanted to write things down. And Dan did ask us what we wanted to see in such a system :)
I don't see why this is necessary. Is there some semantic effect you're after here, or do classes and packages just provide convenient ways to group program elements together for a snapshot?
This is probably just the memory of a frustration with Envy: because it stores all these method editions (including every time you put a "halt" in a method), the noise level is pretty high, so I always wished that I could see at a glance, when looking at the list of editions for a method, which editions are "real". But even if we have explicit method versioning, so the noise is reduced, the most "real" ones are the ones associated with the holder's version, because there is an implicit minimal testing expectation for versions. For the method version I would expect something like a unit test; for a class, the beginning of some functional testing. The expectation is even higher for the package, because it usually groups together classes working in tight coupling, so the testing done for a package version is more of a functional test, so now those methods "really" work. I guess it would be fun to disallow versioning if we detect that testing was not performed :) Seriously though, it might be interesting if we could somehow link versions to the tests performed.
Ok, I see. You just want to define a group of program elements that should be versioned together. With Monticello you do this explicitly, so there's a lot less noise. Everything is a "real" version, and they correspond to a bunch of other "real" versions that were current at the same time.
[snip]
Absolutely, although I think with very fine-grained version history, spurious conflicts aren't really a problem. I've never used Envy, but I understand that it doesn't do merges well. Florin, is that true in your experience? We put a lot of effort into Monticello's merge capabilities, and they're generally pretty good. MC2 should be an improvement.
As Jon has already mentioned, if you follow their prescribed workflow, with strict class ownership, the class (extension) owner is a serialization point: only the owner can release it into the package, therefore (s)he has to review it and merge it. Envy versions know their ancestor, so you can easily tell what the other developer changed compared to a previously released version. But there was no three-way merge browser, and on occasion it would have made life easier. The problem is that in very small teams you don't need such a workflow, everybody owns everything, and in very large teams it does not scale. At a previous employer, I was part of such a very large team, with multiple locations across the Atlantic. We had some tools to support a changeset model (Envy does not have one), but the changesets were stored as blobs in Envy, so they were unbrowsable, and they were mutable, so you could not tell what ended up being released, either in the development image or even in production, because developers would continue to work in the same changeset. Because they were mutable, they also didn't have any meaningful timestamp. I worked on an improved generation of tools, and I made them browsable and versionable. Obviously, I was also using them, and I found them very convenient. The one thing that I did not get to do was to make them method-level; their granularity was at the class (extension) version level, and it was this aspect that made conflicts more frequent than necessary: when you version one changeset, you have to version its contents as well, so you would version the (whole) class (extension); if another developer is working on the same class extension, (s)he would also have to version it, and then when trying to release both changesets, they would appear to clash even though they may have fixed different bugs in different methods.
Mind you, this is not as terrible as it sounds, because it meant an additional pair of eyes had to check things, and even if methods don't conflict directly they may conflict through side effects. I for one am a bit queasy about automatic merging. Even when entire classes don't seem to clash they can affect each other (obviously so if one is a superclass of a class from the other side of the merge). To support our workflow and the class ownership model, I also added states like "submitted for approval", "rejected", "approved"... with notifications through Envy and email that you needed to do something about it. I had not mentioned this part, because I don't think such a workflow would work in a loosely-coupled community like Squeak, although maybe some form of ownership might not be a bad idea.
With Monticello we've tried to support the open-source-distributed-development workflow as much as possible. This means lots of optimistic concurrent development, and no reliance on central repositories or sources of authority. This means lots of automatic merges, partial merges, and repeated merges, because branching happens a lot.
Perhaps surprisingly, we've found that merges aren't that hard to do well. The key is to have contextual information for the different versions we're dealing with. There are two dimensions to context - temporal and spatial. The temporal context of a method is its ancestry, the series of other versions of the method that were modified to produce it. We can consider those versions to be superseded by the current one - evidently somebody had a reason to change the method, and the sum of those reasons has resulted in this version.
The spatial context captures the method's relationship with other elements of the code. When a developer snapshots an entire package, he's essentially synchronizing the ancestries of all the elements in it. We then know that all those synchronized versions "belong together" in some way, even if that version of the package doesn't work.
If we have those two dimensions of context, we can do merges correctly: automatically applying changes that don't conflict, and correctly detecting genuine conflicts and presenting them to the user for resolution.
Colin
Colin Putney wrote:
Hi folks,
I'm moving this thread over to the Monticello mailing list. No need to clutter squeak-dev with versioning discussions. If you want to sign up, the list is here:
Thanks for the link, I have just subscribed, and I am cc-ing it, but, as one of the things we seem to discuss is whether to have overrides or not in our modules, I think it is still relevant to the general modules discussion.
<snip>
Making multiple independent packages work together is a much more difficult problem. The only way that I know how to solve that is by testing (and fixing), and I don't think that we should worry too much about providing an automatic solution. My personal preference would be to show in the Transcript that an override (of a method not in the base image) has occurred. It does not matter if they are stacked or not; simply the fact that you overrode something from a different package makes it probable that the overridden package does not work anymore.
I think we can do better than log the override to the Transcript. (Ok, perhaps with ENVY or Store you can't do any better, but luckily we're writing our own versioning system!) Consider the situation assuming that both packages are maintained with MC2:
We have two versions of a method, both with complete version history. Because we have the version history, it doesn't really matter that the two versions come from different packages; it's exactly the same as merging two versions of the same package. So instead of one version overriding the other, we do a merge. By comparing the method histories we can decide if one version supersedes the other. That would mean that it's an updated version of the other, which means we can rely on the user's wisdom again. If the user changed the method from one of the versions we have to the other one, he must know what he's doing. Therefore we use whichever version the user has already chosen.
I am sorry, but this is simply not true. A developer may choose, in a newer version of a class, to ignore some unrelated development and stick to an older protocol, by including some older versions for some of the methods. This is not a made-up example; I have encountered the situation quite often. You can easily have, as a simplistic example, PackageA>ClassB>methodC(version1),methodD(version2) and PackageE>ClassB>methodC(version2),methodD(version1). The automatic resolution will do the wrong thing, and it won't even inform the user.
Making independently developed packages work together means (intelligent) work, and if there's any overlap, the chances of solving the issues automatically are, I believe, very slim, and versioning does not help. Even if all the common methods in one of the packages are newer versions (and descendants) of the same methods in the other packages, it still doesn't mean that they are made to work with the older package, it may simply mean that the newer package is supposed to work with a newer version of the older package. I think the only situation where you can say that there is no conflict is when the common methods are all identical, and for this you don't need versions. This is why, to my mind, overrides have nothing to do with versioning, they are simply a different kind of extension.
If neither version of the method supersedes the other, we have a conflict, and we ask the user to resolve it. In his infinite wisdom, he'll give us a new version of the method that will work for both packages. Or, if his wisdom is less than infinite, at least he knows about the conflict and can choose which package to break.
Once the merge is complete, the user has effectively reconciled the conflict between the packages, and the new method can be incorporated into one or both of the packages. Thereafter, loading won't produce a conflict and won't require the user's attention.
<snip>
All methods have to be versioned when the class (extension) is versioned, and this can be done automatically, just like classes have to be versioned when the package is versioned. In addition, the versions of methods that belong to a class version can be marked as special when browsing the method version history, just like class versions that belong to a package version can be marked specially. Sorry for the perhaps too-low-level details, I just wanted to write things down. And Dan did ask us what we wanted to see in such a system :)
I don't see why this is necessary. Is there some semantic effect you're after here, or do classes and packages just provide convenient ways to group program elements together for a snapshot?
This is probably just the memory of a frustration with Envy: because it stores all these method editions (including every time you put a "halt" in a method), the noise level is pretty high, so I always wished that I could see at a glance, when looking at the list of editions for a method, which editions are "real". But even if we have explicit method versioning, so the noise is reduced, the most "real" ones are the ones associated with the holder's version, because there is an implicit minimal testing expectation for versions. For the method version I would expect something like a unit test; for a class, the beginning of some functional testing. The expectation is even higher for the package, because it usually groups together classes working in tight coupling, so the testing done for a package version is more of a functional test, so now those methods "really" work. I guess it would be fun to disallow versioning if we detect that testing was not performed :) Seriously though, it might be interesting if we could somehow link versions to the tests performed.
Ok, I see. You just want to define a group of program elements that should be versioned together. With Monticello you do this explicitly, so there's a lot less noise. Everything is a "real" version, and they correspond to a bunch of other "real" versions that were current at the same time.
Even if you do it explicitly, not all versioning happens at the same time. I develop a method, it looks good, I test it a little (workspace, unit test, whatever), I am happy with it and I want to keep it. I version it (separately, because this is what method-level granularity means). I work some more on the class, I refactor the code a little, break it up into multiple methods, I test it, I am happy with it, I version the class. I work some more on similar classes, collaborating classes, refactor, test, I am happy, I version the package. The method may have gone through several iterations (all versions) that are not noise; I have explicitly created all the versions, but they represent different stages in the evolution, different testing levels, and different confidence levels. If there is only one version of the method, because I versioned everything (all the containers) at once, then yes, they all go together. If not, the versions of this method that correspond to (are included in) class versions are slightly "better", and the versions of the method that correspond to the class versions that are included in the package versions are even more so. And the version that is part of the production image is simply great :)
[snip]
Absolutely, although I think with very fine-grained version history, spurious conflicts aren't really a problem. I've never used Envy, but I understand that it doesn't do merges well. Florin, is that true in your experience? We put a lot of effort into Monticello's merge capabilities, and they're generally pretty good. MC2 should be an improvement.
As Jon has already mentioned, if you follow their prescribed workflow, with strict class ownership, the class (extension) owner is a serialization point: only the owner can release it into the package, therefore (s)he has to review it and merge it. Envy versions know their ancestor, so you can easily tell what the other developer changed compared to a previously released version. But there was no three-way merge browser, and on occasion it would have made life easier. The problem is that in very small teams you don't need such a workflow, everybody owns everything, and in very large teams it does not scale. At a previous employer, I was part of such a very large team, with multiple locations across the Atlantic. We had some tools to support a changeset model (Envy does not have one), but the changesets were stored as blobs in Envy, so they were unbrowsable, and they were mutable, so you could not tell what ended up being released, either in the development image or even in production, because developers would continue to work in the same changeset. Because they were mutable, they also didn't have any meaningful timestamp. I worked on an improved generation of tools, and I made them browsable and versionable. Obviously, I was also using them, and I found them very convenient. The one thing that I did not get to do was to make them method-level; their granularity was at the class (extension) version level, and it was this aspect that made conflicts more frequent than necessary: when you version one changeset, you have to version its contents as well, so you would version the (whole) class (extension); if another developer is working on the same class extension, (s)he would also have to version it, and then when trying to release both changesets, they would appear to clash even though they may have fixed different bugs in different methods.
Mind you, this is not as terrible as it sounds, because it meant an additional pair of eyes had to check things, and even if methods don't conflict directly they may conflict through side effects. I for one am a bit queasy about automatic merging. Even when entire classes don't seem to clash they can affect each other (obviously so if one is a superclass of a class from the other side of the merge). To support our workflow and the class ownership model, I also added states like "submitted for approval", "rejected", "approved"... with notifications through Envy and email that you needed to do something about it. I had not mentioned this part, because I don't think such a workflow would work in a loosely-coupled community like Squeak, although maybe some form of ownership might not be a bad idea.
With Monticello we've tried to support the open-source-distributed-development workflow as much as possible. This means lots of optimistic concurrent development, and no reliance on central repositories or sources of authority. This means lots of automatic merges, partial merges, and repeated merges, because branching happens a lot.
Perhaps surprisingly, we've found that merges aren't that hard to do well. The key is to have contextual information for the different versions we're dealing with. There are two dimensions to context - temporal and spatial. The temporal context of a method is its ancestry, the series of other versions of the method that were modified to produce it. We can consider those versions to be superseded by the current one - evidently somebody had a reason to change the method, and the sum of those reasons has resulted in this version.
I have done this as well in the tools that I've developed in Envy (relying on ancestry to determine "real" conflicts). It probably works for a majority of situations, but when it fails it introduces subtle, hard-to-find bugs.
The spatial context captures the method's relationship with other elements of the code. When a developer snapshots an entire package, he's essentially synchronizing the ancestries of all the elements in it. We then know that all those synchronized versions "belong together" in some way, even if that version of the package doesn't work.
If we have those two dimensions of context, we can do merges correctly: automatically applying changes that don't conflict, and correctly detecting genuine conflicts and presenting them to the user for resolution.
Hopefully, most of the time. :)
Colin
On Feb 26, 2005, at 11:20 PM, Florin Mateoc wrote:
We have two versions of a method, both with complete version history. Because we have the version history, it doesn't really matter that the two versions come from different packages; it's exactly the same as merging two versions of the same package. So instead of one version overriding the other, we do a merge. By comparing the method histories we can decide if one version supersedes the other. That would mean that it's an updated version of the other, which means we can rely on the user's wisdom again. If the user changed the method from one of the versions we have to the other one, he must know what he's doing. Therefore we use whichever version the user has already chosen.
I am sorry, but this is simply not true. A developer may choose, in a newer version of a class, to ignore some unrelated development and stick to an older protocol, by including some older versions for some of the methods. This is not a made-up example; I have encountered the situation quite often. You can easily have, as a simplistic example, PackageA>ClassB>methodC(version1),methodD(version2) and PackageE>ClassB>methodC(version2),methodD(version1). The automatic resolution will do the wrong thing, and it won't even inform the user.
Ok, let's get into this in excruciating detail, because it's not clear to me why you think the above example cannot be resolved correctly. Let's say I have the following program elements, drawn from your example, with history.
PackageA
ClassB>>methodC.version1 (ancestors: version0)
ClassB>>methodD.version2 (ancestors: version0, version1)
ClassB>>methodE.version1 (ancestors: version0)
PackageZ
ClassB>>methodC.version2 (ancestors: version0, version1)
ClassB>>methodD.version1 (ancestors: version0)
ClassB>>methodE.version2 (ancestors: version0)
Ok, so let's see what happens if we load both packages into the same image. PackageA and PackageZ both define methodC, and they have different definitions. So we've got to decide which version, if any, will be in the image. The version in PackageA is called version1, and it was derived from version0. The version in PackageZ is called version2, and lists version1 as its ancestor. Therefore, version2 was created by modifying version1. So we'll choose version2, from PackageZ.
MethodD has the reverse situation. PackageA's version is a descendant of PackageZ's, so we'll choose version2 again, but this time from PackageA.
MethodE presents a conflict. Both versions descend from a common ancestor, but neither descends from the other. So we pop up a conflict resolution window, and let the user decide what methodE should look like when both packages are present. This results in a new version, called version3. When we're done, the image looks like this:
ClassB>>methodC.version2 (ancestors: version0, version1)
ClassB>>methodD.version2 (ancestors: version0, version1)
ClassB>>methodE.version3 (ancestors: version0, version1, version2)
Now, you are correct to point out that, say, methodC.version2 might have been developed in PackageZ without PackageA loaded, and so might not work as PackageA expects. Perhaps we should indeed log to the Transcript when a merge automatically resolves overlapping packages. There is no *guarantee* that version2 will work right. But our chances of success are better if we follow the intention of the developer of version2, which was to replace version1. Following the order that packages are loaded is little different than choosing at random.
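The three cases of this walkthrough can be checked mechanically. Below is a small Python sketch of the same resolution, using an invented ancestry model (an id plus a set of ancestor ids); it is illustrative only, not MC2's actual API.

```python
# Sketch of the load-time merge of PackageA and PackageZ above.
# The data shapes are invented for illustration, not MC2's real model.

def resolve(a, b):
    """Return the superseding version, or None for a genuine conflict."""
    if a["id"] == b["id"]:
        return a
    if a["id"] in b["ancestors"]:
        return b
    if b["id"] in a["ancestors"]:
        return a
    return None  # divergent histories: ask the user

package_a = {
    "methodC": {"id": "version1", "ancestors": {"version0"}},
    "methodD": {"id": "version2", "ancestors": {"version0", "version1"}},
    "methodE": {"id": "version1", "ancestors": {"version0"}},
}
package_z = {
    "methodC": {"id": "version2", "ancestors": {"version0", "version1"}},
    "methodD": {"id": "version1", "ancestors": {"version0"}},
    "methodE": {"id": "version2", "ancestors": {"version0"}},
}

for name in sorted(package_a):
    winner = resolve(package_a[name], package_z[name])
    print(name, winner["id"] if winner else "CONFLICT -> user resolves")
# methodC version2                  (PackageZ's version supersedes PackageA's)
# methodD version2                  (PackageA's version supersedes PackageZ's)
# methodE CONFLICT -> user resolves (both descend from version0, neither from the other)
```

Only methodE reaches the user; the other two are resolved by following the recorded ancestries, exactly as in the prose above.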
Making independently developed packages work together means (intelligent) work, and if there's any overlap, the chances of solving the issues automatically are, I believe, very slim, and versioning does not help. Even if all the common methods in one of the packages are newer versions (and descendants) of the same methods in the other packages, it still doesn't mean that they are made to work with the older package, it may simply mean that the newer package is supposed to work with a newer version of the older package. I think the only situation where you can say that there is no conflict is when the common methods are all identical, and for this you don't need versions. This is why, to my mind, overrides have nothing to do with versioning, they are simply a different kind of extension.
I agree that it takes intelligent work to make packages work together, and I'm not suggesting that the computer can do that. I am suggesting that, having done the work, we record the results so that we don't have to do it again every time we load those packages.
[snip]
This is probably just the memory of a frustration with Envy: because it stores all these method editions (including every time you put a "halt" in a method), the noise level is pretty high, so I always wished that I could see at a glance, when looking at the list of editions for a method, which editions are "real". But even if we have explicit method versioning, so the noise is reduced, the most "real" ones are the ones associated with the holder's version, because there is an implicit minimal testing expectation for versions. For the method version I would expect something like a unit test; for a class, the beginning of some functional testing. The expectation is even higher for the package, because it usually groups together classes working in tight coupling, so the testing done for a package version is more of a functional test, so now those methods "really" work. I guess it would be fun to disallow versioning if we detect that testing was not performed :) Seriously though, it might be interesting if we could somehow link versions to the tests performed.
Ok, I see. You just want to define a group of program elements that should be versioned together. With Monticello you do this explicitly, so there's a lot less noise. Everything is a "real" version, and they correspond to a bunch of other "real" versions that were current at the same time.
Even if you do it explicitly, not all versioning happens at the same time. I develop a method, it looks good, I test it a little (workspace, unit test, whatever), I am happy with it and I want to keep it. I version it (separately, because this is what method-level granularity means).
Interesting, because that's not what I mean by "method-level granularity."
In MC1 (and Store, as far as I can tell), only packages have ancestry. The ancestry of a method has to be reconstructed by examining all the versions of the package it appears in and noting how it changes. In MC2 (and Envy, as far as I can tell), methods have individually recorded ancestry. This is why I say that MC2 versions at method-level granularity.
But that doesn't mean that you have to version new methods in isolation, whether explicitly or with every accept as in Envy. If you do that, you lose the spatial context I mentioned in my last post. That version is just noise, so why bother? It's not like the method is going anywhere. You can save your image without versioning it, and even if you manage to crash the VM you can always pull it out of the change log.
In MC1 you always version whole packages at a time. MC2 is more flexible, in that you can specify other ways of separating the code you are interested in from the rest of the image. But whatever your method of segregation, you always version all of it at once. So in this sense, Monticello versions at "project-level granularity," the project being whatever you're working on, be it a package, change set or whatever. It's important to do that so as to get the spatial context needed to merge snapshots correctly.
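One way to picture "version all of it at once" without per-accept noise is the following Python sketch. The snapshot function, the selector strings, and the hash-based version ids are all invented for illustration; this is not Monticello's actual code.

```python
# Illustrative sketch of project-level snapshots with per-method ancestry.
# All names and the hash-as-version-id scheme are invented for this example.

import hashlib

def snapshot(project, histories):
    """Version every method of the project in one step, extending each
    method's individual ancestry only if its source actually changed."""
    for selector, source in project.items():
        history = histories.setdefault(selector, [])
        version_id = hashlib.sha1(source.encode()).hexdigest()[:8]
        if not history or history[-1] != version_id:
            history.append(version_id)  # unchanged methods gain no noise version

histories = {}
project = {"ClassB>>methodC": "methodC ^1", "ClassB>>methodD": "methodD ^2"}
snapshot(project, histories)          # first snapshot: both methods recorded
project["ClassB>>methodC"] = "methodC ^42"
snapshot(project, histories)          # second snapshot: only methodC advances
print(len(histories["ClassB>>methodC"]),
      len(histories["ClassB>>methodD"]))  # prints "2 1"
```

Because each snapshot covers the whole project, the versions recorded in the same pass also carry the spatial context: they are known to "belong together", while each method still keeps its own temporal ancestry.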
Colin
Dealing with class extensions is very subtle. Approaches taken by Selector Namespaces, Classboxes and Virtual Classes rely on bounding the visibility of a class extension: the one who defines a class extension is the only one to see it. Conflicts are easily avoided in that way...
Cheers, Alexandre
On Sun, Feb 27, 2005 at 01:11:31AM -0500, Colin Putney wrote:
On Feb 26, 2005, at 11:20 PM, Florin Mateoc wrote:
We have two versions of a method, both with complete version history. Because we have the version history, it doesn't really matter that the two versions come from different packages; it's exactly the same as merging two versions of the same package. So instead of one version overriding the other, we do a merge. By comparing the method histories we can decide if one version supersedes the other. That would mean that it's an updated version of the other, which means we can rely on the user's wisdom again. If the user changed the method from one of the versions we have to the other one, he must know what he's doing. Therefore we use whichever version the user has already chosen.
I am sorry, but this is simply not true. A developer may choose, in a newer version of a class, to ignore some unrelated development and stick to an older protocol, by including some older versions for some of the methods. This is not a made-up example; I have encountered the situation quite often. You can easily have, as a simplistic example, PackageA>ClassB>methodC(version1),methodD(version2) and PackageE>ClassB>methodC(version2),methodD(version1). The automatic resolution will do the wrong thing, and it won't even inform the user.
Ok, let's get into this in excruciating detail, because it's not clear to me why you think the above example cannot be resolved correctly. Let's say I have the following program elements, drawn from your example, with history.
PackageA
ClassB>>methodC.version1 (ancestors: version0)
ClassB>>methodD.version2 (ancestors: version0, version1)
ClassB>>methodE.version1 (ancestors: version0)
PackageZ
ClassB>>methodC.version2 (ancestors: version0, version1)
ClassB>>methodD.version1 (ancestors: version0)
ClassB>>methodE.version2 (ancestors: version0)
Ok, so let's see what happens if we load both packages into the same image. PackageA and PackageZ both define methodC, and they have different definitions. So we've got to decide which version, if any, will be in the image. The version in PackageA is called version1, and it was derived from version0. The version in PackageZ is called version2, and lists version1 as its ancestor. Therefore, version2 was created by modifying version1. So we'll choose version2, from PackageZ.
MethodD has the reverse situation. PackageA's version is a descendant of PackageZ's, so we'll choose version2 again, but this time from PackageA.
MethodE presents a conflict. Both versions descend from a common ancestor, but neither descends from the other. So we pop up a conflict resolution window, and let the user decide what methodE should look like when both packages are present. This results in a new version, called version3. When we're done, the image looks like this:
ClassB>>methodC.version2 (ancestors: version0, version1)
ClassB>>methodD.version2 (ancestors: version0, version1)
ClassB>>methodE.version3 (ancestors: version0, version1, version2)
Now, you are correct to point out that, say, methodC.version2 might have been developed in PackageZ without PackageA loaded, and so might not work as PackageA expects. Perhaps we should indeed log to the Transcript when a merge automatically resolves overlapping packages. There is no *guarantee* that version2 will work right. But our chances of success are better if we follow the intention of the developer of version2, which was to replace version1. Following the order that packages are loaded is little different than choosing at random.
Making independently developed packages work together means (intelligent) work, and if there's any overlap, the chances of solving the issues automatically are, I believe, very slim, and versioning does not help. Even if all the common methods in one of the packages are newer versions (and descendants) of the same methods in the other packages, it still doesn't mean that they are made to work with the older package, it may simply mean that the newer package is supposed to work with a newer version of the older package. I think the only situation where you can say that there is no conflict is when the common methods are all identical, and for this you don't need versions. This is why, to my mind, overrides have nothing to do with versioning, they are simply a different kind of extension.
I agree that it takes intelligent work to make packages work together, and I'm not suggesting that the computer can do that. I am suggesting that, having done the work, we record the results so that we don't have to do it again every time we load those packages.
[snip]
This is probably just the memory of a frustration with Envy: because it stores all these method editions (including every time you put a "halt" in a method), the noise level is pretty high, so I always wished that I could see at a glance, when looking at the list of editions for a method, which editions are "real". But even if we have explicit method versioning, so the noise is reduced, the most "real" ones are the ones associated with the holder's version, because there is an implicit minimal testing expectation for versions. For the method version I would expect something like a unit test; for a class, the beginning of some functional testing. The expectation is even higher for the package, because it usually groups together classes working in tight coupling, so the testing done for a package version is more of a functional test, so now those methods "really" work. I guess it would be fun to disallow versioning if we detect that testing was not performed :) Seriously though, it might be interesting if we could somehow link versions to the tests performed.
Ok, I see. You just want to define a group of program elements that should be versioned together. With Monticello you do this explicitly, so there's a lot less noise. Everything is a "real" version, and they correspond to a bunch of other "real" versions that were current at the same time.
Even if you do it explicitly, not all versioning happens at the same time. I develop a method, it looks good, I test it a little (workspace, unit test, whatever), I am happy with it and I want to keep it. I version it (separately, because this is what method-level granularity means).
Interesting, because that's not what I mean by "method-level granularity."
In MC1 (and Store, as far as I can tell), only packages have ancestry. The ancestry of a method has to be reconstructed by examining all the versions of the package it appears in and noting how it changes. In MC2 (and Envy, as far as I can tell), methods have individually recorded ancestry. This is why I say that MC2 versions at method-level granularity.
But that doesn't mean that you have to version new methods in isolation, whether explicitly or with every accept as in Envy. If you do that, you lose the spatial context I mentioned in my last post. That version is just noise, so why bother? It's not like the method is going anywhere. You can save your image without versioning it, and even if you manage to crash the VM you can always pull it out of the change log.
In MC1 you always version whole packages at a time. MC2 is more flexible, in that you can specify other ways of separating the code you are interested in from the rest of the image. But whatever your method of segregation, you always version all of it at once. So in this sense, Monticello versions at "project-level granularity," the project being whatever you're working on, be it a package, change set or whatever. It's important to do that so as to get the spatial context needed to merge snapshots correctly.
Colin
Colin Putney wrote:
On Feb 26, 2005, at 11:20 PM, Florin Mateoc wrote:
We have two versions of a method, both with complete version history. Because we have the version history, it doesn't really matter that the two versions come from different packages; it's exactly the same as merging two versions of the same package. So instead of one version overriding the other, we do a merge. By comparing the method histories we can decide if one version supersedes the other. That would mean that it's an updated version of the other, which means we can rely on the user's wisdom again. If the user changed the method from one of the versions we have to the other one, he must know what he's doing. Therefore we use whichever version the user has already chosen.
I am sorry, but this is simply not true. A developer may choose, in a newer version of a class, to ignore some unrelated development, and stick to an older protocol, by including some older versions for some of the methods. This is not a made up example, I have encountered the situation quite often. You can easily have, as a simplistic example, PackageA>ClassB>methodC(version1),methodD(version2) and PackageE>ClassB>methodC(version2),methodD(version1). The automatic resolution will do the wrong thing, and it won't even inform the user.
Ok, let's get into this in excruciating detail, because it's not clear to me why you think the above example cannot be resolved correctly. Let's say I have the following program elements, drawn from your example, with history.
PackageA
ClassB>>methodC.version1 (ancestors: version0)
ClassB>>methodD.version2 (ancestors: version0, version1)
ClassB>>methodE.version1 (ancestors: version0)
PackageZ
ClassB>>methodC.version2 (ancestors: version0, version1)
ClassB>>methodD.version1 (ancestors: version0)
ClassB>>methodE.version2 (ancestors: version0)
Ok, so let's see what happens if we load both packages into the same image. PackageA and PackageZ both define methodC, and they have different definitions. So we've got to decide which version, if any, will be in the image. The version in PackageA is called version1, and it was derived from version0. The version in PackageZ is called version2, and lists version1 as its ancestor. Therefore, version2 was created by modifying version1. So we'll choose version2, from PackageZ.
MethodD has the reverse situation. PackageA's version is a descendant of PackageZ's, so we'll choose version2 again, but this time from PackageA.
MethodE presents a conflict. Both versions descend from a common ancestor, but neither descends from the other. So we pop up a conflict resolution window, and let the user decide what methodE should look like when both packages are present. This results in a new version, called version3. When we're done, the image looks like this:
ClassB>>methodC.version2 (ancestors: version0, version1)
ClassB>>methodD.version2 (ancestors: version0, version1)
ClassB>>methodE.version3 (ancestors: version0, version1, version2)
Now, you are correct to point out that, say, methodC.version2 might have been developed in PackageZ without PackageA loaded, and so might not work as PackageA expects. Perhaps we should indeed log to the Transcript when a merge automatically resolves overlapping packages. There is no *guarantee* that version2 will work right. But our chances of success are better if we follow the intention of the developer of version2, which was to replace version1. Following the order that packages are loaded is little different from choosing at random.
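The resolution walkthrough above reduces to a simple rule over recorded ancestry. Here is a minimal Python sketch of that rule; the names (`resolve`, `is_ancestor`) and the flat-set ancestry model are mine, not Monticello's API:

```python
# Sketch (hypothetical, not Monticello's API) of the ancestry-based
# resolution described above. Each version's ancestry is a set of names.

def is_ancestor(candidate, version, ancestry):
    """True if `candidate` appears in `version`'s recorded ancestry."""
    return candidate in ancestry.get(version, set())

def resolve(version_a, version_b, ancestry):
    """Pick a method version when two packages define the same method.
    Returns the descendant if one version supersedes the other,
    or None to signal a conflict the user must resolve."""
    if version_a == version_b:
        return version_a                  # identical: no conflict at all
    if is_ancestor(version_a, version_b, ancestry):
        return version_b                  # b was derived from a
    if is_ancestor(version_b, version_a, ancestry):
        return version_a                  # a was derived from b
    return None                           # siblings: ask the user

# ClassB's methods as recorded in PackageA and PackageZ above
ancestry = {
    "methodC.v1": {"methodC.v0"},
    "methodC.v2": {"methodC.v0", "methodC.v1"},
    "methodD.v1": {"methodD.v0"},
    "methodD.v2": {"methodD.v0", "methodD.v1"},
    "methodE.v1": {"methodE.v0"},
    "methodE.v2": {"methodE.v0"},         # sibling of v1, not a descendant
}

print(resolve("methodC.v1", "methodC.v2", ancestry))  # PackageZ's version2 wins
print(resolve("methodD.v2", "methodD.v1", ancestry))  # PackageA's version2 wins
print(resolve("methodE.v1", "methodE.v2", ancestry))  # None: conflict window
```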
But there is no way to divine the intentions of a developer from the version history. When a developer intentionally chooses to include an older version, his or her intention is to have that older version loaded, and not the newer one, regardless of their ancestry. You cannot say that you follow any developer's intention by automatically choosing a newer version instead of an older one. There is such a thing as intentionally reverting to an older version.
In our particular case, the only intention known for the developer of PackageA is that methodC.version1 should work with methodD.version2 and methodE.version1. For the developer of PackageZ we know that methodC.version2 should work with methodD.version1 and methodE.version2. There is no manifest developer who ever had the intention to load methodC.version2 together with methodD.version2, let alone having tested this combination. There is a big chance that, because of the automatic merge in such a case, neither of the two packages will work anymore. Quite often, in the same repository you have multiple streams of development. A new fix for an old production image may mean backporting from the current development image. Some of the new methods are appropriate, some not. By letting the last package "win", not only do you not choose at random, but you get pretty close to guaranteeing that at least the last package will function. For me, at least in programming, where I like my universe deterministic, a "guarantee" for less is better than a hope for more.
Making independently developed packages work together means (intelligent) work, and if there's any overlap, the chances of solving the issues automatically are, I believe, very slim, and versioning does not help. Even if all the common methods in one of the packages are newer versions (and descendants) of the same methods in the other package, it still doesn't mean that they are meant to work with the older package; it may simply mean that the newer package is supposed to work with a newer version of the older package. I think the only situation where you can say that there is no conflict is when the common methods are all identical, and for that you don't need versions. This is why, to my mind, overrides have nothing to do with versioning; they are simply a different kind of extension.
I agree that it takes intelligent work to make packages work together, and I'm not suggesting that the computer can do that. I am suggesting that, having done the work, we record the results so that we don't have to do it again every time we load those packages.
[snip]
This is probably just the memory of a frustration with Envy: because it stores all these method editions (including every time you put a "halt" in a method), the noise level is pretty high, so I always wished that I could see at a glance, when looking at the list of editions for a method, which editions are "real". But even if we have explicit method versioning, so the noise is reduced, the most "real" ones are the ones associated with the holder's version, because there is an implicit minimal testing expectation for versions. For the method version I would expect something like a unit test; for a class, the beginning of some functional testing. The expectation is even higher for the package, because it usually groups together classes working in tight coupling, so the testing done for a package version is more of a functional test, so now those methods "really" work. I guess it would be fun to disallow versioning if we detect that testing was not performed :) Seriously though, it might be interesting if we could somehow link versions to the tests performed.
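The half-joking idea of refusing a version without test evidence could be sketched like this; the API (`make_version`, `VersioningError`) is entirely made up for illustration:

```python
# Playful, purely hypothetical sketch: refuse to cut a version unless at
# least one passing test backs it up.

class VersioningError(Exception):
    """Raised when a version is requested without passing tests."""
    pass

def make_version(name, test_results):
    """Record a version only if every attached test passed.
    `test_results` maps test names to pass/fail booleans."""
    if not test_results or not all(test_results.values()):
        raise VersioningError("no passing tests recorded for " + name)
    return {"name": name, "tests": sorted(test_results)}

v = make_version("methodC.version2", {"testMethodC": True})
print(v["name"])  # methodC.version2

try:
    make_version("methodD.version3", {})   # no tests at all
except VersioningError:
    print("versioning refused")
```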
Ok, I see. You just want to define a group of program elements that should be versioned together. With Monticello you do this explicitly, so there's a lot less noise. Everything is a "real" version, and they correspond to a bunch of other "real" versions that were current at the same time.
Even if you do it explicitly, not all versioning happens at the same time. I develop a method, it looks good, I test it a little (workspace, unit test, whatever), I am happy with it and I want to keep it. I version it (separately, because this is what method-level granularity means).
Interesting, because that's not what I mean by "method-level granularity."
In MC1 (and Store, as far as I can tell), only packages have ancestry. The ancestry of a method has to be reconstructed by examining all the versions of the package it appears in and noting how it changes. In MC2 (and Envy, as far as I can tell), methods have individually recorded ancestry. This is why I say that MC2 versions at method-level granularity.
But that doesn't mean that you have to version new methods in isolation, whether explicitly or with every accept as in Envy. If you do that, you lose the spatial context I mentioned in my last post. That version is just noise, so why bother? It's not like the method is going anywhere. You can save your image without versioning it, and even if you manage to crash the VM you can always pull it out of the change log.
Of course you don't have to version the new method in isolation, but if you do want to, why not be able to do it? Who decides for me that something that I want to name, keep, reference later, and share with others is just noise, and that I should not bother? In the course of developing something, the way the code progresses is meaningful, and you may want to maintain a history with these snapshots. This is a very cheap way of documenting what you are doing. Granted, the value of older versions decreases over time, but that is true for all versions, at every level, not just for methods. But I can tell you that every time I decided to purge something from a repository, to drop something from the history of a project, I came to regret it later.
In MC1 you always version whole packages at a time. MC2 is more flexible, in that you can specify other ways of separating the code you are interested in from the rest of the image. But whatever your method of segregation, you always version all of it at once. So in this sense, Monticello versions at "project-level granularity," the project being whatever you're working on, be it a package, change set or whatever. It's important to do that so as to get the spatial context needed to merge snapshots correctly.
Part of a "project" may mean reverting a method to an existing, older version, reverting a class to an older version, reverting a package to an older version. This happens quite frequently when developing a patch. What does it mean for these components' history that you re-version them?
Colin
Hi Florin,
I think I understand what you're getting at now. You prefer the "last package loaded" strategy because it guarantees that at least one package will remain in exactly the intended state, and thus presumably work. Ok, fair enough. I think this argument is moot anyway since:
- What's appropriate probably depends on the circumstances under which we're running. If I as a developer am loading a handful of packages to continue development, I don't mind risking a little breakage. But that's probably not the case if we're deploying for end users.
- The modules system shouldn't be dependent on any versioning tool, so we may want overrides (or some other strategy for dealing with multiply-defined methods) as a fall-back for when the versioning system isn't present.
- Given Dan's interest in security, we'll probably want some mechanism for preventing such method collisions from occurring at all. Classboxes, Islands or E probably have a solution that is more elegant than either overrides or versioning.
Part of a "project" may mean reverting a method to an existing, older version, reverting a class to an older version, reverting a package to an older version. This happens quite frequently when developing a patch. What does it mean for these components' history that you re-version them?
In MC2, if a developer reverts to an older version, the history will show that. A new version will be created, with identical "content" to the old version, but with metadata showing the intermediate version as an ancestor.
Colin
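Colin's point that a revert produces a new version with old content but full history can be illustrated with a toy model; the class and function names here are hypothetical, not MC2's actual model:

```python
# Toy illustration: reverting doesn't rewrite history. It creates a new
# version whose content is copied from the old one, but whose recorded
# ancestry still includes the intermediate version.

class Version:
    def __init__(self, content, ancestors=()):
        self.content = content
        self.ancestors = list(ancestors)

def revert_to(older, current):
    """Make a new version with `older`'s content, keeping `current`
    (and everything before it) in the ancestry."""
    return Version(older.content, ancestors=current.ancestors + [current])

v1 = Version("original source")
v2 = Version("experimental change", ancestors=[v1])
v3 = revert_to(v1, v2)

assert v3.content == v1.content   # same source as the old version...
assert v2 in v3.ancestors         # ...but the detour stays on record
```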
Colin Putney wrote:
Hi Florin,
I think I understand what you're getting at now. You prefer the "last package loaded" strategy because it guarantees that at least one package will remain in exactly the intended state, and thus presumably work. Ok, fair enough. I think this argument is moot anyway since:
- What's appropriate probably depends on the circumstances under which we're running. If I as a developer am loading a handful of packages to continue development, I don't mind risking a little breakage. But that's probably not the case if we're deploying for end users.
Agreed
- The modules system shouldn't be dependent on any versioning tool, so we may want overrides (or some other strategy for dealing with multiply-defined methods) as a fall-back for when the versioning system isn't present.
Agreed
- Given Dan's interest in security, we'll probably want some mechanism for preventing such method collisions from occurring at all. Classboxes, Islands or E probably have a solution that is more elegant than either overrides or versioning.
Here, I am not so sure. As I see it, there are two reasons for having overrides. One is to offer an automatic resolution for accidental collisions, and to guarantee that your package _can_ be loaded (without manual modifications) in its intended state, as mentioned above. The other one, which I mentioned in my reply to Dan's message, is for intentional overrides of something that is known to exist and to be used in the pre-existing image (either because it is part of the base image, or because it is part of another, required package). Sometimes you do need hooks in other packages or in the base image, to attach yourself to a pre-existing state, and they are not general enough to justify a "fix" in the base image. Since they are specific to your package, they should belong, as organizational structure, to your package as well, just like normal (non-conflicting) extensions would. And it's not just about methods. As an example, your module wants to add some state to processes, so it needs an additional instvar in class Process. Shouldn't this change to a pre-existing class definition be contained in your module, be loaded with it, and be unloaded when the module is unloaded? This is clearly not an accidental collision, but it does not make sense by itself in the module originally defining class Process.
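The idea of a module owning a state extension to a pre-existing class can be sketched in a few lines; everything here (the `StateExtensionModule` class, the slot name) is hypothetical, modeling only the load/unload ownership:

```python
# Toy sketch: a module owns an extra slot on a pre-existing class. The
# slot is installed when the module loads and removed when it unloads.

class Process:                       # stands in for the base image's class
    def __init__(self):
        self.priority = 5

class StateExtensionModule:
    """Owns an extra instance-variable-like slot on another class."""
    def __init__(self, target, slot_name, default):
        self.target = target
        self.slot_name = slot_name
        self.default = default

    def load(self):
        # make the extra slot visible on the target class
        setattr(self.target, self.slot_name, self.default)

    def unload(self):
        # removing the module removes the slot with it
        delattr(self.target, self.slot_name)

mod = StateExtensionModule(Process, "security_context", None)
mod.load()
p = Process()
print(p.security_context)                     # slot exists while loaded
mod.unload()
print(hasattr(Process, "security_context"))   # gone after unload
```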
One other case where I regularly use overrides in VW is bug fixes. Even if I submit them and they eventually get included in the distribution (some of them, after a loooooong time), where do they sit in the meantime? And what if they are performance optimizations, or other kinds of changes that would never be included in the base? What am I going to do if you don't like my changes to MC and refuse to include them? :)
Part of a "project" may mean reverting a method to an existing, older version, reverting a class to an older version, reverting a package to an older version. This happens quite frequently when developing a patch. What does it mean for these components' history that you re-version them?
In MC2, if a developer reverts to an older version, the history will show that. A new version will be created, with identical "content" to the old version, but with metadata showing the intermediate version as an ancestor.
Colin
Hi florin
I really like the quality of your discussion with colin. Thanks for that.
Here, I am not so sure. As I see it, there are two reasons for having overrides. One is to offer an automatic resolution for accidental collisions, and to guarantee that your package _can_ be loaded (without manual modifications) in its intended state, as mentioned above. The other one, which I mentioned in my reply to Dan's message, is for intentional overrides of something that is known to exist and to be used in the pre-existing image (either because it is part of the base image, or because it is part of another, required package). Sometimes you do need hooks in other packages or in the base image, to attach yourself to a pre-existing state, and they are not general enough to justify a "fix" in the base image. Since they are specific to your package, they should belong, as organizational structure, to your package as well, just like normal (non-conflicting) extensions would. And it's not just about methods. As an example, your module wants to add some state to processes, so it needs an additional instvar in class Process. Shouldn't this change to a pre-existing class definition be contained in your module, be loaded with it, and be unloaded when the module is unloaded? This is clearly not an accidental collision, but it does not make sense by itself in the module originally defining class Process.
I agree with you. In Classboxes (again, I'm not saying that this is the solution) we took the idea that overrides, extensions, and state extensions are local to the package doing them. I would really appreciate it if you could comment on our papers (on the list or privately). I'm not sure that the semantics we gave to extensions is always the one we want, but at least we have a consistent world.
Stef
stéphane ducasse wrote:
Hi florin
I really like the quality of your discussion with colin. Thanks for that.
Here, I am not so sure. As I see it, there are two reasons for having overrides. One is to offer an automatic resolution for accidental collisions, and to guarantee that your package _can_ be loaded (without manual modifications) in its intended state, as mentioned above. The other one, which I mentioned in my reply to Dan's message, is for intentional overrides of something that is known to exist and to be used in the pre-existing image (either because it is part of the base image, or because it is part of another, required package). Sometimes you do need hooks in other packages or in the base image, to attach yourself to a pre-existing state, and they are not general enough to justify a "fix" in the base image. Since they are specific to your package, they should belong, as organizational structure, to your package as well, just like normal (non-conflicting) extensions would. And it's not just about methods. As an example, your module wants to add some state to processes, so it needs an additional instvar in class Process. Shouldn't this change to a pre-existing class definition be contained in your module, be loaded with it, and be unloaded when the module is unloaded? This is clearly not an accidental collision, but it does not make sense by itself in the module originally defining class Process.
I agree with you. In Classboxes (again, I'm not saying that this is the solution) we took the idea that overrides, extensions, and state extensions are local to the package doing them. I would really appreciate it if you could comment on our papers (on the list or privately). I'm not sure that the semantics we gave to extensions is always the one we want, but at least we have a consistent world.
Stef
Hi stéphane
I have just read the papers on classboxes.
I think they are an interesting and useful concept. I don't think they represent all that we would want from a module system, but I will try to present my view for where they would fit in.
I too believe there is a strong case to be made for being able to limit the capabilities of modules, especially untrusted ones (either because of untrusted origin or because of the experimental nature of their code). In general though, as a way of organizing code, many modules that would be distinct because of a logical separation/classification do not need to be limited/isolated. Let's say we put collections in a separate module. They will depend on some core module, but they will enjoy the same level of trust, and if for example they add an extension method to Object, there is no need to limit the visibility of that method to the collections module. But this is an artificial example. Let's take my most common use of "overrides": a separately loadable module that contains bug fixes to the base image. Of course I want those fixes visible to everything else in the image.
On the other hand, if we let foreign Croquet code run in our image, we would most likely want to limit its capabilities.
With the above in mind, I think that it would make sense to assign levels of trust to modules and to link the visibility of the changes to the level of trust: if a module has a level of trust equal or greater than that of the modified (or imported, in your terminology) modules, its changes should be visible to the imported modules as well. If not, its bindings should be local.
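As a toy model of this trust rule (trust levels, module names, and the install mechanics are all hypothetical; this only illustrates the visibility decision, not a real module system):

```python
# Toy model: a module's change to another module is installed globally
# only when the modifier's trust level is at least that of the module it
# modifies; otherwise the binding stays local to the modifier.

GLOBAL_METHODS = {}              # bindings every module in the image sees

class Module:
    def __init__(self, name, trust):
        self.name = name
        self.trust = trust
        self.local_methods = {}  # bindings visible only to this module

    def override(self, target, selector, code):
        if self.trust >= target.trust:
            GLOBAL_METHODS[selector] = code       # trusted: image-wide
        else:
            self.local_methods[selector] = code   # untrusted: local only

base = Module("Base", trust=10)
fixes = Module("BugFixes", trust=10)          # trusted bug-fix module
foreign = Module("ForeignCroquet", trust=1)   # untrusted foreign code

fixes.override(base, "Process>>priority", "fixed code")
foreign.override(base, "Process>>priority", "sneaky code")

print(GLOBAL_METHODS["Process>>priority"])         # the trusted fix
print(foreign.local_methods["Process>>priority"])  # confined locally
```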
Obviously this does not completely solve the security issue, but I think it's one of the necessary steps.
Florin
hi florin
Hi stéphane
I have just read the papers on classboxes.
I think they are an interesting and useful concept. I don't think they represent all that we would want from a module system, but I will try to present my view for where they would fit in.
I too believe there is a strong case to be made for being able to limit the capabilities of modules, especially untrusted ones (either because of untrusted origin or because of the experimental nature of their code). In general though, as a way of organizing code, many modules that would be distinct because of a logical separation/classification do not need to be limited/isolated. Let's say we put collections in a separate module. They will depend on some core module, but they will enjoy the same level of trust, and if for example they add an extension method to Object, there is no need to limit the visibility of that method to the collections module. But this is an artificial example. Let's take my most common use of "overrides": a separately loadable module that contains bug fixes to the base image. Of course I want those fixes visible to everything else in the image.
True. I always thought that we would like to have the two semantics.
On the other hand, if we let foreign Croquet code run in our image, we would most likely want to limit its capabilities.
With the above in mind, I think that it would make sense to assign levels of trust to modules and to link the visibility of the changes to the level of trust: if a module has a level of trust equal or greater than that of the modified (or imported, in your terminology) modules, its changes should be visible to the imported modules as well. If not, its bindings should be local.
Interesting idea. ;)
Obviously this does not completely solve the security issue, but I think it's one of the necessary steps.
Florin
squeak-dev@lists.squeakfoundation.org