The perfect revision control system

Sat Feb 16 13:49:57 UTC 2008

On Feb 16, 2008 4:57 AM, Michael van der Gulik <mikevdg at gmail.com> wrote:
>
> I don't see any benefit in processing changes at the AST level. The
> same results can be achieved by comparing the source of two methods,

Not without knowledge of the source language, and if you have that
then you are almost certainly building your own AST of some sort.  I
suppose the question is really moot for now anyway, as I can just use
the method-level granularity we have today and switch to something
more fine grained later.

> perhaps pre-compiling them and comparing their ASTs if you want. Plus,
> by having the method source, the VC system is more robust.

I'm by no means talking about throwing away the source code.  I'm
purely talking about the mechanism for determining dependencies, doing
commuting [1], undos, etc., etc., in the face of multiple concurrent
branches.

> I was hoping you were going to ask a more general question, such as
> "is this list of features sufficient for a VC system?".

You're always free to list those.  Of course I'm interested to know if
there are any features people would really like that aren't currently
covered.

The features of the system I'm planning will be:

*) Based on change sets (well, actually the more robust Delta stream
implementation) instead of snapshotting.  A consequence of this is
that one is no longer required to use *categories to associate
changes with a package; rather, something a bit more sophisticated
than the current change sorter can be used to manage them.

*) "Cherry picking" of changes.  Smalltalk, with its simple syntax and
keyword arguments, is the best language I know of for writing self
documenting code.  But what is still missing is the *why* and *how did
we get here*.  Intelligent use of change sets can go further to answer
those questions.  When one makes a series of changes to fix multiple
bugs, one can move the changes after the fact into a separate set for
each bug, so that later maintainers of the software have more
information to determine if the existing code is still relevant, etc.

*) Labeling.  In big companies using sophisticated revision systems
(i.e. not obsolete stuff like SVN), people are branching, merging and
conflicting all the time.  Then when it comes time to release their
software they apply a "label".  This tags the latest version of all
data managed by the system so that in an audit, they can conclusively
prove what the state of the software was for any version.  This also
gives benefits to the system by allowing it to ignore everything
previous to the latest label (since the current state of the system is
simply: the latest label and all changes after that).

*) Fully distributed.  Anyone can make a copy of a given repository at
any time.  They can make changes that stay only in their own copy, or
push them up if they wish.  To make a branch, just make another copy
of the repository.  The repositories are updated strictly through the
mechanism of applying patches, so you're totally free to "cherry pick"
changes out of someone else's repository (and the patches don't have
to be applied in order) or sync up if you wish.

*) Multiple ways of managing changes.  Since the system will live in
the normal Squeak system, it has access to all the subsystems in the
image and can leverage them to apply or forward patches.  One example
of this is darcs' cool feature of letting a remote user make a change
and use one command to have darcs package up the change in the correct
diff format and forward it to the package maintainer for peer review.

*) Compatible with other systems.  Of course no new revision system is
going to have a chance if it can't deal with packages from all the
other existing systems.  It may even be interesting to generate
packages from these other systems.
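
As a rough illustration of how the labeling, branching and cherry
picking points above fit together, here is a minimal sketch in Python
(not Squeak).  Everything in it is hypothetical: the names Patch,
Repository, record, label, clone and cherry_pick are made up for the
example, and a "system" is reduced to a dictionary mapping
(class, selector) to method source.  It is not the Delta stream
implementation, just the idea that the current state is the latest
label plus every patch applied after it.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Patch:
    """One change to one method (a hypothetical, simplified patch)."""
    name: str
    key: tuple             # (class name, selector)
    old: Optional[str]     # source expected before the patch, None if absent
    new: Optional[str]     # source after the patch, None if removed

    def apply(self, state):
        # Refuse to apply if the method is not in the expected state.
        if state.get(self.key) != self.old:
            raise ValueError(f"{self.name} does not apply cleanly")
        result = dict(state)
        if self.new is None:
            result.pop(self.key, None)
        else:
            result[self.key] = self.new
        return result


@dataclass
class Repository:
    label_state: dict = field(default_factory=dict)  # state at latest label
    patches: list = field(default_factory=list)      # patches since the label

    def current(self):
        # Current state = latest label + all patches after it.
        state = dict(self.label_state)
        for patch in self.patches:
            state = patch.apply(state)
        return state

    def record(self, patch):
        patch.apply(self.current())  # raises if the patch does not apply
        self.patches.append(patch)

    def label(self):
        # Freeze the current state; earlier patches need not be replayed.
        # (A real system would archive them for auditing, not drop them.)
        self.label_state = self.current()
        self.patches = []

    def clone(self):
        # Branching is just copying the repository.
        return Repository(dict(self.label_state), list(self.patches))

    def cherry_pick(self, other, name):
        # Take one named patch from another repository, ignoring the rest.
        patch = next(p for p in other.patches if p.name == name)
        self.record(patch)


main = Repository()
main.record(Patch("add-bar", ("Foo", "bar"), None, "bar ^1"))
main.label()

branch = main.clone()
branch.record(Patch("fix-bar", ("Foo", "bar"), "bar ^1", "bar ^2"))
branch.record(Patch("add-baz", ("Foo", "baz"), None, "baz ^3"))

# Pull only the second branch patch back into main, out of order.
main.cherry_pick(branch, "add-baz")
```

In this sketch main ends up with the new baz method but still has the
old bar, since "fix-bar" was never picked.  The cherry pick only works
here because the two branch patches touch different methods; deciding
that in general is the dependency question discussed earlier.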

Well, that's the list off the top of my head.  I know MC2, and even
MC1, can do much of this (or probably all in the case of MC2), but MC1
is based on snapshotting, which I disagree with, and I think MC2 tries
to do too much.  I believe in the "theory of patches" underlying
darcs: that any system is simply the sum of applying all its patches.
Therefore I don't think anything more than this is needed.

[1] In darcs if two changes (or patches) "commute" then they can be
applied in any order.  The above example would commute under darcs
because the lines modified in the second change don't touch the
modified lines from the first.  If we use a more sophisticated AST
based technique the two patches would not commute because the second
change depends on data from the first.
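
The difference between the two notions of commuting can be sketched
roughly as follows.  This is a deliberate simplification in Python and
not how darcs actually commutes patches (darcs also has to adjust line
positions, for one thing).  The Change record, its "defines" and
"uses" fields, and both function names are assumptions made up for the
illustration; the name information stands in for whatever an AST based
analysis would extract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    """A hypothetical change: the lines it touches plus name information."""
    lines: frozenset                  # line numbers the change modifies
    defines: frozenset = frozenset()  # names the change introduces
    uses: frozenset = frozenset()     # names the change reads


def commute_textually(first, second):
    # Line-based view: the changes commute if they touch disjoint lines.
    return first.lines.isdisjoint(second.lines)


def commute_semantically(first, second):
    # AST-style view: additionally, neither change may rely on a name
    # that the other one introduces.
    return (commute_textually(first, second)
            and first.defines.isdisjoint(second.uses)
            and second.defines.isdisjoint(first.uses))


# First change introduces a variable "total" on line 3; the second
# change, on line 10, reads it.
first = Change(lines=frozenset({3}), defines=frozenset({"total"}))
second = Change(lines=frozenset({10}), uses=frozenset({"total"}))

textual = commute_textually(first, second)
semantic = commute_semantically(first, second)
```

Here the line-based check says the two changes commute, while the
name-aware check says they do not, because the second change depends
on something the first one introduced.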