[squeak-dev] Re: Our process, some loose ideas regarding DS + MC

Sun Aug 16 09:33:13 UTC 2009

Hi!

Andreas Raab wrote:
> Göran Krampe wrote:
>>> Simply the question of where/how DS are stored. Monticello has 
>>> repositories that store full source for a package. As a consequence 
>>> packages are large, but they allow you to see what's different in 
>>> your image, what a patch does, tracking history etc. You can agree on 
>>> a place and say "this is THE repository for package X" etc. I'm 
>>> wondering where DS stand with regards to these issues.
>>
>> Since the Stream part is not yet started I am all ears. But apart from 
>> the obvious fact that Deltas can serve the same roles as Changesets 
>> can (only better), I was envisioning chronological "streams" (as in 
>> continuous flows) of Deltas associated with individual developers, 
>> forks, packages and branches of packages etc.
> 
> A thing to keep in mind is that there is huge value in having a 
> definitive version of the code that you can compare against somewhere. 
> Being able to say "oh, these are my local changes" is a large part of 
> what makes working with MC superior to working with change sets. So it 
> would be really useful if one could say "compared with this repository, 
> you have applied delta x, y, and z, and in addition you have modified 
> methods foo and bar".

Yes, I agree, and DS was not meant to replace MC. MC does snapshots and
maintains their history, while DS captures "developer changes" in a
fine-grained fashion. But a combination of MC and DS would probably be
very interesting.

>>> That's *very* useful. One of my favorite features when using MC is 
>>> that it can tell us if there is a conflict in a merge and that this 
>>> method requires special attention. If DS can do something similar by 
>>> telling us that the base version of a method is different from when 
>>> the DS was created this will be hugely helpful.
>>
>> This is in fact the *core idea*. It came about after watching Linus
>> present his thoughts on git and thinking about how MC and most SCMs
>> work. They all get their "merge magic" from extensive knowledge of the
>> history back to a common base. But that is something we don't have
>> between forks.
> 
> Why not? Actually we do. MC will search any repository you add and if it 
> finds any common ancestor in any of the repositories it will use that. 

Yes, I know. But I still think we will end up with situations where the 
forks don't share enough history in order to do this. I may be wrong.

> I've done some pretty extensive merges that way and the only thing MC 
> lacks in this area is explicit support for cherry-picking (i.e., to 
> accept or reject changes even when they don't conflict).
> 
>> Thus, could we get 80% of the magic using simpler tricks? The trick is
>> to let the Changeset contain more info, especially info about the
>> "before" state. This of course enables unapply, but more importantly
>> it enables much smarter apply logic.
> 
> Interesting. I had thought that keeping the stamp or a hash would be
> enough, but you're right: having the actual previous version does allow
> you to trivially revert to it. Very clever.

Also, a "perfect revert" is only possible if the Delta was "perfectly
clean" when it was applied. BUT... the cool trick is that if you apply a
Delta which is NOT perfectly clean (let's say a method being changed
does not match the "before" state), then you just record a NEW Delta
while applying it, and that new Delta will be able to do a perfect
revert.
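To make the apply/revert logic concrete, here is a minimal sketch in Python rather than Smalltalk. It assumes a toy "image" that is just a dictionary from selector to method source; `MethodChange`, `Delta`, `apply` and `revert` are hypothetical names, not the actual DS classes. Each change carries its "before" state, and applying always returns the Delta that was actually recorded, which stores the image's real prior state, so reverting it is exact even when the incoming Delta was not clean.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical model: an image is a dict from method selector to source
# (a missing key means the method does not exist).
Image = Dict[str, str]


@dataclass(frozen=True)
class MethodChange:
    selector: str
    before: Optional[str]  # source expected before the change (None = absent)
    after: Optional[str]   # source after the change (None = removed)

    def inverted(self) -> "MethodChange":
        return MethodChange(self.selector, self.after, self.before)


@dataclass
class Delta:
    changes: List[MethodChange]

    def is_clean_for(self, image: Image) -> bool:
        # Clean: every recorded "before" state matches the image exactly.
        return all(image.get(c.selector) == c.before for c in self.changes)

    def apply(self, image: Image) -> "Delta":
        """Apply to the image and return the Delta that was actually recorded.

        Each recorded change stores the image's real prior state, so even
        when this Delta was not clean, the recorded one reverts exactly.
        """
        recorded = []
        for c in self.changes:
            recorded.append(MethodChange(c.selector, image.get(c.selector), c.after))
            if c.after is None:
                image.pop(c.selector, None)
            else:
                image[c.selector] = c.after
        return Delta(recorded)

    def revert(self, image: Image) -> None:
        # Undo in reverse order, restoring each recorded "before" state.
        Delta([c.inverted() for c in reversed(self.changes)]).apply(image)


image = {"foo": "foo ^1", "bar": "bar ^2 \"local edit\""}
delta = Delta([MethodChange("foo", "foo ^1", "foo ^10"),
               MethodChange("bar", "bar ^2", "bar ^20")])
print(delta.is_clean_for(image))  # False: bar does not match its "before"
recorded = delta.apply(image)
recorded.revert(image)
print(image["bar"])  # the local edit is back
```

Returning a freshly recorded Delta, instead of mutating the incoming one, keeps the original intact so it can still be passed on to other images unchanged.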

One interesting aspect here: say you record a Delta with one single
change, a class delete. The Delta will then create a composite change
which contains all the changes needed to recreate that class (class
creation, method additions, etc.).

So if you then load that into a different image, it can check that the
class to be deleted is exactly the same as it was in the source image.
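A rough sketch of the class-delete case, again in Python and with a hypothetical model (`ClassDefinition`, `LiveClass` and `ClassDeletion` are illustrative names only). Recording the delete captures the class definition plus every method, which is both what a revert needs and what a receiving image can compare against before it deletes anything.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class ClassDefinition:
    name: str
    superclass: str
    ivars: Tuple[str, ...]


@dataclass
class LiveClass:
    definition: ClassDefinition
    methods: Dict[str, str]  # selector -> source


# Hypothetical image model: class name -> class.
Image = Dict[str, LiveClass]


@dataclass(frozen=True)
class ClassDeletion:
    # Composite change: the class creation plus one method addition per
    # method, i.e. everything needed to recreate the deleted class.
    definition: ClassDefinition
    method_additions: Tuple[Tuple[str, str], ...]


def record_deletion(image: Image, name: str) -> ClassDeletion:
    cls = image[name]
    return ClassDeletion(cls.definition, tuple(sorted(cls.methods.items())))


def matches(image: Image, change: ClassDeletion) -> bool:
    # The class here must be exactly the one deleted in the source image.
    cls = image.get(change.definition.name)
    return (cls is not None
            and cls.definition == change.definition
            and tuple(sorted(cls.methods.items())) == change.method_additions)


def apply_deletion(image: Image, change: ClassDeletion) -> bool:
    if not matches(image, change):
        return False  # differs from the source image: leave it for the developer
    del image[change.definition.name]
    return True


def revert_deletion(image: Image, change: ClassDeletion) -> None:
    image[change.definition.name] = LiveClass(change.definition,
                                              dict(change.method_additions))


def make_image() -> Image:
    return {"Foo": LiveClass(ClassDefinition("Foo", "Object", ("a",)),
                             {"bar": "bar ^a"})}


source = make_image()
deletion = record_deletion(source, "Foo")

changed = make_image()
changed["Foo"].methods["baz"] = "baz ^1"  # target has an extra method
print(apply_deletion(changed, deletion))  # False

same = make_image()
print(apply_deletion(same, deletion))     # True
revert_deletion(same, deletion)
```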

>> Also, a Delta can do other smart things since it has captured class
>> definition changes in more detail. Say you add an ivar "c", and when
>> it checks the destination image it finds other additional ivars but no
>> "c": then it can merge by just adding "c". Nice, eh? :)
> 
> Nice. But more a sign of weakness in Monticello ;-) But it's definitely 

Yes, but since DS captures "developer actions" it can in theory capture
more info than MC can. For example, a class rename is captured as a
class rename change. It can never be confused with removing one class
and adding another.
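A small Python sketch of why recording the developer's action helps, assuming a hypothetical model where a class is just a list of ivar names. An "add ivar" change merges into a class that already has other ivars, and a rename is its own change rather than a remove plus an add. `AddIvar` and `RenameClass` are illustrative names, not the real DS API.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical image model: class name -> list of instance variable names.
Classes = Dict[str, List[str]]


@dataclass(frozen=True)
class AddIvar:
    class_name: str
    ivar: str


@dataclass(frozen=True)
class RenameClass:
    old_name: str
    new_name: str


def apply_add_ivar(classes: Classes, change: AddIvar) -> None:
    ivars = classes[change.class_name]
    # Only the ivar the developer actually added is merged; other ivars the
    # target image has (for example from other deltas) are kept as they are.
    if change.ivar not in ivars:
        ivars.append(change.ivar)


def apply_rename(classes: Classes, change: RenameClass) -> None:
    # The rename moves the existing class under its new name. It is not
    # modelled as "remove old class + add new class", so nothing is rebuilt.
    classes[change.new_name] = classes.pop(change.old_name)


classes = {"Point": ["x", "y", "z"]}  # target already has an extra ivar "z"
apply_add_ivar(classes, AddIvar("Point", "c"))
print(classes["Point"])               # ['x', 'y', 'z', 'c']
apply_rename(classes, RenameClass("Point", "Point3D"))
print(sorted(classes))                # ['Point3D']
```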

> a good idea since this can cover additions that come from "other" 
> deltas. Which reminds me: Where are the deltas stored and how big is the 
> space overhead for keeping them?

You mean in the image? Since a Delta is a "fully self-contained" object
with no references to anything outside it, and only contains "simple
data", we are quite free to do what we want with them. We could, for
example, store them in a database and easily load, edit, save and search
them.

Some simple file-based repository scheme is of course needed, perhaps
just some kind of file naming convention to get a sort order.
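One possible shape for such a naming convention, sketched in Python. The `<UTC timestamp>-<author>-<sequence>.delta.json` pattern and the JSON encoding are hypothetical, not an existing DS format. Because a Delta holds only simple data it serializes directly, and fixed-width timestamps make a plain sort of the file names yield the chronological stream.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path
from typing import List


def delta_filename(created: datetime, author: str, seq: int) -> str:
    # A fixed-width UTC timestamp sorts lexicographically in time order;
    # author and zero-padded sequence number break ties within one second.
    stamp = created.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return f"{stamp}-{author}-{seq:06d}.delta.json"


def save_delta(repo: Path, delta: dict, created: datetime,
               author: str, seq: int) -> Path:
    # A Delta holds only simple data, so plain JSON is enough to store it.
    path = repo / delta_filename(created, author, seq)
    path.write_text(json.dumps(delta, sort_keys=True))
    return path


def load_stream(repo: Path) -> List[dict]:
    # Sorting the file names yields the Deltas in chronological order.
    return [json.loads(p.read_text()) for p in sorted(repo.glob("*.delta.json"))]


with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp)
    save_delta(repo, {"note": "second"},
               datetime(2009, 8, 16, 10, 0, tzinfo=timezone.utc), "goran", 2)
    save_delta(repo, {"note": "first"},
               datetime(2009, 8, 16, 9, 33, tzinfo=timezone.utc), "goran", 1)
    notes = [d["note"] for d in load_stream(repo)]

print(notes)  # ['first', 'second']
```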

One small idea I have is to perhaps use CouchDB as a repository option.
It would fit quite well, and since CouchDB has replication between
databases built in, we could get a very nice base for hooking our streams
together into larger and larger streams all the way to the sea :)

I have not measured the in-image space overhead. Tirade, which I am
hooking in, is quite fast at loading them:

http://goran.krampe.se/blog/Squeak/Tirade2.rdoc

Side note: another strong advantage of using DS instead of change sets
is that the default DS applier is SystemEditor, which makes applying
fully atomic.
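For readers unfamiliar with the idea, here is a single-threaded Python sketch of all-or-nothing application: every change is applied to a staged copy, and the live image is only updated if all of them succeed. It illustrates the property, not how SystemEditor actually works. `apply_all_or_nothing` and the dictionary "image" are hypothetical, and the final commit here is not atomic with respect to other threads.

```python
import copy
from typing import Callable, Dict, List

# Hypothetical model: an image maps selectors to source, and a change is a
# function that edits an image in place or raises if it cannot be applied.
Image = Dict[str, str]
Change = Callable[[Image], None]


def apply_all_or_nothing(image: Image, changes: List[Change]) -> None:
    staged = copy.deepcopy(image)  # edit a copy, never the live image
    for change in changes:
        change(staged)             # an exception here leaves `image` untouched
    image.clear()                  # every change succeeded: commit in one step
    image.update(staged)


def add_foo(img: Image) -> None:
    img["foo"] = "foo ^1"


def broken(img: Image) -> None:
    raise ValueError("cannot apply this change")


image = {"bar": "bar ^2"}
try:
    apply_all_or_nothing(image, [add_foo, broken])
except ValueError:
    pass
print(image)          # {'bar': 'bar ^2'}: the half-applied foo never reached it
apply_all_or_nothing(image, [add_foo])
print(sorted(image))  # ['bar', 'foo']
```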

regards, Göran