[squeak-dev] Subversion (was: Re: Perl is to CPAN as Squeak is to (what)?)

Colin Putney cputney at wiresong.ca
Mon Jun 30 14:57:24 UTC 2008


On 28-Jun-08, at 12:45 PM, Andreas Raab wrote:

> Colin Putney wrote:
>> On 28-Jun-08, at 5:27 AM, Claus Kick wrote:
>>> If push comes to shove, I would even say, let's ditch them all and
>>> just use SVN like the rest of the planet (if that is possible). It
>>> is hard enough to sell an image-based language with a real IDE to
>>> the C-style crowd, the package management systems should not add
>>> their grain of salt to the soup.
>> Been there, done that... <shudder/>
>> Monticello was created because this turned out not to be feasible  
>> in practice.
>
> Can you say something more about that? A couple of weeks ago I saw a  
> demo at HPI in Potsdam where students used SVN down to the method  
> level, and it seemed to me that this approach might very well work  
> because the SVN granularity is the same as the in-image granularity.  
> It may also be interesting that this wasn't even trying to deal with  
> source files of any sort - it retained the nature of the image and  
> simply hooked it up directly with SVN. From my perspective this  
> looked like an extraordinarily interesting approach that I am  
> certain to try out as soon as it is available.

DVS, the precursor to Monticello, stored all the source code to each  
package in a single text file. Those files were then versioned using  
CVS. The file format was a modified chunk format, with the chunks  
sorted to prevent unnecessary textual churn. The usage pattern was to  
file out, commit, update and file in.
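
To illustrate why sorting the chunks matters, here is a minimal sketch in
Python. It is not the DVS code or its actual file format; the function name
file_out, the simplified chunk header, and the dictionary layout are all
assumptions made for illustration. The point is only that a stable sort
order makes the file-out depend on the content of the package, not on the
order in which methods happened to be enumerated.

```python
# Hypothetical sketch: NOT the actual DVS format, just a chunk-like layout.

def file_out(methods):
    """Serialize {(class_name, selector): source} in a stable, sorted order.

    Sorting by (class_name, selector) means two images holding the same
    methods produce byte-identical text, so the version control system
    only sees real changes.
    """
    chunks = []
    for class_name, selector in sorted(methods):
        source = methods[(class_name, selector)]
        # In chunk format a literal '!' inside a chunk is doubled.
        escaped = source.replace("!", "!!")
        chunks.append(f"!{class_name} methods!\n{escaped}! !\n")
    return "\n".join(chunks)


if __name__ == "__main__":
    image_a = {("Foo", "bar"): "bar\n\t^1", ("Foo", "baz"): "baz\n\t^2"}
    # Same methods, enumerated in a different order.
    image_b = {("Foo", "baz"): "baz\n\t^2", ("Foo", "bar"): "bar\n\t^1"}
    print(file_out(image_a) == file_out(image_b))
```

Without the sort, the two images above could write the same methods in
different orders, and CVS would report a diff where nothing had changed.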

A large part of the problem came from this two step process for  
dealing with CVS. It was a hassle to keep track of the state of the  
image relative to the state of the CVS working copy. It was easy to  
make mistakes - commit when the working copy wasn't up to date, develop when the  
image wasn't up to date, etc. That would lead to weirdness in the code  
that had to be manually sorted out.

Merge conflicts were another problem. The textual merging done by CVS  
wasn't smart enough to deal with a lot of the changes that would  
happen in development. For example, if two developers each added a  
method that sorted similarly, they'd get a textual conflict even  
though there was no conflict at the Smalltalk level.
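
A rough sketch of the difference, in Python. This is not Monticello's merge
code; merge_methods and the selector-keyed dictionaries are hypothetical
names used only to show what merging "at the Smalltalk level" could look
like. Each method is treated as an independent unit, so two added methods
only conflict if they have the same selector and different source.

```python
# Hypothetical sketch of a method-level three-way merge (not Monticello's).

def merge_methods(base, mine, theirs):
    """Three-way merge of {selector: source} dictionaries.

    Returns (merged, conflicts); conflicts lists the selectors that both
    sides changed in different ways.
    """
    merged, conflicts = {}, []
    for selector in sorted(set(base) | set(mine) | set(theirs)):
        b = base.get(selector)
        m = mine.get(selector)
        t = theirs.get(selector)
        if m == t:
            result = m              # both sides agree (or both removed it)
        elif m == b:
            result = t              # only their side changed
        elif t == b:
            result = m              # only my side changed
        else:
            conflicts.append(selector)
            result = m              # keep mine provisionally; needs resolving
        if result is not None:
            merged[selector] = result
    return merged, conflicts


if __name__ == "__main__":
    base = {"add:": "add: x ...", "remove:": "remove: x ..."}
    mine = dict(base, **{"addAll:": "addAll: xs ..."})
    theirs = dict(base, **{"addFirst:": "addFirst: x ..."})
    merged, conflicts = merge_methods(base, mine, theirs)
    print(list(merged), conflicts)
```

In a sorted file-out, addAll: and addFirst: both land between add: and
remove:, so a line-based merge sees two different insertions at the same
spot and flags a conflict. Keyed by selector, the two additions are simply
unrelated.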

As DVS developed we added functionality to minimize or work around  
these issues, until it became clear that it would be less effort to  
just keep our own version history and do our own merges. At that point  
we ditched CVS and renamed DVS to Monticello.

Now, this idea of using one file per method has come up before, and I  
believe it would eliminate many of the difficulties we had with DVS.  
Merging methods would get better, for sure. Merging class definitions  
would still be a hassle, unless each instance variable, class variable,  
and pool import were defined in separate files. If the sources and  
changes files were eliminated, that would fix many of the  
synchronization problems that we had with DVS, since there would be no  
need to manually decide when to synchronize.

Still, I see two big problems with this approach.  One is that the  
synchronization problems don't entirely go away. What if some other  
process modifies the files on disk? How does the image find out about  
the change, and what should it do in response? What if the  
modification happens while the image isn't running? There are probably  
answers to these questions, but I doubt they'll be *good* answers.

The other big problem is that tens of thousands of tiny files is a  
horribly inefficient way to store source code. Yes, disk is cheap. But  
disk IO is not. I discovered this early in the development of MC2,  
when I implemented a type of repository that stored each method in a  
separate file. Loading OmniBrowser from that repository involved  
opening, reading, and closing over 600 files, and was very slow. I  
don't remember the exact timing, but I think it was like 5 to 10  
minutes, and in any case it was far too slow. Avi wrote a repository  
that stored everything in a single indexed file, and now load time is  
dominated by compilation.

A quick doIt in my working image shows 44682 methods. Now imagine that  
on start up, the image scans all those files to make sure that all its  
compiled methods are up to date. That will take a very, very long time.
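
As a back-of-the-envelope estimate only, here is the arithmetic in Python.
It assumes the per-file cost of the MC2 experiment (roughly 600 files in 5
to 10 minutes, from memory) scales linearly to one file per method, which
is a big assumption: a scan that only checks timestamps would be cheaper
than opening, reading, and closing each file, and hardware and filesystems
differ. The names below are made up for illustration.

```python
# Rough extrapolation from the figures above; every number is approximate.

FILES_LOADED = 600          # "over 600 files" when loading OmniBrowser
METHODS_IN_IMAGE = 44682    # method count from the doIt


def estimated_scan_minutes(load_minutes):
    """Scale the observed load time linearly to one file per method."""
    minutes_per_file = load_minutes / FILES_LOADED
    return minutes_per_file * METHODS_IN_IMAGE


if __name__ == "__main__":
    for load_minutes in (5, 10):
        hours = estimated_scan_minutes(load_minutes) / 60
        print(f"{load_minutes} min per {FILES_LOADED} files: "
              f"about {hours:.1f} hours for {METHODS_IN_IMAGE} files")
```

Even if the real per-file cost were a tenth of that, start up would still
be measured in tens of minutes rather than seconds.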

Colin


