On Mon, Jun 30, 2008 at 7:57 AM, Colin Putney <cputney@wiresong.ca> wrote:
> The other big problem is that tens of thousands of tiny files is a horribly inefficient way to store source code. Yes, disk is cheap. But disk IO is not. I discovered this early in the development of MC2, when I implemented a type of repository that stored each method in a separate file. Loading OmniBrowser from that repository involved opening, reading, and closing over 600 files, and was very slow. I don't remember the exact timing, but I think it was like 5 to 10 minutes, and in any case it was far too slow. Avi wrote a repository that stored everything in a single indexed file, and now load time is dominated by compilation.
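To make the "single indexed file" idea in the quoted paragraph concrete, here is a minimal Python sketch. It is not the MC2 repository format, which this message does not describe; the layout (method sources, then a JSON offset index, then an 8-byte trailer) and the names `write_indexed`, `read_index`, and `read_method` are assumptions made purely for illustration.

```python
import json
import os
import struct
import tempfile


def write_indexed(path, methods):
    """Store every method source in one file, followed by an offset index.

    Hypothetical layout: [sources...][JSON index][8-byte index offset].
    """
    index = {}
    with open(path, "wb") as f:
        for name, source in methods.items():
            data = source.encode("utf-8")
            index[name] = (f.tell(), len(data))  # where this method lives
            f.write(data)
        index_offset = f.tell()
        f.write(json.dumps(index).encode("utf-8"))
        # Fixed-size trailer recording where the index starts.
        f.write(struct.pack(">Q", index_offset))


def read_index(f):
    """Read the index back from an already-open binary file."""
    f.seek(-8, os.SEEK_END)
    trailer_pos = f.tell()
    (index_offset,) = struct.unpack(">Q", f.read(8))
    f.seek(index_offset)
    return json.loads(f.read(trailer_pos - index_offset).decode("utf-8"))


def read_method(path, name):
    """Fetch one method with a single open and two seeks."""
    with open(path, "rb") as f:
        offset, length = read_index(f)[name]
        f.seek(offset)
        return f.read(length).decode("utf-8")


if __name__ == "__main__":
    methods = {"Foo>>bar": "bar\n\t^ 42", "Foo>>baz": "baz\n\t^ self bar + 1"}
    path = os.path.join(tempfile.mkdtemp(), "repo.idx")
    write_indexed(path, methods)
    print(read_method(path, "Foo>>baz"))
```

The point of the layout is that loading a whole package costs one open and one close, however many methods it contains, instead of one open/read/close cycle per method.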
It's worth pointing out that file-based version control has advanced significantly since we did this work - CVS and SVN are now far from the state of the art. I haven't used git much, for example, but it seems to be a well-layered system, and it may be that we can build an alternative front end to its database that is image-based rather than working-directory-based. For example, imagine comparing an image directly to this index file rather than to a directory full of files on disk:
http://www.kernel.org/pub/software/scm/git/docs/user-manual.html#the-index
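As a rough sketch of what comparing an image directly to the index could look like: git identifies file contents by a SHA-1 over a small `blob <size>` header plus the bytes, so sources held in memory can be hashed and checked against index entries without writing any files. The code below assumes the index has already been parsed into a path-to-id mapping (parsing the real index file is not shown), and the function names and the `Foo/bar.st` style paths are hypothetical.

```python
import hashlib


def git_blob_id(source: str) -> str:
    """Return the SHA-1 object id git assigns to a blob with this content."""
    data = source.encode("utf-8")
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


def compare_image_to_index(image_methods, index_entries):
    """Classify paths as added, removed, or modified relative to the index.

    image_methods: path -> source text held in the running image (hypothetical).
    index_entries: path -> blob id, assumed already parsed from an index.
    """
    added = sorted(p for p in image_methods if p not in index_entries)
    removed = sorted(p for p in index_entries if p not in image_methods)
    modified = sorted(
        p for p, src in image_methods.items()
        if p in index_entries and git_blob_id(src) != index_entries[p]
    )
    return added, removed, modified


if __name__ == "__main__":
    index = {"Foo/bar.st": git_blob_id("bar ^ 1"), "Foo/old.st": git_blob_id("old")}
    image = {"Foo/bar.st": "bar ^ 2", "Foo/new.st": "new"}
    print(compare_image_to_index(image, index))
```

Nothing here touches a working directory: the only inputs are the in-memory sources and the index contents, which is the kind of front end the paragraph above speculates about.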
And look at this description of the workflow:
http://www.kernel.org/pub/software/scm/git/docs/user-manual.html#the-workflo...
I personally believe that we're better off with Smalltalk-specific version control, but if someone *is* looking at integration with more mainstream tools, I would strongly suggest they start with git rather than SVN.
Avi