Why multiple change files?

Sun Mar 7 18:17:58 UTC 2004

On Sunday 07 March 2004 3:57 am, Trygve Reenskaug wrote:

> I am struggling to find an acceptable work process. I admit that I like
> what I am used to, and that new ways could be better ways.
>
> I am used to having a single changes file with many image files, and I now
> get lost when I try to find some useful old stuff from somewhere in my 32
> changes files.

You only have 32 changes files. You're lucky. I have 100 of them for a total 
of 1.6 GB, not counting my various backups and the ones that live on the 
partition now known as "/old/old/stuff" (yes, my housekeeping is abysmal).

> Some knowledge seems deeply embedded in the Squeak image: There are exactly
> TWO source files; all new stuff is written at the end of file number 2. The
> changes file must have a name exactly as derived from the image file name.
> None of this is technically necessary. They are conventions that, IMO,
> could better be implemented in an outer user interface layer.
>
> I have 32 images in my current sequence. I also have 32 changes files. All
> the images could have worked just as well with a single, common changes
> file. They would work even better. For the very useful 'versions' command
> could show ALL versions of a given method. I could work in any of my 32
> images; new stuff would be appended at the end and all images would be
> happy.
>
> There may be a fundamental difference between VW and Squeak. In my VW/OOram
> images, new info is always appended at the end of the last sources file. A
> method knows its source as an index in the source files array identifying
> the file + a pair of indexes within this file to identify the string. So
> an image is never confused by other images writing THEIR stuff to the end
> of the last file.
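Trygve's description of the VW scheme can be pictured with a small Python sketch. It is not VW's actual code: SourceRef and SharedSources are hypothetical names, and the files are kept in memory. Each method records a file index plus a start/stop pair, and every image appends to the end of the last file, so one image's writes never disturb the references another image already holds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceRef:
    """Hypothetical VW-style source reference: which file, plus a byte range in it."""
    file_index: int  # index into the shared source-files list (0-based here)
    start: int       # offset of the first byte of the method source
    stop: int        # offset one past the last byte


class SharedSources:
    """A list of append-only source files shared by several images (in memory here)."""

    def __init__(self, n_files: int = 2):
        self.files = [bytearray() for _ in range(n_files)]

    def append(self, text: str) -> SourceRef:
        # New source always goes to the end of the LAST file.
        last = len(self.files) - 1
        data = text.encode("utf-8")
        start = len(self.files[last])
        self.files[last].extend(data)
        return SourceRef(last, start, start + len(data))

    def fetch(self, ref: SourceRef) -> str:
        # The (file, start, stop) triple is enough; nothing else in the file matters.
        return self.files[ref.file_index][ref.start:ref.stop].decode("utf-8")


shared = SharedSources()
ref_a = shared.append("foo ^1")   # image A saves a method
ref_b = shared.append("bar ^2")   # image B appends right after it
print(shared.fetch(ref_a))        # foo ^1
```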

I have been thinking about this.

I would like to try (for myself) using a single repository stored in a 
Berkeley DB file. Multiple images could share it (BDB handles multiple 
clients and locking); there would be only one copy of each common method, and 
I could view the entire history of my work.
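Here is a rough Python sketch of the "one copy of each common method, full history" idea. Purely for illustration, it assumes that source text is keyed by its SHA-1 digest. It also uses an in-memory dict where the real thing would use a Berkeley DB file, so BDB's multi-client locking is not shown. MethodRepository and its methods are hypothetical.

```python
import hashlib
from collections import defaultdict


class MethodRepository:
    """In-memory stand-in for a shared Berkeley DB repository (hypothetical design).

    Method source is stored once, keyed by its SHA-1 digest, so identical
    methods saved from different images share a single copy. A per-method
    history list records every distinct version saved, from any image.
    """

    def __init__(self):
        self.sources = {}                 # digest -> source text
        self.history = defaultdict(list)  # (class, selector) -> [digest, ...]

    def save(self, cls: str, selector: str, source: str) -> str:
        digest = hashlib.sha1(source.encode("utf-8")).hexdigest()
        self.sources.setdefault(digest, source)  # store each distinct text once
        versions = self.history[(cls, selector)]
        if not versions or versions[-1] != digest:
            versions.append(digest)              # ignore re-saves of identical text
        return digest

    def versions(self, cls: str, selector: str) -> list:
        """All saved versions, oldest first, like a cross-image 'versions' browser."""
        return [self.sources[d] for d in self.history[(cls, selector)]]
```

Saving identical text from two images then adds no new copy, while versions() returns every distinct edit in order, whichever image it came from.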

This would let me track actual work instead of just method versions.

I could have "projects" which could be used in, tested in, and worked on from 
many images.

I could also store notes, Worlds/Projects, morphs, StarBrowser 
classifications, or other binary objects with my projects as needed, as well 
as annotations and the like.

What would it take to do this?

It looks as if we have 26 bits to play with in the CompiledMethod source pointer.

Currently the encoding of the high byte is:
	1 first 16M of .sources file
	2 first 16M of .changes file
	3 second 16M of .sources file
	4 second 16M of .changes file
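The table can be read as a packing scheme. The high byte minus one holds two bits (which 16M window, and which file), and the low three bytes hold the offset within that window. Those 2 + 24 bits are the 26 bits. Below is a minimal Python sketch written from the table, not copied from Squeak's own source. encode_pointer and decode_pointer are hypothetical names.

```python
SIXTEEN_MB = 0x1000000  # 2**24: the size of each 16M window


def encode_pointer(file_index: int, position: int) -> int:
    """Pack (file_index, position) into a source pointer per the table above.

    file_index is 1 for .sources and 2 for .changes; position must fit in 25 bits.
    """
    if file_index not in (1, 2):
        raise ValueError("file_index must be 1 (.sources) or 2 (.changes)")
    if not 0 <= position < 2 * SIXTEEN_MB:
        raise ValueError("position must be below 32M")
    hi = (position // SIXTEEN_MB) * 2 + file_index  # 1..4, as in the table
    return hi * SIXTEEN_MB + position % SIXTEEN_MB


def decode_pointer(pointer: int) -> tuple:
    """Inverse of encode_pointer: answer (file_index, position)."""
    hi, lo = divmod(pointer, SIXTEEN_MB)
    if not 1 <= hi <= 4:
        raise ValueError("high byte must be 1..4")
    file_index = 2 - hi % 2  # odd hi -> .sources (1), even hi -> .changes (2)
    return file_index, (hi - 1) // 2 * SIXTEEN_MB + lo
```

For example, encode_pointer(2, 0x1000005) gives 0x4000005 (second 16M of .changes), and decode_pointer(0x4000005) gives back (2, 0x1000005).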

If those bits were used as record keys rather than byte offsets, this would 
give us room for 2^26 different method versions (over 67 million). As 
productive as the Squeak community has been, we aren't anywhere near that 
number yet.

One simple way to add another source file while keeping backward 
compatibility would be to round positions in the second source file 
(.changes) down to even numbers. We could then pad with spaces when writing, 
if that's even necessary; it seems it isn't, since we can seek to the rounded 
position and then skip the '!' or CR character if that's where we land.
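The no-padding variant could work roughly as shown in the Python sketch below, which rests on a few assumptions. Chunks end in '!', with '!!' as an escaped '!'. The byte just before a method's text is a CR or '!'. store_position and read_chunk_at are hypothetical helpers. If the true start was odd, the stored even offset lands one byte early on that separator, and the reader simply skips it.

```python
import io


def store_position(pos: int) -> int:
    """Round a .changes offset down to an even number, freeing the low bit."""
    return pos & ~1


def read_chunk_at(f, stored_pos: int) -> str:
    """Seek to a rounded-down offset and read one '!'-terminated chunk.

    If we landed on the '!' or CR that precedes the method text, skip it.
    """
    f.seek(stored_pos)
    first = f.read(1)
    chunk = "" if first in ("!", "\r") else first
    while True:
        ch = f.read(1)
        if ch == "":
            return chunk          # end of file
        if ch == "!":
            if f.read(1) == "!":  # '!!' is an escaped '!' inside a chunk
                chunk += "!"
                continue
            return chunk          # a single '!' ends the chunk
        chunk += ch


# Method text starts at odd offset 3; we store 2 and land on the CR before it.
changes = io.StringIO("a!\rfoo ^1!")
print(read_chunk_at(changes, store_position(3)))  # foo ^1
```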

The freed least significant bit of a .changes file position could then 
indicate the shared database instead.
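With positions stored as even numbers, decoding the flag is a single bit test. The Python sketch below treats the remaining bits (stored >> 1) as a database record number. That is an assumption, because the post does not say how the database entry would be addressed. decode_changes_position is a hypothetical name.

```python
def decode_changes_position(stored: int) -> tuple:
    """Interpret a stored .changes position under the even-rounding scheme.

    Hypothetical layout: an even value is a real (rounded-down) .changes offset;
    an odd value means 'look in the shared database', and stored >> 1 is
    taken here as the database record number.
    """
    if stored & 1:
        return ("database", stored >> 1)
    return ("changes", stored)
```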

OR... since the SqueakV3.sources file is still under 16M in size, we could 
gain another file with this encoding:

hi byte	meaning (hi byte = bits B25:B24, plus 1)
1	.sources file (16M max)
2	first 16M of .changes file
3	new database
4	second 16M of .changes file

and then we'd have room for 2^24 (about 16.7 million) method versions in the new database.
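Decoding under this alternative table might look like the following Python sketch. The LAYOUT mapping and decode function are hypothetical. The one assumption beyond the table is that, for high byte 3, the low 24 bits are a database key rather than a byte offset, which gives keys 0 through 16777215.

```python
SIXTEEN_MB = 1 << 24  # each hi-byte value addresses one 16M window

# Hypothetical meaning of each high-byte value under the proposed layout.
LAYOUT = {
    1: (".sources", 0),           # whole .sources file (16M max)
    2: (".changes", 0),           # first 16M of .changes
    3: ("database", None),        # low 24 bits are a database key, not an offset
    4: (".changes", SIXTEEN_MB),  # second 16M of .changes
}


def decode(pointer: int) -> tuple:
    """Answer (where, offset_or_key) for a source pointer under the proposed table."""
    hi, lo = divmod(pointer, SIXTEEN_MB)
    if hi not in LAYOUT:
        raise ValueError(f"no source location for high byte {hi}")
    where, base = LAYOUT[hi]
    return (where, lo) if base is None else (where, base + lo)
```

For example, decode(0x3000005) would answer ('database', 5), and decode(0x4000001) would answer ('.changes', 0x1000001).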

If the .sources file ever grew past 16M, we could start padding to 2-byte 
boundaries as described above.

-- 
Ned Konz
http://bike-nomad.com/squeak/