Why multiple change files?


Trygve Reenskaug

7 Mar 2004

12:57 p.m.

Hi all, I am struggling to find an acceptable work process. I admit that I like what I am used to, and that new ways could be better ways.

I am used to having a single changes file with many image files, and I now get lost when I try to find some useful old stuff from somewhere in my 32 changes files.

My old image files showed different snapshots of my work, I could easily pick up exactly where I was two days ago.

My old, single changes file was my log and memory of all I had done in any context. I could easily pick up an earlier experiment even if it had been discarded using a ChangeList editor.

Some knowledge seems deeply embedded in the Squeak image: There are exactly TWO source files; all new stuff is written at the end of file number 2. The changes file must have a name exactly as derived from the image file name. None of this is technically necessary. They are conventions that, IMO, could better be implemented in an outer user interface layer.

I have 32 images in my current sequence. I also have 32 changes files. All the images could have worked just as well with a single, common changes file. They would work even better, because the very useful 'versions' command could then show ALL versions of a given method. I could work in any of my 32 images; new stuff would be appended at the end and all images would be happy.

There may be a fundamental difference between VW and Squeak. In my VW/OOram images, new info is always appended at the end of the last sources file. A method knows its source as an index in the source files array identifying the file, plus a pair of indexes within this file to identify the string. So an image is never confused by other images writing THEIR stuff to the end of the last file.
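The addressing scheme described above can be sketched in a few lines. This is only an illustration of the idea, not VW or Squeak code: the names `SourceRef` and `Image` are hypothetical, and in-memory `StringIO` objects stand in for the source files.

```python
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceRef:
    """Hypothetical source reference: a file index plus a pair of positions."""
    file_index: int  # which entry in the source files array
    start: int       # position of the first character of the source
    stop: int        # position just past the last character


class Image:
    """Hypothetical stand-in for an image; it only remembers SourceRefs."""

    def __init__(self, source_files):
        self.source_files = source_files  # shared between images
        self.methods = {}

    def compile(self, selector, source):
        # New source is always appended at the current end of the last file.
        file_index = len(self.source_files) - 1
        stream = self.source_files[file_index]
        start = stream.seek(0, io.SEEK_END)
        stream.write(source)
        self.methods[selector] = SourceRef(file_index, start, start + len(source))

    def source_of(self, selector):
        ref = self.methods[selector]
        stream = self.source_files[ref.file_index]
        stream.seek(ref.start)
        return stream.read(ref.stop - ref.start)


# Two images sharing the same pair of files.
shared_files = [io.StringIO("...released sources..."), io.StringIO()]
image_a, image_b = Image(shared_files), Image(shared_files)
image_a.compile("foo", "foo ^1")
image_b.compile("bar", "bar ^2")
image_a.compile("foo", "foo ^42")
```

Because each image keeps absolute positions and nothing is ever overwritten, text appended by one image does not disturb the references held by the other, and the older version of `foo` remains in the shared file.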

So, given that Squeak is as it is and that I have given up doctoring it to suit my old working habits, how can I benefit from my 32 subtly different changes files?

Perhaps I should change working habits altogether. I'll be happy to do that as long as I retain control of my erratic working habits.

Thanks --Trygve

-- Trygve Reenskaug mailto: trygver@ifi.uio.no Morgedalsvn. 5A http://heim.ifi.uio.no/~trygver N-0378 Oslo Tel: (+47) 22 49 57 27 Norway


Ned Konz

7 Mar

7:17 p.m.

On Sunday 07 March 2004 3:57 am, Trygve Reenskaug wrote:

...

I am struggling to find an acceptable work process. I admit that I like what I am used to, and that new ways could be better ways.

I am used to having a single changes file with many image files, and I now get lost when I try to find some useful old stuff from somewhere in my 32 changes files.

You only have 32 changes files. You're lucky. I have 100 of them for a total of 1.6 Gb, not counting my various backups and the ones that live on the partition now known as "/old/old/stuff" (yes, my housekeeping is abysmal).

...

Some knowledge seems deeply embedded in the Squeak image: There are exactly TWO source files; all new stuff is written at the end of file number 2. The changes file must have a name exactly as derived from the image file name. None of this is technically necessary. They are conventions that, IMO, could better be implemented in an outer user interface layer.

I have 32 images in my current sequence. I also have 32 changes files. All the images could have worked just as well with a single, common changes file. They would work even better, because the very useful 'versions' command could then show ALL versions of a given method. I could work in any of my 32 images; new stuff would be appended at the end and all images would be happy.

There may be a fundamental difference between VW and Squeak. In my VW/OOram images, new info is always appended at the end of the last sources file. A method knows its source as an index in the source files array identifying the file, plus a pair of indexes within this file to identify the string. So an image is never confused by other images writing THEIR stuff to the end of the last file.

I have been thinking about this.

I would like to try (for myself) using a single repository that is in a Berkeley DB file. Multiple images could share it (BDB handles multiple clients and locking); there would only be one copy of each common method, and I could view all the history of my work.

This would let me track actual work instead of just method versions.

I could have "projects" which could be used in, tested in, and worked on from many images.

I could also store notes, Worlds/Projects, morphs, StarBrowser classifications, or other binary objects with my projects as needed. As well as annotations, etc.

What would it take to do this?

It looks as if we have 26 bits to play with in the CompiledMethod format.

Currently the encoding of the high byte is:

  1  first 16M of .sources file
  2  first 16M of .changes file
  3  second 16M of .sources file
  4  second 16M of .changes file

This should give us room for 2^26 different method versions (over 67 million). As productive as the Squeak community has been, we aren't anywhere near that number yet.
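The table can be modelled as a pack/unpack pair. This is a sketch of the table as written in this message, not the actual Squeak implementation; the function names and the constants `SOURCES` and `CHANGES` are assumptions, and the high byte is stored as bits 25:24 plus one.

```python
SEGMENT_SIZE = 1 << 24   # 16M positions fit in the low 24 bits
SOURCES, CHANGES = 1, 2  # hypothetical names for the two source files


def encode_pointer(file_index: int, position: int) -> int:
    """Pack (file, position) into 26 bits following the table above."""
    if file_index not in (SOURCES, CHANGES):
        raise ValueError("only two source files are supported")
    if not 0 <= position < 2 * SEGMENT_SIZE:
        raise ValueError("position must fit in two 16M segments")
    segment, offset = divmod(position, SEGMENT_SIZE)
    high = 2 * segment + file_index       # 1..4, as in the table
    return ((high - 1) << 24) | offset    # bits 25:24 hold high - 1


def decode_pointer(pointer: int):
    """Recover (file_index, position) from a 26-bit pointer."""
    high = (pointer >> 24) + 1
    offset = pointer & (SEGMENT_SIZE - 1)
    file_index = 2 - (high % 2)           # 1 and 3 are .sources, 2 and 4 are .changes
    segment = (high - 1) // 2             # first or second 16M of that file
    return file_index, segment * SEGMENT_SIZE + offset
```

The four high-byte values times 2^24 offsets give the 2^26 distinct pointer values mentioned above.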

One simple strategy to add another source file and maintain backwards compatibility would just be to round positions down to even numbers in the second source file (.changes). We could then pad with spaces when writing (if that's even necessary; it seems like it isn't since we can seek and then skip the '!' or CR character if that's where we land).

So the LS bit of the file position in the second source file could indicate the shared database...
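A rough sketch of this low-bit strategy, under the assumption that database entries are identified by a record number (the message does not say how they would be keyed). All function names here are hypothetical.

```python
def changes_pointer(position: int) -> int:
    """Round a .changes position down to an even number (low bit clear)."""
    return position & ~1


def database_pointer(record_number: int) -> int:
    """Tag a database record number by setting the low bit (assumed keying)."""
    return (record_number << 1) | 1


def resolve(pointer: int):
    """Tell the two kinds of pointer apart by their least significant bit."""
    if pointer & 1:
        return ("database", pointer >> 1)
    return ("changes", pointer)


def find_chunk_start(text: str, position: int) -> int:
    """After seeking to a rounded-down position, skip a '!' or line end."""
    while position < len(text) and text[position] in "!\r\n":
        position += 1
    return position
```

Rounding down costs at most one character of slack, which `find_chunk_start` absorbs by skipping the terminator or CR that precedes the chunk, so no padding is written in this sketch.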

OR... since the SqueakV3.sources file is not yet over 16M in size, we could gain another file by saying:

hi byte ((B25:B24)+1)

  1  .sources file (16M max)
  2  first 16M of .changes file
  3  new database
  4  second 16M of .changes file

and then we'd have space for 16777215 method versions in the new database.

If we got to the point where the .sources file got above 16M, we could start padding to 2-byte boundaries as above.

-- Ned Konz http://bike-nomad.com/squeak/

tim Rowledge

8:20 p.m.

Ned Konz wrote:

...

What would it take to do this?

It looks as if we have 26 bits to play with in the CompiledMethod format.

The new CompiledMethod format changes would allow much more than this; since the source pointer could be any object able to respond to the right message. In-image strings, remote strings, database accesses, encrypted strings, whatever.
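The idea of a source pointer being "any object able to respond to the right message" can be illustrated as follows. This is a sketch only: the message name `source_text` and all class names are made up for the example and are not taken from the actual CompiledMethod format changes.

```python
import io
import zlib


class InImageSource:
    """Source held directly as a string in the image."""

    def __init__(self, text):
        self.text = text

    def source_text(self):
        return self.text


class RemoteSource:
    """Source held as a position and length in an external stream."""

    def __init__(self, stream, position, length):
        self.stream, self.position, self.length = stream, position, length

    def source_text(self):
        self.stream.seek(self.position)
        return self.stream.read(self.length)


class CompressedSource:
    """Source held as compressed bytes; could equally be encrypted."""

    def __init__(self, text):
        self.data = zlib.compress(text.encode("utf-8"))

    def source_text(self):
        return zlib.decompress(self.data).decode("utf-8")


class CompiledMethod:
    """Hypothetical method object that delegates to its source pointer."""

    def __init__(self, source_pointer):
        self.source_pointer = source_pointer

    def source(self):
        return self.source_pointer.source_text()


changes = io.StringIO("foo ^1!bar ^2!")
methods = [
    CompiledMethod(InImageSource("baz ^3")),
    CompiledMethod(RemoteSource(changes, 7, 6)),
    CompiledMethod(CompressedSource("qux ^4")),
]
sources = [m.source() for m in methods]
```

The method never needs to know where its source lives; a database-backed pointer would simply be one more class answering the same message.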

I suspect there were two reasons for the external changes files back in the old days:

a) memory space was limited, and having a pointer to a string in a file saved a great deal of space;
b) OSs (such as they were) were so much less reliable than the marvellous, reliable, carefully designed wonders of modern times[1]. An external file, frequently flushed, gave some security for your code.

I suspect that Trygve might be happier with something like Monticello along with some way of MC keeping a log rather like the changes file. Or does it already do that?

tim

[1] Sarcastic, me?

Avi Bryant

10:48 p.m.

On Mar 7, 2004, at 11:20 AM, tim Rowledge wrote:

...

The new CompiledMethod format changes would allow much more than this; since the source pointer could be any object able to respond to the right message. In-image strings, remote strings, database accesses, encrypted strings, whatever.

Tim, what's the easiest way to test out those format changes?

Tim Rowledge

16 Mar

8:07 p.m.

Avi Bryant avi@beta4.com wrote:

...

Tim, what's the easiest way to test out those format changes?

For older images they've been pretty well tested over the years - remember I did the initial implementation back at Interval in 98(?) - and Anthony H. used it in his first pass at adding block closures. So barring changes in recent images that break something it ought to be pretty good.

The major problem, of course, is that it causes a break in backward compatibility. Then again, we could take advantage of that to clear out quite a bit of fecal matter.

tim -- Tim Rowledge, tim@sumeru.stanford.edu, http://sumeru.stanford.edu/tim Useful Latin Phrases:- Raptus regaliter = Royally screwed

Lex Spoon

9 Mar

5:54 p.m.

It seems odd to me to maintain 30 different snapshots into the history of your project. It would seem that a proper code versioning system could do much better. So yes, your style sounds erratic to me. :)

Now, you can use most any change sets mechanism to make *code* snapshots and return to them. Monticello should work fine, for example. Also, I expect that Monticello would support branching code versions, but I don't know for sure.

For the record, Squeak's method and changes files work the way you describe, and it should be possible to make multiple images use the same changes file.

I don't know what the UI for sharing changes files should be, however. In fact, the idea mildly bothers me, because it complicates an already-complicated model. I would like to move in the other direction, and not have the changes file appear to the user at all. I'd rather we had some sort of "snapshot" file which included both image and a changes log.

-Lex

Avi Bryant

7 Mar

10:45 p.m.

On Mar 9, 2004, at 8:54 AM, Lex Spoon wrote:

...

It seems odd to me to maintain 30 different snapshots into the history of your project. It would seem that a proper code versioning system could do much better. So yes, your style sounds erratic to me. :)

Now, you can use most any change sets mechanism to make *code* snapshots and return to them. Monticello should work fine, for example. Also, I expect that Monticello would support branching code versions, but I don't know for sure.

Yes, it certainly can. With all of my source managed by Monticello, images have become largely throwaway for me - any time I feel like I'm getting too many images, I go through them all to make sure I've committed whatever modifications they have to an appropriate branch, and then delete them en masse. It's very easy to start from a fresh image and load in the right packages again.

This wouldn't work as well if you had a lot of non-code content, of course.

Dan Ingalls

8 Mar

1:07 a.m.

Lex wrote...

...

For the record, Squeak's method and changes files work the way you describe, and it should be possible to make multiple images use the same changes file.

I don't know what the UI for sharing changes files should be, however. In fact, the idea mildly bothers me, because it complicates an already-complicated model. I would like to move in the other direction, and not have the changes file appear to the user at all. I'd rather we had some sort of "snapshot" file which included both image and a changes log.

Also for the record, this is precisely what internalizeChangeLog does, or did 8 years ago. I dreamed this up as a way to keep changes well-supported while moving from one file system (Mac Toolbox) to another (Squeak cross-platform) in the earliest days of Squeak. Scott implemented it, and it has occasionally been of great value. It is the only way to fly when files are not available for one reason or another.

This gives you all the benefits of a changes log *except one*: namely the security of changes written on the disk and not lost if you crash. Therefore if you use this mechanism in serious development, you must remember "Never to play for more than you can afford to lose" or, in other words, to save often.

Dan

PS: To go further in the direction Lex desires, it might be reasonable to dribble the changes to a file, but to pull a compressed (*) and condensed copy into the image before a snapshot. This would keep the crash survival benefit, while minimizing the in-image space cost, and keeping a user model of one file only.

(*) the compressed sources mechanism I did a while back is actually designed to be incrementally writable as well as serving as a read-only sources file. I'd be willing to test that capability if someone turns out to care about it.
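The arrangement suggested in the PS can be sketched like this: every change is dribbled to a file and flushed at once, and a compressed copy is pulled into the image at snapshot time. The `ChangeLog` class and its method names are hypothetical, zlib stands in for the compressed sources mechanism, and the condensing step is left out.

```python
import os
import zlib


class ChangeLog:
    """Hypothetical change log: dribbles to disk, internalizes on snapshot."""

    def __init__(self, path):
        self.path = path
        self.internal = b""  # compressed copy carried "inside the image"

    def record(self, chunk):
        # Append and flush immediately so a crash loses as little as possible.
        with open(self.path, "ab") as f:
            f.write(chunk.encode("utf-8") + b"!\n")
            f.flush()
            os.fsync(f.fileno())

    def snapshot(self):
        # Before saving the image, pull a compressed copy of the log into it.
        with open(self.path, "rb") as f:
            self.internal = zlib.compress(f.read())

    def internal_changes(self):
        if not self.internal:
            return ""
        return zlib.decompress(self.internal).decode("utf-8")
```

Between snapshots the external file provides crash survival; after a snapshot the image alone carries the full log, which keeps the one-file user model.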

tim Rowledge

2:37 a.m.

Dan Ingalls wrote:

...

Also for the record, this is precisely what internalizeChangeLog does, or did 8 years ago. I dreamed this up as a way to keep changes well-supported while moving from one file system (Mac Toolbox) to another (Squeak cross-platform) in the earliest days of Squeak. Scott implemented it, and it has occasionally been of great value. It is the only way to fly when files are not available for one reason or another.


This has probably occurred to someone before but for the mega-everything-loaded-all-singing-all-dancing-demo image it might be smart to suck all the sources into the image so that there is that much less to try to explain to people wanting a quick fix try out.

tim

Trygve Reenskaug

10:17 a.m.

Hi Dan, Long time no see.

I use the changes for two things:

1) To pick up stuff that I had discarded earlier. Example: I want to use very large Arial fonts for talks. One way to install fonts is to use TTFontDescription>>addFromTTFile:. Another is to use TTFontReader>>installTTF:asTextStyle:sizes:. There is also a bugfix suggested in e-mail http://minnow.cc.gatech.edu/squeak/2235 from Kris Gybels, 2002-02-04, proposing that the depth should be set to one in TTGlyph>>asFormWithScale:ascender:descender:. I thought my font project was completed, but discovered a flaw that made me want to retrace my steps and try something else. Not wanting to reinvent debugged code, I searched the changes files until I found what I needed.

2) To experiment. For example, when I wanted to discover how a balloon help is activated, or more precisely why it wasn't activated in a certain case, I examined the stack at different points in the process. At the crucial state, the stack was 29 activations deep, many of them reentrant and all of them obscure (to me). I had frequent crashes during this work, and used the changes file to recreate the situation up to, but not including, the last step. I could then examine that step, modify it, and try again.

The reason why my work process is erratic is that I find it effective. I follow a main theme, but am happy to digress to improve my tools or to explore some niche in the system.

--------------------

I do not need the changes file to keep 'finished' packages etc. for my daily work. I keep them in my image. This is the best library there is, IMO. (Shipped products are still being generated by stripping the main image). If I start from a new release image, I use the workspace log to build a new system creation process. This is not automatic; I have to consider every step carefully to make sure my stuff will work properly in the new environment. So Monticello or an automatic versioning system would not help me. They overlap the information in my image.

I do see that the current change file scheme may be effective in some cases. I do not see why it is enforced deep down and wide around in the system. VW supports a maximum of 32 source files, the Squeak encoding seems to support exactly two. So that's that. I may remove the binding between image and changes names, however, but not today.

Best regards --Trygve



ducasse

11:04 a.m.

hi Trygve

Still, I think you should experiment at least once with the notion of a build: a reproducible, automated sequence of instructions that produces your system.

When I started coding in Smalltalk I had a lot of images, and it was a mess. Then I discovered Envy (not really user friendly), and that became the place where I published all my code. I would arrive in the morning, take a fresh image, and click on my last build, or on the one from five days ago because I knew yesterday's was not the one I wanted, and in ***one*** click I got that. No more millions of redundant images.

Images are cool and sweet places to live and hack, but they are not a good process for reproducing an artifact over time. Now I use Store (not really sexy either), and we coordinate 6 PhDs and researchers on related but not identical projects. In the past I spent ***hours*** releasing scripts so that people could load the latest versions; now it is one click. Now we can track who did what, roll back, create a new build...

You can achieve the same without tool support. This means that I have one specification, similar to the sar preamble I sent you, where I specify how to reproduce version 25 of my environment (i.e. load MW, load turtle 36, execute this, do that...): I should load all these files in this order. Until now I have kept all the cs in a huge directory, but I have different build environment scripts, so that when I want to go back in time I need one click to load. I force myself to throw away images.

Now with Monticello the process is easier, and I'm migrating to it. For example, I load the breakOut, change some code, and publish it. I open a new image and check whether everything is ok. If it is, all the code is stored in different folders and I can access it at any time. No need for 10 MB to hold 124k of Smalltalk code.

Imagine:

  Squat plus a script = Squeak + another one = OORAM
  Squat plus a script = Squeak + stef one = Caro and Bot

What I can tell you is that once you have a build process, you feel much stronger and more secure, because you know that in one click you can reproduce your earlier state. So this is worth trying, and it is not against images: the two are different aspects of the same activity.

Stef

PS: Measure the time you spend building your environment. I can tell you that for me this is now only the time it takes to load the code, and in the MOOSE environment we have around 300 classes, and with CodeCrawler 450.

Trygve Reenskaug

12:32 p.m.

Stef, I think you confuse two purposes: A lab journal records everything that is done in a series of experiments, including unsuccessful experiments (penicillin was discovered by an unsuccessful experiment)

A report gives a succinct description of the results of a series of experiments.

You describe tools for the report. They have always been in the form of fileIns, and I'm sure Monticello is excellent for that. I am talking about tools for the lab journal, collected semi-automatically. Perhaps I am exceptional in needing a journal to remember exactly what I have done.

--Trygve



Avi Bryant

9:29 p.m.

On Mar 8, 2004, at 3:32 AM, Trygve Reenskaug wrote:

...

Stef, I think you confuse two purposes: A lab journal records everything that is done in a series of experiments, including unsuccessful experiments (penicillin was discovered by an unsuccessful experiment)

A report gives a succinct description of the results of a series of experiments.

You describe tools for the report. They have always been in the form of fileIns, and I'm sure Monticello is excellent for that. I am talking about tools for the lab journal, collected semi-automatically. Perhaps I am exceptional in needing a journal to remember exactly what I have done.

Monticello is actually much closer to the lab journal - it's intended to capture not just the finished state of the code, but various checkpoints along the way. It doesn't do this linearly (as a changes file effectively does) - if you do an experiment, reject it, revert to a previous state, and continue from there, that branching structure will be captured. This is very useful if you later decide you want to merge two such branches, since it can do this semi-automatically by comparing them against their common ancestor.

Monticello may not have the granularity you want, however - it only records the state of your source code when you explicitly tell it to, not every time you edit a method. On the plus side, when you explicitly save the state Monticello encourages you to log some notes about that state, which makes reviewing your past work easier.
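Merging two branches against their common ancestor can be illustrated with a generic three-way merge over dictionaries mapping selectors to source strings. This is a textbook sketch under that assumed representation, not Monticello's actual implementation; the function name is hypothetical.

```python
def three_way_merge(ancestor, ours, theirs):
    """Merge two branches of {selector: source} against their ancestor.

    Returns (merged, conflicts). A selector missing from a dictionary
    means the method does not exist in that version.
    """
    merged, conflicts = {}, []
    for selector in sorted(set(ancestor) | set(ours) | set(theirs)):
        base = ancestor.get(selector)
        a = ours.get(selector)
        b = theirs.get(selector)
        if a == b:          # both branches agree (including both deleted)
            result = a
        elif a == base:     # only their branch changed it
            result = b
        elif b == base:     # only our branch changed it
            result = a
        else:               # both changed it differently: needs a human
            conflicts.append(selector)
            continue
        if result is not None:
            merged[selector] = result
    return merged, conflicts
```

Only methods changed differently on both branches are reported as conflicts, which is why a merge of this kind can be largely automatic.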

tim Rowledge

10 Mar

5:55 a.m.

Avi Bryant wrote:

...

Monticello is actually much closer to the lab journal - it's intended to capture not just the finished state of the code, but various checkpoints along the way. It doesn't do this linearly (as a changes file effectively does) - if you do an experiment, reject it, revert to a previous state, and continue from there, that branching structure will be captured. This is very useful if you later decide you want to merge two such branches, since it can do this semi-automatically by comparing them against their common ancestor.

Monticello may not have the granularity you want, however - it only records the state of your source code when you explicitly tell it to, not every time you edit a method. On the plus side, when you explicitly save the state Monticello encourages you to log some notes about that state, which makes reviewing your past work easier.

How much work would it be to have MC track the individual compilations, so that changelog-like facilities could be integrated with the current ones?

tim

Colin Putney

4:17 p.m.

On Mar 9, 2004, at 11:55 PM, tim Rowledge wrote:

...

How much work would it be to have MC track the individual compilations, so that changelog-like facilities could be integrated with the current ones?

Well, it would be possible now that we have Roel's SystemChangeNotification. Before that there was no way for MC to get notified of changes other than method compilations.
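The general shape of such a notification mechanism is a publish/subscribe arrangement in which a journal listens for every system change. The sketch below is a generic illustration; `ChangeNotifier` and `ChangeEvent` are hypothetical names and do not reproduce the SystemChangeNotification API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangeEvent:
    """Hypothetical record of one system change."""
    kind: str         # e.g. "method compiled", "class renamed"
    item: str         # what was changed
    detail: str = ""  # optional extra information


class ChangeNotifier:
    """Hypothetical notifier: forwards every event to all subscribers."""

    def __init__(self):
        self._listeners = []

    def subscribe(self, listener):
        self._listeners.append(listener)

    def announce(self, event):
        for listener in self._listeners:
            listener(event)


# A journal that records every change, not only method compilations.
journal = []
notifier = ChangeNotifier()
notifier.subscribe(journal.append)
notifier.announce(ChangeEvent("method compiled", "Foo>>bar"))
notifier.announce(ChangeEvent("class renamed", "Foo", "now FooBar"))
```

A versioning tool subscribed in this way could keep a fine-grained log between explicit saves, which is the granularity a changes file provides.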

I'm not sure what you have in mind though. Are you imagining eliminating the changes file and storing all history in an MC repository? Or, going in the opposite direction, of somehow recording the "intermediate" states of a package when a version is saved?

Colin

squeak-dev@lists.squeakfoundation.org
