Partitioning the image (was Re: Shrinking sucks!)

Fri Feb 18 23:58:45 UTC 2005

Hi Ned,

First of all, thanks for this thorough answer to my proposal!

Ned Konz <ned at squeakland.org> wrote:
> On Monday 14 February 2005 2:04 pm, Bernhard Pieber wrote:
> > Just to avoid any misunderstanding. I propose to not only allow PI
> > subclasses but to have only PI subclasses. All PI subclasses should be
> > abstract. None of them should be instantiated. So there will be no
> > problems of maintaining instances.
> 
> Since there's no way to call methods except via instances in Squeak, it's not 
> really correct to say that there wouldn't be PI objects. If you're proposing 
> to put all functionality on the class side of PI subclasses (which you'd have 
> to do in the absence of instantiating the subclasses), then the PI objects 
> will themselves be subinstances of Metaclass, right?

:-) You got me! Of course, PI subclasses are instances as well. I
assumed you meant PI subclass instances when you wrote about problems of
maintaining instances.

> I don't see what the advantage of instantiating Metaclass subinstances would 
> be vs. instantiating singleton instances of subclasses of PackageInfo.
> 
> Is it just that you're comfortable with editing code in browsers? Code 
> browsers don't have to be our only hammer. It's not as if we could just start 
> using a package file format without making new tools of one sort or another 
> and changing the old tools. I would rather have a separate package browser to 
> view and edit package metadata than have to (say) search for implementors of 
> particular PI class methods, or to have to search literal strings across the 
> whole image for metadata. After all, wouldn't your proposal require such 
> metadata as maintainer name, etc. to be held in literal strings in compiled 
> methods?

Yes, one of my main arguments is that if I need code, I want it to be
methods in classes. It is not that I find our browsers so comfortable;
it is more my tendency to reuse the tools that are already there, and
those are not only the browsers. What I definitely want is for "browse
senders" to scan this code as well. With methods in classes this is
automatically the case. But of course you are right that code browsers
don't have to be our only hammer.

And yes, you are also right that my proposal means that the metadata
are literal strings in compiled methods. Why that implies that I have to
search for them, I don't quite understand. Can't I just send those
messages to the PI subclasses to access the metadata?
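
As a rough illustration of what I mean, here is a minimal sketch. It is
written in Python only because it is compact; in Squeak these would be
class-side methods on PI subclasses. The class name ShrinkingToolsInfo,
the method names and the metadata values are all made up for the example.

```python
class PackageInfo:
    """Hypothetical abstract base; never instantiated in this scheme."""

    @classmethod
    def package_name(cls):
        raise NotImplementedError

    @classmethod
    def maintainer(cls):
        raise NotImplementedError


class ShrinkingToolsInfo(PackageInfo):
    """Hypothetical package description; the metadata are literals in methods."""

    @classmethod
    def package_name(cls):
        return "ShrinkingTools"

    @classmethod
    def maintainer(cls):
        return "A. Maintainer"


def metadata_for(info_class):
    # Ask the class itself: no instance is created and no source is searched.
    return {
        "name": info_class.package_name(),
        "maintainer": info_class.maintainer(),
    }


# Only direct subclasses are consulted, so no indirect subclasses are needed.
all_metadata = [metadata_for(c) for c in PackageInfo.__subclasses__()]
```

A tool would then call metadata_for on each PI subclass instead of
searching literal strings across the image. It does, of course, have to
know which messages to send.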

> If you look at the nature of class objects as opposed to instances of classes 
> other than Metaclass, you'll see that they're "special" objects that are 
> considerably more heavy-weight and more "dead" than other objects. Their 
> deserialization is very low performance (because of all the overhead of 
> creating and compiling a class), and doesn't fit in well with the existing 
> object serialization already built in to Squeak (see, for instance, 
> fileInObjectAndCode, where files are processed first by the compiler and then 
> by the serialization mechanism). Class objects also leave traces of 
> themselves behind (in change records, obsolete classes, etc.)

I admit that I did not consider this at all. Definitely good arguments.

> Another group of my concerns over the strategy (I think) you're proposing is 
> that it makes the creation of tools more difficult now and in the future. And 
> it is considerably less efficient and is also more likely to have 
> unacceptably dangerous side-effects. Here's some of the disadvantages that I 
> see:
> 
> A. requiring compilation of a part of the code in a package before loading it 
> (or even deciding to load it) means that we can't use existing tools to deal 
> with the code as easily. For instance, it should be easy to examine the 
> contents of a package (including all its metadata) and then decide not to 
> load it.

I basically agree with you. (I could probably argue that filing in a PI
subclass is not part of loading the package itself but only of loading
its metadata. Each subclass could have a class inst var named 'loaded'.
If PI subclasses are as lightweight as I envision them, I probably
wouldn't find it too problematic to have them around in the image, even
if a package is not loaded. But I must admit it would be nicer to have
no traces of them in the image and changes file.)

> And since we will want to have automated handling of packages (for
> building images, testing, etc.) we can't rely on reading source code to do 
> it. To do this with a system requiring compilation of PI subclasses would 
> require several steps:
>
> A1. parse all the code into a representational form, without compiling it (we 
> have three different ways to do this at present)
> A2. locate all the code for the class side of all (possibly indirect?) 
> subclasses of PackageInfo

I don't see the need for indirect subclasses of PI.

> A3. compile that code. The problems with compilation are several:
> - A3a. Security. You're introducing new code into the system from a possibly 
> untrusted source, and then you have to run that code to see the metadata. 
> Given that the metadata should include such things as digital signatures that 
> would be needed to decide on how much I trust the source of a package, I 
> don't think that I want to pay that price.

Good point!

> - A3b. Error handling. If the code being compiled has an error, how do you 
> deal with it?

I haven't really thought about it, but it would be equivalent to a
corrupt serialization of a PI instance.

> - A3c. Undeclared references to as-yet-unloaded classes may be created.
> - A3d. References to old versions of classes that would be updated by loading 
> the package may be created and result in problems if those classes' methods 
> are called.
> - A3d. If the PI subclass happens to be named the same as an existing PI 
> subclass (which could happen easily) then the existing PI subclass would be 
> clobbered. If we detected such collisions and refused to compile the PI 
> subclass, then we couldn't view the package's metadata in any other way than 
> browsing its code.
> - A3e. What if there are more than one PI subclasses in the package? What if 
> the subclasses aren't indirect subclasses of PackageInfo?
> A4. call the appropriate methods to get the metadata. These, of course, are 
> code that was compiled from a possibly unknown source, and can do anything 
> they want because they're run with the full authorization that any other code 
> in the image has.
> A5. to load, we would compile all the rest of the code in the package 
> (possibly re-compiling the PI subclass if we'd previously compiled it 
> unlogged). See below for issues around pre-installation and post-removal 
> methods.
> A6. to unload, we have to delete the class that has just been compiled. And we 
> have to make sure that any traces of it have also been removed (in change 
> records, recent submissions, obsolete classes, references from 
> CompiledMethods, etc.). To make the removal process cleaner, we might have 
> chosen to compile the PI subclass without logging, but then we wouldn't have 
> the source code available for examination or editing. So if we decided to 
> load the package, we'd have to re-compile the PI subclass so that we could 
> see and edit its code.

These are all good points! I think you have convinced me that your
proposal is better. Probably more work, but cleaner. ;-)

> B. Compare this with the alternative using serialized PI objects:
> 
> B1. Deserialize the PI object data from the file.
> B2. Use existing tools to parse all the code into a representational form, 
> without compiling it (we have three different ways to do this at present)
> B3. Query the PI object directly for the required metadata.
> B4. To load, we would compile all the code in the package, just as we do now. 
> See below for issues around pre-installation and post-removal methods.
> B5. To unload, just delete the single reference to the PI object.
> 
> > > But in the interest of simplicity, I'd be in favor of not supporting that
> > > model, and just coming up with some way to instantiate a PI from a
> > > package file without having to compile any code.
> >
> > Why do you think it is necessarily simpler if you avoid compiling code?
> > A file in of a simple class - and I am arguing for very simple PI
> > subclasses - seems very simple to me.
> 
> See above.
> 
> > Well, all in all that does not sound to me like TSTTMPW. I must admit
> > that I have not thought at all about backward compatibility, though.
> 
> Nor about security, efficiency, or compatibility.

Right you are! ;-)
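
The security point (A3a and A4 versus B1 and B3) can be made concrete
with a small sketch. This is Python, not Squeak, and everything in it is
an assumption made for illustration: JSON stands in for whatever object
serialization Squeak would actually use, and the package contents, names
and "signature" value are invented.

```python
import json

# Hypothetical package file content: metadata stored as plain data.
SERIALIZED = (
    '{"name": "ShrinkingTools", "maintainer": "A. Maintainer", '
    '"signature": "not-a-real-signature"}'
)

# Hypothetical package file content: metadata stored as code that must run.
SOURCE = '''
side_effects.append("ran with full image privileges")
def maintainer():
    return "A. Maintainer"
'''


def read_metadata_as_data(text):
    # Parsing only builds dicts and strings; nothing from the package executes.
    return json.loads(text)


def read_metadata_as_code(source, side_effects):
    # The package's code has to be compiled and run to get an answer, so
    # anything else it does happens before a signature could be checked.
    namespace = {"side_effects": side_effects}
    exec(source, namespace)
    return {"maintainer": namespace["maintainer"]()}


effects = []
data_meta = read_metadata_as_data(SERIALIZED)
code_meta = read_metadata_as_code(SOURCE, effects)
```

Both routes return the same maintainer, but only the second one leaves
an entry in effects: the untrusted code has already run by the time the
metadata can be inspected.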

> > What I can't really see is how you would handle the code parts a
> > PackageInfo needs?
> 
> It would be in PackageInfo itself. That is, we wouldn't allow extension of the 
> instance variables in PI subclasses, and we also would require that the 
> methods used to report metadata wouldn't be overridden. So we serialize and 
> deserialize the PI instances as instances of PackageInfo, *not* as 
> subinstances.

So, would the code be strings in PackageInfo instances or methods in
PackageInfo subclasses? (I am a little bit confused.)

> There is the question of where to put the installation and removal code.
> 
> Currently we have four existing hooks for such code, and one missing place:
> 
> 1. Pre-installation code can be run from CS preambles, and from the 
> 'install/preamble' member in a SAR.
> 2. Post-loading code can be run from CS postscripts, and from the 
> 'install/postscript' member in a SAR.
> 3. Post-compilation code (per class) can be run from class-side initialize 
> methods
> 4. Pre-removal code (per class) can be run from class-side unload methods.
> 5. Post-removal (cleanup) code has to be run from somewhere outside the 
> package (and outside the package file). It is possible that compiled blocks 
> or code strings for removal could be loaded from the package file at 
> installation time and held outside the package's classes.

I can't see why we need #2 when we have #3?

> Currently MCZ packages only offer hooks #3 and #4.
> 
> If we're going to move to a scheme in which all (or most of) such code is 
> managed as code within methods inside the packages, then:
> 
> 1. Pre-installation code could still be run from SAR install/preamble members 
> (but this code wouldn't be managed as methods)
> 2. Pre-installation code in methods would require ensuring that those methods 
> (and by extension, those classes) would be compiled first and then run, 
> before compiling the rest of the package code. None of our existing tools 
> currently ensures this (though I think Monticello could be made to do this 
> ordering relatively easily). We could say that PI subclasses, if any, would 
> get compiled first and then have their appropriate pre-install methods run 
> first. Pre-installation could, by convention, be done in the PI subclass's 
> 'initialize' method, or that could be reserved for post-loading code.
> 3. Post-loading code could be stuck into PI subclass methods that would be 
> called after loading the entire package.
> 3. Post-compilation code (per class) could still be run from class-side 
> initialize methods using the existing code for doing this.
> 4. Pre-removal code (per package) could be run from PI subclass methods.
> 5. Pre-removal code (per class) could still be run from class-side unload 
> methods.
> 6. Post-removal (cleanup) code has to be run from somewhere outside the 
> package (and outside the package file). As above, it is possible that 
> compiled blocks or code strings for removal could be loaded from the package 
> file at installation time and held outside the package's classes.

Thanks for the detailed analysis! Some thoughts:

How would you deal with errors during compilation or running of the
code? ;-)

I would prefer that the hooks only call code at the package level.
From there you can call per-class code if needed.
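
A minimal sketch of that idea, again in Python purely for illustration.
The names PackageHooks, load_package and unload_package are
hypothetical, and error handling is deliberately left out because that
question is still open. The loader only ever talks to the package-level
hooks, which delegate to per-class initialize/unload code where it
exists.

```python
class PackageHooks:
    """Hypothetical package-level hooks; the loader calls only these."""

    def __init__(self, classes):
        self.classes = classes
        self.log = []

    def pre_install(self):
        self.log.append("package pre-install")

    def post_load(self):
        self.log.append("package post-load")
        # Per-class initialization is delegated from the package level.
        for cls in self.classes:
            init = getattr(cls, "initialize", None)
            if init is not None:
                init(self.log)

    def pre_removal(self):
        # Per-class unload code runs first, then the package-level cleanup.
        for cls in self.classes:
            unload = getattr(cls, "unload", None)
            if unload is not None:
                unload(self.log)
        self.log.append("package pre-removal")


class Foo:
    @staticmethod
    def initialize(log):
        log.append("Foo initialize")

    @staticmethod
    def unload(log):
        log.append("Foo unload")


class Bar:
    pass  # a class without any per-class hooks


def load_package(hooks, compile_package):
    hooks.pre_install()
    compile_package()
    hooks.post_load()


def unload_package(hooks):
    hooks.pre_removal()


hooks = PackageHooks([Foo, Bar])
load_package(hooks, lambda: hooks.log.append("compile package code"))
unload_package(hooks)
```

After this runs, hooks.log records the order in which things happened,
which is the part that has to be right: pre-install before compilation,
post-load after it, and per-class code reached only through the package.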

Getting these hooks right is the most important thing, IMO. Probably the
most difficult as well. There is one thing we should think about
carefully, which makes it even more complex. However, I think it must be
solved if we eventually want to replace ChangeSets. I am talking about
instance migration. For that it would be great to already have the new
version of the class while you can still access the old version and its
instances. Convenient hooks for that would be great. But the current
mechanisms are probably powerful enough. I must admit that I don't know
them well enough to judge that.

- Bernhard