Partitioning the image (was Re: Shrinking sucks!)

Wed Feb 16 22:13:24 UTC 2005

> > (quotes are by me)
On Monday 14 February 2005 2:04 pm, Bernhard Pieber wrote:

> Just to avoid any misunderstanding. I propose to not only allow PI
> subclasses but to have only PI subclasses. All PI subclasses should be
> abstract. None of them should be instantiated. So there will be no
> problems of maintaining instances.

Since there's no way to call methods except via instances in Squeak, it's not 
really correct to say that there wouldn't be PI objects. If you're proposing 
to put all functionality on the class side of PI subclasses (which you'd have 
to do in the absence of instantiating the subclasses), then the PI objects 
will themselves be subinstances of Metaclass, right?

I don't see what the advantage of instantiating Metaclass subinstances would 
be vs. instantiating singleton instances of subclasses of PackageInfo.

Is it just that you're comfortable with editing code in browsers? Code 
browsers don't have to be our only hammer. It's not as if we could just start 
using a package file format without making new tools of one sort or another 
and changing the old tools. I would rather have a separate package browser to 
view and edit package metadata than have to (say) search for implementors of 
particular PI class methods, or to have to search literal strings across the 
whole image for metadata. After all, wouldn't your proposal require such 
metadata as maintainer name, etc. to be held in literal strings in compiled 
methods?

If you look at the nature of class objects as opposed to instances of classes 
other than Metaclass, you'll see that they're "special" objects that are 
considerably more heavy-weight and more "dead" than other objects. Their 
deserialization is very low performance (because of all the overhead of 
creating and compiling a class), and doesn't fit in well with the existing 
object serialization already built in to Squeak (see, for instance, 
fileInObjectAndCode, where files are processed first by the compiler and then 
by the serialization mechanism). Class objects also leave traces of 
themselves behind (in change records, obsolete classes, etc.).

Another group of my concerns over the strategy (I think) you're proposing is 
that it makes the creation of tools more difficult now and in the future. It 
is also considerably less efficient and more likely to have unacceptably 
dangerous side-effects. Here are some of the disadvantages that I see:

A. Requiring compilation of a part of the code in a package before loading it 
(or even deciding to load it) means that we can't use existing tools to deal 
with the code as easily. For instance, it should be easy to examine the 
contents of a package (including all its metadata) and then decide not to 
load it. And since we will want to have automated handling of packages (for 
building images, testing, etc.) we can't rely on reading source code to do 
it. To do this with a system requiring compilation of PI subclasses would 
require several steps:

A1. parse all the code into a representational form, without compiling it (we 
have three different ways to do this at present)
A2. locate all the code for the class side of all (possibly indirect?) 
subclasses of PackageInfo
A3. compile that code. The problems with compilation are several:
- A3a. Security. You're introducing new code into the system from a possibly 
untrusted source, and then you have to run that code to see the metadata. 
Given that the metadata should include such things as digital signatures that 
would be needed to decide on how much I trust the source of a package, I 
don't think that I want to pay that price.
- A3b. Error handling. If the code being compiled has an error, how do you 
deal with it?
- A3c. Undeclared references to as-yet-unloaded classes may be created.
- A3d. References to old versions of classes that would be updated by loading 
the package may be created and result in problems if those classes' methods 
are called.
- A3e. If the PI subclass happens to be named the same as an existing PI 
subclass (which could happen easily) then the existing PI subclass would be 
clobbered. If we detected such collisions and refused to compile the PI 
subclass, then we couldn't view the package's metadata in any other way than 
browsing its code.
- A3f. What if there is more than one PI subclass in the package? What if 
the subclasses aren't direct subclasses of PackageInfo?
A4. call the appropriate methods to get the metadata. These, of course, are 
code that was compiled from a possibly unknown source, and can do anything 
they want because they're run with the full authorization that any other code 
in the image has.
A5. to load, we would compile all the rest of the code in the package 
(possibly re-compiling the PI subclass if we'd previously compiled it 
unlogged). See below for issues around pre-installation and post-removal 
methods.
A6. to unload, we have to delete the class that has just been compiled. And we 
have to make sure that any traces of it have also been removed (in change 
records, recent submissions, obsolete classes, references from 
CompiledMethods, etc.). To make the removal process cleaner, we might have 
chosen to compile the PI subclass without logging, but then we wouldn't have 
the source code available for examination or editing. So if we decided to 
load the package, we'd have to re-compile the PI subclass so that we could 
see and edit its code.

B. Compare this with the alternative using serialized PI objects:

B1. Deserialize the PI object data from the file.
B2. Use existing tools to parse all the code into a representational form, 
without compiling it (we have three different ways to do this at present)
B3. Query the PI object directly for the required metadata.
B4. To load, we would compile all the code in the package, just as we do now. 
See below for issues around pre-installation and post-removal methods.
B5. To unload, just delete the single reference to the PI object.
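
The difference between A and B can be sketched roughly as follows. This is 
Python rather than Squeak, purely as an illustration: the metadata format, 
the names read_by_compiling and read_by_deserializing, and the use of JSON 
are all assumptions, not anything that exists in the image.

```python
import json

# Hypothetical package metadata stored as code (approach A). Reading it
# means executing whatever the package author wrote.
METADATA_AS_CODE = '''
side_effects.append("ran untrusted code")   # could be anything at all
def maintainer():
    return "someone"
'''

# The same metadata stored as plain data (approach B).
METADATA_AS_DATA = '{"name": "Example-Package", "maintainer": "someone"}'

# Stands in for "anything the image lets running code touch".
side_effects = []


def read_by_compiling(source):
    """Approach A: compile and run the code, then call it for the answer."""
    namespace = {"side_effects": side_effects}
    exec(compile(source, "<package>", "exec"), namespace)
    return namespace["maintainer"]()


def read_by_deserializing(text):
    """Approach B: parse data only; nothing from the package is executed."""
    return json.loads(text)["maintainer"]


if __name__ == "__main__":
    print(read_by_deserializing(METADATA_AS_DATA))
    print(read_by_compiling(METADATA_AS_CODE))
    print(side_effects)
```

Both functions return the same maintainer string, but only the first one 
gives the package a chance to run code before we have decided to trust it.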

> > But in the interest of simplicity, I'd be in favor of not supporting that
> > model, and just coming up with some way to instantiate a PI from a
> > package file without having to compile any code.
>
> Why do you think it is necessarily simpler if you avoid compiling code?
> A file in of a simple class - and I am arguing for very simple PI
> subclasses - seems very simple to me.

See above.

> Well, all in all that does not sound to me like TSTTMPW. I must admit
> that I have not thought at all about backward compatibility, though.

Nor about security, efficiency, or compatibility.

> What I can't really see is how you would handle the code parts a
> PackageInfo needs?

It would be in PackageInfo itself. That is, we wouldn't allow extension of the 
instance variables in PI subclasses, and we also would require that the 
methods used to report metadata wouldn't be overridden. So we serialize and 
deserialize the PI instances as instances of PackageInfo, *not* as 
subinstances.
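
A minimal sketch of that restriction, again in Python for illustration only. 
The field names (name, maintainer, version) and the serialize/deserialize 
helpers are hypothetical; the real PackageInfo has its own instance 
variables.

```python
from dataclasses import dataclass, asdict, fields


@dataclass(frozen=True)
class PackageInfo:
    # Hypothetical fixed field set; subclasses may not add to it.
    name: str
    maintainer: str = ""
    version: str = ""


def serialize(info):
    """Write out the fixed fields only, always as a plain PackageInfo."""
    if type(info) is not PackageInfo:
        raise TypeError("only direct PackageInfo instances are serialized")
    return asdict(info)


def deserialize(record):
    """Rebuild a PackageInfo from data, rejecting unknown fields."""
    allowed = {f.name for f in fields(PackageInfo)}
    unknown = set(record) - allowed
    if unknown:
        raise ValueError(f"unexpected fields: {sorted(unknown)}")
    return PackageInfo(**record)
```

Because the reader only ever builds a plain PackageInfo from a known set of 
fields, no class from the package has to exist (or be compiled) before its 
metadata can be inspected.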

There is the question of where to put the installation and removal code.

Currently we have four existing hooks for such code, and one missing place:

1. Pre-installation code can be run from CS preambles, and from the 
'install/preamble' member in a SAR.
2. Post-loading code can be run from CS postscripts, and from the 
'install/postscript' member in a SAR.
3. Post-compilation code (per class) can be run from class-side initialize 
methods
4. Pre-removal code (per class) can be run from class-side unload methods.
5. Post-removal (cleanup) code has to be run from somewhere outside the 
package (and outside the package file). It is possible that compiled blocks 
or code strings for removal could be loaded from the package file at 
installation time and held outside the package's classes.
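
The idea in item 5 of holding cleanup code outside the package might look 
something like this. It is a Python sketch under stated assumptions: 
cleanup_registry, install and remove are made-up names, and a callable 
stands in for a compiled block or code string.

```python
# Hypothetical registry that lives outside any package, so it survives the
# removal of the package's own classes.
cleanup_registry = {}


def install(package_name, classes, cleanup):
    """Load a package and stash its post-removal cleanup outside it."""
    cleanup_registry[package_name] = cleanup
    return dict(classes)  # stands in for the package's loaded classes


def remove(package_name, loaded_classes):
    """Delete the package's classes, then run the stashed cleanup."""
    loaded_classes.clear()  # the package's own code is gone at this point
    cleanup = cleanup_registry.pop(package_name, None)
    if cleanup is not None:
        cleanup()
```

The cleanup still runs after the classes are gone because the only 
reference to it is held by the registry, not by anything inside the package.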

Currently MCZ packages only offer hooks #3 and #4.

If we're going to move to a scheme in which all (or most of) such code is 
managed as code within methods inside the packages, then:

1. Pre-installation code could still be run from SAR install/preamble members 
(but this code wouldn't be managed as methods)
2. Pre-installation code in methods would require ensuring that those methods 
(and by extension, those classes) would be compiled first and then run, 
before compiling the rest of the package code. None of our existing tools 
currently ensures this (though I think Monticello could be made to do this 
ordering relatively easily). We could say that PI subclasses, if any, would 
get compiled first and then have their appropriate pre-install methods run 
first. Pre-installation could, by convention, be done in the PI subclass's 
'initialize' method, or that could be reserved for post-loading code.
3. Post-loading code could be stuck into PI subclass methods that would be 
called after loading the entire package.
4. Post-compilation code (per class) could still be run from class-side 
initialize methods using the existing code for doing this.
5. Pre-removal code (per package) could be run from PI subclass methods.
6. Pre-removal code (per class) could still be run from class-side unload 
methods.
7. Post-removal (cleanup) code has to be run from somewhere outside the 
package (and outside the package file). As above, it is possible that 
compiled blocks or code strings for removal could be loaded from the package 
file at installation time and held outside the package's classes.
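
The ordering this list implies can be sketched as below. This is Python and 
only records the sequence of events; load_package, unload_package and the 
log strings are hypothetical, and the exact interleaving of per-class 
initialize calls with compilation is an assumption rather than what any 
existing loader does.

```python
def load_package(package, log):
    """Record the load-time hooks in the order described in the list."""
    log.append("SAR preamble")            # pre-install, not managed as a method
    log.append("compile PI subclass")     # compiled before the rest of the code
    log.append("PI pre-install")          # pre-install code held in a method
    for cls in package["classes"]:
        log.append(f"compile {cls}")
        log.append(f"{cls} class initialize")   # per-class post-compilation
    log.append("PI post-load")            # after the entire package is loaded


def unload_package(package, log):
    """Record the removal-time hooks in the order described in the list."""
    log.append("PI pre-removal")          # per-package pre-removal
    for cls in package["classes"]:
        log.append(f"{cls} class unload")       # per-class pre-removal
        log.append(f"remove {cls}")
    log.append("external post-removal cleanup")  # held outside the package


if __name__ == "__main__":
    events = []
    pkg = {"classes": ["Foo", "Bar"]}
    load_package(pkg, events)
    unload_package(pkg, events)
    print("\n".join(events))
```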

-- 
Ned Konz
http://bike-nomad.com/squeak/