Partitioning the image (was Re: Shrinking sucks!)
Ned Konz
ned at squeakland.org
Wed Feb 16 22:13:24 UTC 2005
> > (quotes are by me)
On Monday 14 February 2005 2:04 pm, Bernhard Pieber wrote:
> Just to avoid any misunderstanding. I propose to not only allow PI
> subclasses but to have only PI subclasses. All PI subclasses should be
> abstract. None of them should be instantiated. So there will be no
> problems of maintaining instances.
Since there's no way to call methods except via instances in Squeak, it's not
really correct to say that there wouldn't be PI objects. If you're proposing
to put all functionality on the class side of PI subclasses (which you'd have
to do in the absence of instantiating the subclasses), then the PI objects
will themselves be subinstances of Metaclass, right?
I don't see what the advantage of instantiating Metaclass subinstances would
be vs. instantiating singleton instances of subclasses of PackageInfo.
Is it just that you're comfortable with editing code in browsers? Code
browsers don't have to be our only hammer. It's not as if we could just start
using a package file format without making new tools of one sort or another
and changing the old tools. I would rather have a separate package browser to
view and edit package metadata than have to (say) search for implementors of
particular PI class methods, or to have to search literal strings across the
whole image for metadata. After all, wouldn't your proposal require such
metadata as maintainer name, etc. to be held in literal strings in compiled
methods?
If you look at the nature of class objects as opposed to instances of classes
other than Metaclass, you'll see that they're "special" objects that are
considerably more heavy-weight and more "dead" than other objects. Their
deserialization is very slow (because of all the overhead of creating and
compiling a class), and doesn't fit in well with the existing object
serialization already built in to Squeak (see, for instance,
fileInObjectAndCode, where files are processed first by the compiler and then
by the serialization mechanism). Class objects also leave traces of
themselves behind (in change records, obsolete classes, etc.).
Another group of my concerns over the strategy (I think) you're proposing is
that it makes the creation of tools more difficult, now and in the future. It
is also considerably less efficient and more likely to have unacceptably
dangerous side effects. Here are some of the disadvantages that I see:
A. requiring compilation of a part of the code in a package before loading it
(or even deciding to load it) means that we can't use existing tools to deal
with the code as easily. For instance, it should be easy to examine the
contents of a package (including all its metadata) and then decide not to
load it. And since we will want to have automated handling of packages (for
building images, testing, etc.) we can't rely on reading source code to do
it. To do this with a system requiring compilation of PI subclasses would
require several steps:
A1. parse all the code into a representational form, without compiling it (we
have three different ways to do this at present)
A2. locate all the code for the class side of all (possibly indirect?)
subclasses of PackageInfo
A3. compile that code. The problems with compilation are several:
- A3a. Security. You're introducing new code into the system from a possibly
untrusted source, and then you have to run that code to see the metadata.
Given that the metadata should include such things as digital signatures that
would be needed to decide on how much I trust the source of a package, I
don't think that I want to pay that price.
- A3b. Error handling. If the code being compiled has an error, how do you
deal with it?
- A3c. Undeclared references to as-yet-unloaded classes may be created.
- A3d. References to old versions of classes that would be updated by loading
the package may be created and result in problems if those classes' methods
are called.
- A3e. If the PI subclass happens to be named the same as an existing PI
subclass (which could happen easily) then the existing PI subclass would be
clobbered. If we detected such collisions and refused to compile the PI
subclass, then we couldn't view the package's metadata in any other way than
browsing its code.
- A3f. What if there is more than one PI subclass in the package? What if
the subclasses aren't direct subclasses of PackageInfo?
A4. call the appropriate methods to get the metadata. These, of course, are
code that was compiled from a possibly unknown source, and can do anything
they want because they're run with the full authorization that any other code
in the image has.
A5. to load, we would compile all the rest of the code in the package
(possibly re-compiling the PI subclass if we'd previously compiled it
unlogged). See below for issues around pre-installation and post-removal
methods.
A6. to unload, we have to delete the class that has just been compiled. And we
have to make sure that any traces of it have also been removed (in change
records, recent submissions, obsolete classes, references from
CompiledMethods, etc.). To make the removal process cleaner, we might have
chosen to compile the PI subclass without logging, but then we wouldn't have
the source code available for examination or editing. So if we decided to
load the package, we'd have to re-compile the PI subclass so that we could
see and edit its code.
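
To make the concern in A3 and A4 concrete, here is a rough sketch. It is written in Python rather than Squeak purely for illustration, and every name in it (PACKAGE_SOURCE, MyPackageInfo, read_metadata_by_compiling) is made up. It only shows the general shape of the problem: when metadata lives in methods, reading it means running whatever the package author wrote.

```python
# Hypothetical package source. The metadata lives in a class-side method,
# so the only way to read it is to compile the code and then call it.
PACKAGE_SOURCE = '''
side_effects.append("ran while the class was being defined")

class MyPackageInfo:
    @classmethod
    def maintainer(cls):
        side_effects.append("ran inside the metadata accessor")
        return "someone@example.org"
'''


def read_metadata_by_compiling(source):
    """Return (maintainer, side_effects) after compiling and running source."""
    side_effects = []
    namespace = {"side_effects": side_effects}
    # Roughly step A3: compile and install code from a possibly untrusted file.
    exec(compile(source, "<package>", "exec"), namespace)
    # Roughly step A4: call the freshly compiled code to get the metadata.
    maintainer = namespace["MyPackageInfo"].maintainer()
    return maintainer, side_effects
```

The two entries that end up in side_effects stand in for anything at all that the package code might have chosen to do with full authority.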
B. Compare this with the alternative using serialized PI objects:
B1. Deserialize the PI object data from the file.
B2. Use existing tools to parse all the code into a representational form,
without compiling it (we have three different ways to do this at present)
B3. Query the PI object directly for the required metadata.
B4. To load, we would compile all the code in the package, just as we do now.
See below for issues around pre-installation and post-removal methods.
B5. To unload, just delete the single reference to the PI object.
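
Steps B1, B3 and B5 could look roughly like the following. Again this is Python used only as an illustration, not a proposed Squeak format: the JSON encoding, the field names, and the registry dictionary are all assumptions of mine. The point is that nothing from the package file gets executed in order to read its metadata.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class PackageInfo:
    # Hypothetical fixed set of metadata fields.
    name: str
    maintainer: str
    version: str


def load_package_info(serialized):
    """B1: rebuild the metadata object from plain data, running no package code."""
    record = json.loads(serialized)
    return PackageInfo(
        name=record["name"],
        maintainer=record["maintainer"],
        version=record["version"],
    )


# Hypothetical registry holding the single reference to each loaded PI object.
registry = {}


def register(info):
    registry[info.name] = info


def unregister(name):
    """B5: unloading the metadata is just dropping that one reference."""
    registry.pop(name, None)
```

A tool that only wants to inspect a package would call load_package_info, read the fields (B3), and never need to register anything.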
> > But in the interest of simplicity, I'd be in favor of not supporting that
> > model, and just coming up with some way to instantiate a PI from a
> > package file without having to compile any code.
>
> Why do you think it is necessarily simpler if you avoid compiling code?
> A file in of a simple class - and I am arguing for very simple PI
> subclasses - seems very simple to me.
See above.
> Well, all in all that does not sound to me like TSTTMPW. I must admit
> that I have not thought at all about backward compatibility, though.
Nor about security, efficiency, or compatibility.
> What I can't really see is how you would handle the code parts a
> PackageInfo needs?
It would be in PackageInfo itself. That is, we wouldn't allow extension of the
instance variables in PI subclasses, and we also would require that the
methods used to report metadata wouldn't be overridden. So we serialize and
deserialize the PI instances as instances of PackageInfo, *not* as
subinstances.
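
As a rough illustration of that rule (Python again, with made-up names such as FancyPackageInfo, to_record and from_record), the writer stores only the fields that PackageInfo itself declares and no class name, and the reader always rebuilds a plain PackageInfo:

```python
from dataclasses import dataclass, fields


@dataclass
class PackageInfo:
    # Hypothetical metadata fields, all declared on the base class.
    name: str
    maintainer: str


class FancyPackageInfo(PackageInfo):
    """Hypothetical subclass: it may add behaviour, but no new state."""


def to_record(info):
    # Write only the fields declared on PackageInfo; the subclass is not recorded.
    return {f.name: getattr(info, f.name) for f in fields(PackageInfo)}


def from_record(record):
    # Always rebuild a plain PackageInfo, never a subinstance.
    return PackageInfo(**record)
```

Because the record never names the subclass, reading it back cannot require that subclass to be present or compiled.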
There is the question of where to put the installation and removal code.
Currently we have four existing hooks for such code, and one missing place:
1. Pre-installation code can be run from CS preambles, and from the
'install/preamble' member in a SAR.
2. Post-loading code can be run from CS postscripts, and from the
'install/postscript' member in a SAR.
3. Post-compilation code (per class) can be run from class-side initialize
methods
4. Pre-removal code (per class) can be run from class-side unload methods.
5. Post-removal (cleanup) code has to be run from somewhere outside the
package (and outside the package file). It is possible that compiled blocks
or code strings for removal could be loaded from the package file at
installation time and held outside the package's classes.
Currently MCZ packages only offer hooks #3 and #4.
If we're going to move to a scheme in which all (or most of) such code is
managed as code within methods inside the packages, then:
1. Pre-installation code could still be run from SAR install/preamble members
(but this code wouldn't be managed as methods)
2. Pre-installation code in methods would require ensuring that those methods
(and by extension, those classes) would be compiled first and then run,
before compiling the rest of the package code. None of our existing tools
currently ensures this (though I think Monticello could be made to do this
ordering relatively easily). We could say that PI subclasses, if any, would
get compiled first and then have their appropriate pre-install methods run
first. Pre-installation could, by convention, be done in the PI subclass's
'initialize' method, or that could be reserved for post-loading code.
3. Post-loading code could be stuck into PI subclass methods that would be
called after loading the entire package.
4. Post-compilation code (per class) could still be run from class-side
initialize methods using the existing code for doing this.
5. Pre-removal code (per package) could be run from PI subclass methods.
6. Pre-removal code (per class) could still be run from class-side unload
methods.
7. Post-removal (cleanup) code has to be run from somewhere outside the
package (and outside the package file). As above, it is possible that
compiled blocks or code strings for removal could be loaded from the package
file at installation time and held outside the package's classes.
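
The ordering described in the list above might be sketched like this. It is a Python illustration only; the dictionary layout, the hook names, and the cleanup_registry are all hypothetical. The relative order of per-package and per-class pre-removal code is my assumption, since nothing above fixes it.

```python
def hook(label):
    """Build a stand-in hook that just records that it ran."""
    return lambda log: log.append(label)


def install(package, cleanup_registry, log):
    package["pre_install"](log)        # per-package code, before the rest is compiled
    for cls in package["classes"]:
        cls["initialize"](log)         # per-class post-compilation code
    package["post_load"](log)          # per-package code, after the whole package loads
    # Keep the cleanup code outside the package so it survives removal.
    cleanup_registry[package["name"]] = package["post_removal"]


def unload(package, cleanup_registry, log):
    package["pre_removal"](log)        # per-package pre-removal (order assumed)
    for cls in package["classes"]:
        cls["unload"](log)             # per-class pre-removal
    cleanup = cleanup_registry.pop(package["name"])
    package.clear()                    # stand-in for the package's classes being gone
    cleanup(log)                       # post-removal code still runs afterwards


def make_demo_package():
    return {
        "name": "Demo",
        "pre_install": hook("pre-install"),
        "post_load": hook("post-load"),
        "pre_removal": hook("pre-removal"),
        "post_removal": hook("post-removal"),
        "classes": [{"initialize": hook("A initialize"), "unload": hook("A unload")}],
    }
```

The only part that really matters here is the last step of unload: the cleanup code was copied out at installation time, so it can still run once everything belonging to the package has been deleted.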
--
Ned Konz
http://bike-nomad.com/squeak/