Partitioning the image (was Re: Shrinking sucks!)

Mon Feb 14 16:35:29 UTC 2005

On Monday 14 February 2005 3:18 am, stéphane ducasse wrote:
> > Btw, the whole discussion about PI subclass or not - at this point I
> > don't have a good opinion.
> > There are arguments both ways it seems.
>
> Yes I do not have any taste yet in any directions...let's learn by
> doing.

I have used PI instances and PI subclasses, and (for a while, I think) had
problems maintaining an instance of a PI subclass. Based on that, it seems to
me that we need to lock down at least part of a PI's responsibilities.

I agree with this part of Lex's comment:

> Subclasses are great for experimenting, but if we want to move forward
> and write a bunch of package-aware tools, we need to commit to a single
> model that the tools can rely on.  Experimenters can continue to make
> subclasses if they want, but the tool-writers shouldn't need to deal
> with arbitrary changes.  
[...] 
> It looks like people have a pretty good feel what should go into a
> PackageInfo at this point.  The standard #classes, etc., methods are
> very good I think. 

From what I know and have heard, it seems that a PI:

* Must be able to list the classes that its package introduces (with the 
assumption that the package includes the entire contents of those classes, 
minus explicit extension methods from other packages)

* Must be able to list the extension methods that its package introduces 
(which of course implies a dependency on some other classes)

* May be able to report explicit dependencies on other packages (perhaps 
including version dependencies)

* May be able to provide other metadata about the package, including (some 
of?) the data needed for the package-level SM card.

* May be able to provide or point to package installation or deinstallation 
methods.

* May be able to provide or point to other utility methods like old instance 
migration, etc.
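To make the list above a little more concrete, here is a minimal sketch of what a locked-down PI interface could look like. It is written in Python rather than Smalltalk purely for illustration; `PackageInfoSketch` and all of its field names are hypothetical and are not the actual PackageInfo API. The two "must" responsibilities are plain fields that are always present, while every "may" responsibility is optional and defaults to empty.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical model of a PI's responsibilities; not Squeak's PackageInfo API.
# An extension method is identified by (class name, selector).
MethodRef = Tuple[str, str]


@dataclass
class PackageInfoSketch:
    name: str
    # "Must" responsibilities: always answerable.
    classes: List[str] = field(default_factory=list)
    extension_methods: List[MethodRef] = field(default_factory=list)
    # "May" responsibilities: optional, empty or None when not provided.
    # Required package name -> required version (None if any version will do).
    dependencies: Dict[str, Optional[str]] = field(default_factory=dict)
    # Free-form package metadata, e.g. data for an SM card.
    metadata: Dict[str, str] = field(default_factory=dict)
    # Names of install/deinstall methods, if the package provides them.
    install_method: Optional[str] = None
    deinstall_method: Optional[str] = None
    # Other utilities, e.g. {"migrate": "migrateOldInstances"} (hypothetical).
    utility_methods: Dict[str, str] = field(default_factory=dict)

    def extended_classes(self) -> List[str]:
        """Classes outside the package that this package adds methods to."""
        return sorted({cls for cls, _ in self.extension_methods})


pi = PackageInfoSketch(
    "MyPackage",
    classes=["MyWidget"],
    extension_methods=[("String", "asMyWidget"), ("Object", "isMyWidget")],
)
print(pi.extended_classes())  # ['Object', 'String']
```

Keeping the optional parts as plain data (rather than overridable methods) is one way to let tools rely on the interface without needing any package-specific code.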

But whether or not you use subclasses, the problem of actually instantiating a 
PI to use it remains.

Right now PIs come into being as soon as you tell a tool like the MC browser 
the name of a package. Because they can map the package name into a list of 
class categories (and hence classes) and can also spot extension method 
categories in other classes, they can at least enumerate their contents.
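The name-based mapping described above can be sketched roughly as follows, assuming the usual conventions: a class belongs to package `Foo` if its class category is `Foo` or starts with `Foo-`, and a method in some other class is an extension if its method category is `*foo` or starts with `*foo-` (compared case-insensitively here). The in-memory `image` dictionary and the function names are hypothetical stand-ins for a real Squeak image, written in Python for illustration only.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory stand-in for the image:
# class name -> (class category, {selector: method category})
Image = Dict[str, Tuple[str, Dict[str, str]]]


def _matches(category: str, prefix: str) -> bool:
    """True if category equals prefix or starts with prefix followed by '-'."""
    return category == prefix or category.startswith(prefix + "-")


def package_classes(image: Image, package: str) -> List[str]:
    """Classes whose class category belongs to the package."""
    return sorted(name for name, (cat, _) in image.items() if _matches(cat, package))


def package_extensions(image: Image, package: str) -> List[Tuple[str, str]]:
    """(class, selector) pairs filed under '*package' in classes outside the package."""
    prefix = "*" + package.lower()
    own = set(package_classes(image, package))
    result = []
    for name, (_, methods) in image.items():
        if name in own:
            continue  # methods of the package's own classes are not extensions
        for selector, method_category in methods.items():
            if _matches(method_category.lower(), prefix):
                result.append((name, selector))
    return sorted(result)


image: Image = {
    "MyWidget": ("MyPackage-Core", {"draw": "drawing"}),
    "MyWidgetTest": ("MyPackage-Tests", {"testDraw": "tests"}),
    "ExtrasThing": ("MyPackageExtras", {}),  # no '-', so not part of MyPackage
    "String": ("Collections-Strings", {"asMyWidget": "*mypackage", "size": "accessing"}),
    "Object": ("Kernel-Objects", {"isMyWidget": "*MyPackage-testing"}),
}
print(package_classes(image, "MyPackage"))     # ['MyWidget', 'MyWidgetTest']
print(package_extensions(image, "MyPackage"))  # [('Object', 'isMyWidget'), ('String', 'asMyWidget')]
```

Note that nothing here goes beyond the two "must" queries: the package name alone says nothing about dependencies, metadata, or install hooks.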

This satisfies the first two "must" responsibilities above, but doesn't touch 
any of the others.

Going further than those two behaviors would require loading additional 
information from somewhere. The obvious sources of that information include:

1. a package file
2. a registry entry somewhere on the net (SM, MC, etc.)
3. direct communication with the source of the package (for instance, custom
HTTP headers or some other response to a DAV-like query to the server on
which the package lives)

*Requiring* #2 or #3 would rule out one of the modes of working that we've
been comfortable with in the past: easy, casual distribution of package files
via email or other means (FTP servers, downloads from a Swiki, etc.).

I think it would be a mistake to lose that ease.

Each .mcz file carries its entire ancestry with it precisely so that you can
email a single .mcz file and have it be immediately usable. If you happen to
have some of its ancestors in hand, that gives you more usable information
(you can look at version changes and do merges, for instance). You can browse
an MCZ file without loading it, and see not only its code but also its
ancestry and its dependencies on other MC packages.

CS files, of course, carry much less file-level metadata (limited to the
preamble, postscript, and version stamp). Even so, we have built tools that
use the date/version stamp (Conflict Checker), and tools that let you browse
the actual contents without loading the code.

I think that we should allow but not require #2 or #3. That is, I believe that 
*all* package-level metadata should be *able to be* included in a PI, and 
that it should be possible to read a PI from a package file without loading 
that package's (non-PI) code or prerequisites. 
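Reading a PI from a package file without loading its code could be as simple as looking for one optional, purely textual member in the archive. The sketch below assumes a member named `package-info.txt` holding `key: value` lines; that member name is invented for this example, and the Python code is illustrative only. Nothing in the archive other than that member is read, so no package code is compiled or even decoded.

```python
import io
import zipfile
from typing import Dict, Optional

# Hypothetical member name, invented for this example.
PI_MEMBER = "package-info.txt"


def read_package_info(archive_bytes: bytes) -> Optional[Dict[str, str]]:
    """Return the PI fields stored in a package archive, or None if absent.

    Only the PI member is read; code members are never opened, so nothing
    has to be compiled or loaded. A None result means a tool should fall
    back to the name-based PI definition.
    """
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        if PI_MEMBER not in archive.namelist():
            return None
        text = archive.read(PI_MEMBER).decode("utf-8")
    info: Dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines that are not "key: value"
            info[key.strip()] = value.strip()
    return info


def make_archive(members: Dict[str, str]) -> bytes:
    """Build an in-memory zip for the demo below."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as archive:
        for name, data in members.items():
            archive.writestr(name, data)
    return buf.getvalue()


with_pi = make_archive({
    "source.st": "Object subclass: #MyWidget ...",  # stand-in code member
    PI_MEMBER: "name: MyPackage\nrequires: Morphic\n",
})
print(read_package_info(with_pi))  # {'name': 'MyPackage', 'requires': 'Morphic'}
print(read_package_info(make_archive({"source.st": "..."})))  # None
```

Because an older tool simply never looks for the extra member, an archive built this way stays loadable by tools that know nothing about it.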

Lex also said: 
> And thus, Joe Developer shouldn't be making a subclass, either, if they
> want the standard tools to work with their package.

Whether or not we allow PI subclasses doesn't matter much if we make the
standard tools able to instantiate PIs from a package file. In the case of a
PI subclass, its code would obviously have to be loaded and installed from
the file first. I don't see this as much of a problem, except that a PI that
wants to provide other services would tend to depend on other (as yet
unloaded) code in the package file.

But in the interest of simplicity, I'd be in favor of not supporting that 
model, and just coming up with some way to instantiate a PI from a package 
file without having to compile any code.

And Lex also said:
> Also, all PI's should have an optional link to a SM
> entry (and, if we want to be serious about making stable releases, a
> Universes entry).  All packages should have installation and
> deinstallation code.  (and re-configure code....)

If the PI is allowed to carry with it the metadata that is necessary to create 
a new SM entry (as well as being able to link to an existing SM entry) then 
we can decouple the tasks of package creation and SM registration, and can 
allow automatic registration based on the contents of sufficiently
well-identified packages.

So what are we missing? Easy: we don't at present have a package 
representation that includes a standard way to hold the information necessary 
to create a PI instance.

My suggestion, then, is this:

* We should define the required interface of a PI.

* We should keep the existing simple, name-based PI definition (based on
class categories and *method categories) for cases where there are not yet
package files, or where PI instances have not been added to package files.
This would give us compatibility with existing file formats.

* We should require that a PI be able to provide its basic services without 
having to compile custom code (this doesn't disallow PI subclasses, but might 
require that PIs created from files get converted into instances of the 
appropriate subclass upon loading the package).

* We should come up with a standard way to serialize and deserialize PI
instances. That serialized representation should probably be textual, so
that it is human-readable, can be mailed without damage, and can easily be
included in change sets and other less-structured files.

* We might come up with a way to add this serialized PI data to the preamble 
or postscript of change sets, so that we can leverage existing tools and 
still be backwards-compatible.

* We have existing Zip file formats that represent (or can represent)
packages: SAR and MCZ (and MCD too, I guess). Each of these should define
optional members to hold the serialized PI. This would give us backwards
compatibility (those members would be ignored by older versions of MC or
SARInstaller).

* We should modify existing tools (MC browser, SARInstaller, Code browser, CS
loader, etc.) to support creating PIs from package files where possible.

* We should allow well-defined package-level metadata to be optionally 
included in a PI. This would, for instance, let us create new SM cards from 
package files that include sufficient metadata.

* We should allow future extension of the metadata carried in a PI (perhaps by 
allowing arbitrary name/value pairs) in such a way that we don't break 
backwards compatibility.
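One possible textual serialization, sketched under the assumptions that a PI is just an ordered list of name/value pairs and that the text starts with a version line `PackageInfo-Format: 1` (hypothetical). Repeated names (e.g. several `requires` entries) are allowed, and names the reader doesn't recognize are kept rather than rejected, so newer metadata doesn't break older tools. Backslashes and line breaks inside values are escaped so each pair stays on one line when mailed or pasted into a change set preamble. All names here are hypothetical, and the code is Python for illustration only.

```python
from typing import List, Tuple

Pairs = List[Tuple[str, str]]

HEADER = "PackageInfo-Format: 1"  # hypothetical version line
_ESCAPES = {"\\": "\\\\", "\n": "\\n", "\r": "\\r"}
_UNESCAPES = {"\\": "\\", "n": "\n", "r": "\r"}


def _escape(value: str) -> str:
    return "".join(_ESCAPES.get(ch, ch) for ch in value)


def _unescape(value: str) -> str:
    out, i = [], 0
    while i < len(value):
        if value[i] == "\\" and i + 1 < len(value):
            out.append(_UNESCAPES.get(value[i + 1], value[i + 1]))
            i += 2
        else:
            out.append(value[i])
            i += 1
    return "".join(out)


def serialize(pairs: Pairs) -> str:
    """One 'name: value' line per pair, in order; repeats are allowed."""
    lines = [HEADER]
    for name, value in pairs:
        if not name or any(c in name for c in ":\r\n"):
            raise ValueError(f"bad field name: {name!r}")
        lines.append(f"{name}: {_escape(value)}")
    return "\n".join(lines) + "\n"


def deserialize(text: str) -> Pairs:
    """Parse serialized pairs, keeping unknown names as-is."""
    # Split on "\n" and drop a trailing "\r" to tolerate CRLF from mailers.
    lines = [line.rstrip("\r") for line in text.split("\n")]
    if lines[0] != HEADER:
        raise ValueError("not a serialized PI")
    pairs: Pairs = []
    for line in lines[1:]:
        if not line.strip():
            continue  # blank lines carry no data
        name, sep, rest = line.partition(":")
        if not sep:
            raise ValueError(f"malformed line: {line!r}")
        # Drop the single space after the colon, if a mailer left it there.
        value = rest[1:] if rest.startswith(" ") else rest
        pairs.append((name, _unescape(value)))
    return pairs


text = serialize([("name", "MyPackage"), ("requires", "Morphic"), ("requires", "Network")])
print(text, end="")
# PackageInfo-Format: 1
# name: MyPackage
# requires: Morphic
# requires: Network
```

Using an ordered list of pairs rather than a dictionary keeps repeated entries and the author's ordering intact across a round trip, at the cost of making lookups slightly more work for the reader.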

What do you all think?
-- 
Ned Konz
http://bike-nomad.com/squeak/