Partitioning the image (was Re: Shrinking sucks!)

Thu Feb 24 14:52:32 UTC 2005

Ned Konz <ned at squeakland.org> wrote:
> > Since there really seem to be two parts of a package -- the part that
> > makes sense before a package is loaded, and the part that makes sense
> > only when it has been loaded -- it would seem to make sense to have two
> > objects.  The latter part can then make lots of references to package
> > internals, because it will know the code has actually been loaded.
> 
> That could work too. I just figured that it would make more sense to do a 
> single deserialization followed by the usual code loading (probably in 
> pieces).

I see that you figure that :), but why?  We have two objects here, so I
don't understand why there is a push to make one replace the other if it
gets loaded.  It will work, but it seems like an unnecessary
complication.

It's not a big deal, I guess.  

> 
> > I think the following restrictions might be harsher than necessary:
> > > That is, we wouldn't allow extension of the
> > > instance variables in PI subclasses, and we also would require that the
> > > methods used to report metadata wouldn't be overridden. So we serialize
> > > and deserialize the PI instances as instances of PackageInfo, *not* as
> > > subinstances.
> 
> That restriction was merely to allow deserialization of the PI (sub)instance 
> without having to worry about shape changes, and before the PI subclass is 
> compiled. After all, we don't want to have to compile code just to see what a 
> package has in it!

Ah.  If we go with serialization, instead of making up a file format,
then another way around this is to first convert it to a semi-structured
form such as a dictionary of strings.  Then, serialize the dictionary. 
We don't have to weigh ourselves down with XML just to get
semi-structured representations.

That is, there could be a method in PackageInfo called
#dictionaryForExport, which creates a dictionary holding the core
metadata common to all PI's.
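
To make the idea concrete, here is a rough sketch.  It is written in
Python rather than Smalltalk, purely as a neutral stand-in, and every
name in it (the fields, the dictionary keys, FancyPackageInfo, the two
helper functions) is made up for illustration; none of it is existing
PackageInfo API.  JSON stands in for whatever serializer we end up
using.  The point is only that the exported form holds nothing but
strings, so it can be read back before any package code is compiled.

```python
import json


class PackageInfo:
    """Stand-in for the Smalltalk PackageInfo; field names are assumptions."""

    def __init__(self, name, version, class_names, extension_selectors):
        self.name = name
        self.version = version
        self.class_names = list(class_names)
        self.extension_selectors = list(extension_selectors)

    def dictionary_for_export(self):
        # Core metadata only, as plain strings and lists of strings.
        # Subclasses are expected NOT to override this.
        return {
            "name": self.name,
            "version": self.version,
            "classes": sorted(self.class_names),
            "extensionMethods": sorted(self.extension_selectors),
        }


class FancyPackageInfo(PackageInfo):
    """Hypothetical subclass with extra state that is never exported."""

    def __init__(self, *args, build_cache=None):
        super().__init__(*args)
        self.build_cache = build_cache


def export_metadata(info):
    # Serialize the dictionary, not the (sub)instance itself.
    return json.dumps(info.dictionary_for_export(), sort_keys=True)


def read_metadata(text):
    # Needs no package classes at all; the result is a plain dictionary.
    return json.loads(text)
```

A reader of the exported form never sees the subclass or its extra
instance variables, so shape changes in PI subclasses stop mattering
for the "what is in this package?" question.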

> > Still, some restrictions need to be there, it seems.  I am glad people
> > are generally agreeing that overriding #classes and the like is not
> > worth the extra flexibility.
> 
> I think the main argument for having an explicit enumeration of classes would 
> be that you might want to define classes that happen to live in different 
> categories.
> 
> And the main argument for having an explicit enumeration of extension methods 
> would be that you didn't want to hijack method categories for package 
> identification purposes (granted, this is just a prefix, but still it tends 
> to clutter up the names of the categories).

Yes, the hijacked method categories look bad in a browser, so it might
be worth keeping an explicit enumeration of extension methods.  Any
ideas on how to set up the UI, in that case, so that people can
designate which package a method goes into?

On the other hand, what is the gain in having class categories still, if
we have the image thoroughly partitioned into packages?  It seems
confusing without having much benefit.  I haven't done a thorough
examination, but it seems that class categories get used differently
than method categories.  Class categories are already sort of like
packages.  On top of this, class-membership in packages is very likely
to be maintained carefully; on the other hand, both class categories and
method categories seem to be haphazard in practice.

> What hasn't been addressed -- and this is independent from how packages are 
> represented and stored -- is the issue of conflicting class definitions or 
> extension method definitions (or method categorization) between multiple 
> packages.

Yes, that's an important issue.

For getting started, "just say no" is good enough.  If you try to load a
package with conflicts, you can't do it, and you are told why.  It's not
pretty, but it would get us going.
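
A minimal sketch of "just say no", again in Python as a stand-in and
with made-up names throughout.  It assumes a package can be described
by a dictionary with 'name', 'classes' and 'methods' entries; that
layout is an assumption for the example, not a proposal for the real
format.

```python
class PackageConflict(Exception):
    """Raised when a package cannot be loaded because of conflicts."""


def find_conflicts(loaded_packages, candidate):
    """Return readable reasons why candidate clashes with loaded packages."""
    reasons = []
    for pkg in loaded_packages:
        shared_classes = set(pkg["classes"]) & set(candidate["classes"])
        for cls in sorted(shared_classes):
            reasons.append(f"class {cls} is already defined by {pkg['name']}")
        shared_methods = set(pkg["methods"]) & set(candidate["methods"])
        for method in sorted(shared_methods):
            reasons.append(f"method {method} is already defined by {pkg['name']}")
    return reasons


def load_package(loaded_packages, candidate):
    """Refuse to load on any conflict, and say why."""
    reasons = find_conflicts(loaded_packages, candidate)
    if reasons:
        raise PackageConflict(
            f"cannot load {candidate['name']}: " + "; ".join(reasons)
        )
    loaded_packages.append(candidate)
```

Note that the check only needs the lists of names, so it can run on the
metadata alone, before any of the candidate's code is compiled.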

On the other hand, there will occasionally be a time when a package
really needs to override a method when it loads.  When that package
unloads, the previous definition of the method should come back!  In the
long run, we probably do want to have a way to intentionally specify
overrides.
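
One way to get "the previous definition comes back" is to keep a small
stack of definitions per method instead of a single slot.  The sketch
below is only an illustration of that idea, in Python as a stand-in;
MethodTable and its methods are hypothetical names, and a real version
would hold compiled methods rather than source strings.

```python
class MethodTable:
    """Tracks, per selector, every package's definition; the newest wins."""

    def __init__(self):
        # selector -> list of (package name, source), oldest first
        self._stacks = {}

    def install(self, package, selector, source):
        # An intentional override is pushed on top of earlier definitions.
        self._stacks.setdefault(selector, []).append((package, source))

    def unload(self, package):
        # Drop only this package's entries; older definitions resurface.
        for selector in list(self._stacks):
            remaining = [e for e in self._stacks[selector] if e[0] != package]
            if remaining:
                self._stacks[selector] = remaining
            else:
                del self._stacks[selector]

    def current(self, selector):
        # The active definition is the most recently installed one.
        stack = self._stacks.get(selector)
        return stack[-1][1] if stack else None
```

The same structure also handles unloading the overridden package first:
its entry is removed from underneath and the override stays active.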

Accidental conflicts, e.g. where two people just happened to make a >>
method in Behavior, are a different issue.  Maybe we still want to make
one package win.  Maybe we want to say you can't load both packages at
all.  Or maybe we should take the Traits approach and say that the
method becomes uncallable until the problem is resolved.  

At any rate, it is vital that people can still manipulate code that has
conflicts.  This is one of the nicest things in Smalltalk -- that you
can keep going even though your code has wild bugs in it -- so let's try
to hang on to that in the packaging system, too.  If we "just say no" and
refuse to load a package with conflicts, then we should also make tools
for editing code in packages that haven't been loaded.  

Lex

PS -- can't partitioning start, anyway, regardless of any improvements
to PackageInfo??  Just make a swiki page listing who is going to
maintain which partition.  And those maintainers should then start
making efforts to make their partition unloadable.