Partitioning the image (was Re: Shrinking sucks!)

Doug Way dway at mailcan.com
Mon Feb 7 05:30:31 UTC 2005


Hi Göran and all...

On Feb 4, 2005, at 4:12 AM, goran.krampe at bluefish.se wrote:
>
> [SNIP]
>> Another issue concerning image cleanup is (and it was also discussed
>> here many times already) to find a way to unload any package
>> completely. Even Linux distributions do NOT clean up correctly in all
>
> Frankly - I don't think this part is at all the real problem. The
> problem is untangling.

Agreed.

It would be nice to be able to unload the packages that are already
there, though.  You can unload Monticello packages (and any other
package type based on PackageInfo), which I guess you already know.

> But IMHO we shouldn't care that much about untangling for starters -
> the goal of TFNR (a dormant project) was to make sure we *assign
> maintainers* to *clearly defined parts* of the image in order to get
> around the harvesting bottleneck and to improve the sense of ownership
> and responsibility etc. I still very much think this is *THE* way to go,
> but in parallel with other routes - see below.

Yeah, let's get TFNR kick-started again.  I will hop on board this  
time. :)

> And also, since we now have PackageRegistry I really think we should
> get going with this.

What is PackageRegistry?  I don't see a class by that name, or a  
package on SqueakMap.  Is this just the list of packages which shows up  
when you open the "Package List" window?

>> cases. I know it can be almost impossible to remove methods added with
>> the package to classes outside it (like adding isWhatever to Object).
>> Many people seem to think that until we solve this, we'll never
>> have a 100% pure system.

Monticello/PackageInfo already handles this; methods added to classes
outside the package are tracked as extensions of that package.
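
(To make the mechanism concrete: an extension method is claimed by a
package through its method category.  Roughly, assuming a hypothetical
package named "MyPackage":

   "browse to Object and add this method in a category named '*MyPackage'"
   isWhatever
        ^ false

Because the category starts with '*MyPackage', PackageInfo lists the
method among MyPackage's extensions, so saving or unloading the package
carries it along.)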

>> Undefining messages is not a problem until two
>> packages modify the same one concurrently.
>
> Eh, well IMHO the problem with uninstalling packages is not exactly
> what you describe.
> If all packages were constrained to be Monticello packages
> (without dirty code running in class initializers) - then we would more
> or less already have that. But they aren't.

The solution to this is obvious, I think. :)

First, make it easy to unload packages that can be unloaded.  For  
example, I don't understand why the "Package List" window (available  
even without MC) doesn't offer an "unload package" menu item.  They  
should be unloadable.  Of course, you can't guarantee unloadability if  
your image is dependent on the code being unloaded, but the package  
should already be "untangled" before it is made available.  (And there  
can be issues with unloading if you have a bunch of instances of  
unloaded classes in your image doing important things, but I don't  
think that's our biggest problem right now.)
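
(For what it's worth, PackageInfo already knows enough to support such
a menu item.  Something along these lines ought to be close - an
untested sketch against the PackageInfo protocol as I remember it, for
a hypothetical package "MyPackage":

   | pkg |
   pkg := PackageInfo named: 'MyPackage'.
   "first remove the methods the package added to outside classes"
   pkg extensionMethods do: [:ref |
        ref actualClass removeSelector: ref methodSymbol].
   "then remove the package's own classes"
   pkg classes do: [:each | each removeFromSystem].

No dependency checking, and it ignores live instances, but that's the
basic shape of it.)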

Basically, the Package concept needs to be more available in the UI.   
There should be a package-centric browser.  The MC Snapshot Browser is  
pretty much the right UI for this, but it might be nice to have it  
available outside of MC, just operating on code in the image.  (But if  
Basic already includes MC, well, I guess the Snapshot Browser could be  
used.  But make it available from the Package List via a "browse  
package" menu item and in other places.)

Second, require that the partitioning of the base image must be via  
MC/PackageInfo packages.  Then work on detangling them so they are  
unloadable.

Then, people will get used to being able to unload certain types of
packages, and there will be a strong incentive for other package
maintainers to make their packages unloadable too, by converting them
to PackageInfo/MC-based packages.

> Changesets and .st files and .sar files all have the possibility of
> *executing arbitrary code* on install. And in fact - Monticello packages
> have that option too - using class-side initialize methods.
>
> And that code can do stuff. And there is of course currently no
> theoretical way of restoring those actions. Two choices:
>
> 1. Don't allow code to run on package installation. Class initialization
> still needs to be done, so either that code HAS to be nice or we will
> have to come up with another way to do class initialization - or
> sandbox it somehow. Realistically we would have to simply make sure
> that code is "nice" and gradually move that code over to other more
> "declarative" mechanisms.
>
> 2. If arbitrary code still should be allowed to run - we need
> transactional object memory and/or some kind of sandboxes. Hehe, yeah,
> right.
>
> So given that 2 is simply too much work to just "do" over the weekend,
> number 1 is still the way to go. And given that we can't remove class
> initialization - we will have to make sure that the code running there
> behaves correctly. And then we would have to convert all packages over
> to the .mcz format (or .sars with only .mczs and no postscripts). Which
> of course wouldn't be a bad thing. :)

#1 seems like the obvious choice to me, too, at this stage.
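
(By a "nice" class initializer I take Göran to mean one that only sets
up the class's own state, as opposed to one that reaches out into the
rest of the image.  A made-up example of each, with hypothetical names,
just to be concrete:

   "nice: only touches this package's own class variables"
   MyService class>>initialize
        DefaultTimeout := 30.
        Sessions := Dictionary new.

   "not nice: mutates the rest of the image on load"
   MyService class>>initialize
        Smalltalk at: #MyServiceRegistry put: Dictionary new.
        Object compile: 'isMyService ^ false'.

The first kind is harmless to re-run or ignore; the second is the kind
of code that needs to move to a more declarative mechanism.)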

Forget about changesets, .st files and .sar files; they have their
purpose, but they don't need to be unloadable.  Real "packages"
(i.e. applications or subsystems) don't need to be in those formats;
they can be MC/PI-based.

Changesets are still good for things like bugfixes, or order-dependent
changes to a living image, but they don't really need to be unloadable.
.sar files are good for installing things like fonts or arbitrary data
into an image, but there's not a pressing need to have those be
unloadable either.

>> But when we have separate small packages, reconstructing the image
>> without one package is trivial (we just start again with an empty one
>> and load all the other packages in), so we do not need to be able to
>> remove a package cleanly, and the time wasted creating unload scripts
>> can be wasted on something more fun.
>
> Well, as I described above - it isn't that simple. You would still need
> to *produce* all those packages. Just try ripping out something from
> Morphic and you will soon understand what we mean. It is hard work. On
> the other hand, given that Tweak will soon be operational (?), perhaps
> we stand a better chance.
>
> Now - in short, my choice of attack would be something like this (in
> not exactly this order, and many things can be done in parallel):
>
> 1. TFNR. Yes. Let's finally partition the image and put people in
> charge of each part. Yes, the parts will still be totally intertangled -
> but each line of code will have a maintainer. And each logical "tool"
> or "mechanism" will have a caring Squeaker. We should do this NOW.

Sounds good.  I am on board!

> 2. Get all the VM people to help Craig with Spoon. Get the VM changes
> he has made into the regular official VM ASAP.
>
> 3. Create a category on SM called "unloadable package" (or something)
> and gradually start to migrate packages over to Monticello with *nice*
> class initializers. Again, unloadability is not key to all this, but we
> can still do it. Having stuff in Monticello format is on the other hand
> pretty important - especially for having correct upgrades (the other
> formats can't do that).

Yes!  Well, we don't have to do this for *all* packages on SqueakMap,  
but I think it makes sense for at least all of the Squeak-official  
(Full) packages.

> 4. Start moving tools over to Tweak, so that we can get rid of Morphic.
> And move to newer tools where appropriate - like OmniBrowser instead of
> the old browser etc. In essence we need OmniBrowser, a workspace,
> explorers/inspectors and a debugger. At least. :)
>
> 5. Get Monticello/SqueakMap and other low basics to load on top of
> Spoon. Then Tweak. Then OmniBrowser and the tools. :) Or whatever. We
> need to get a *head* on Spoon with a minimal tool env. No one will
> choose to "live" there until it has that at a *minimum*. So until we
> get there the manpower working on Spoon will be low.

These are important, although I think detangling the big chunks in the  
base image may be more urgent?  Perhaps you're already assuming the  
detangling will be mostly done in steps 1 and 3.

In other words, I think a lot more people will start working on things  
like this (4 and 5), once the basic chunks are detangled from the base  
image.

For example, if you have a working detangled headless Squeak 3.9/4.0  
Kernel image, even if it's still a bit lumpy and non-streamlined, you  
can compare that with Craig's Spoon image and see what the differences  
are.  And if you have a detangled Graphics package, you could try  
porting it over to the Spoon image.  Etc.

> 6. Get the next release of SM out with full dependencies.
>
> Well, something like that. Ok, do we have someone willing to be General
> on this? I can and should grab number 1 and get that done. And damnit,
> I will. But I need help with the rest.

Overall, your plan sounds pretty compatible with what I was rambling  
about in this post:

http://lists.squeakfoundation.org/pipermail/squeak-dev/2005-January/087639.html

I can help with #1 (TFNR), and the update stream.  I could perhaps take  
charge on #3.

With #1, it might be good to have a basic plan of attack figured out  
here on squeak-dev first, before forming a separate mailing list of  
volunteers.  If there is a basic plan in place, it might actually help  
with getting more volunteers, because they'll know something will  
"really happen this time". ;)

My 2 cents on the plan:

We should probably use the update stream to do the partitioning and the
detangling of the base image.  Changesets broadcast to the update
stream would contain partitioning doits and detangling changes.  There
may be other ways to do the detangling, such as an MC-based scheme like
the one Avi described here:
http://people.squeakfoundation.org/article/39.html
But I'm not sure it would work, for the reasons Andreas stated in his
response.  Also, I think it may be easier with the update stream to
keep everyone's changes "in sync" until everything is sufficiently
detangled.  Everyone doing the partitioning/detangling would have
access to the update stream.  It's easy to broadcast a changeset to the
update stream; it works right now.
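
(To make "partitioning doits" a bit more concrete: most of them would
probably be small changeset postscripts that register the coarse
packages and shuffle categories around.  A sketch, assuming that
"PackageInfo named:" registers the package the way it appears to:

   "changeset postscript: register the coarse base packages"
   #('Kernel' 'Collections' 'Graphics' 'Morphic' 'Tools')
        do: [:each | PackageInfo named: each].

The detangling changes themselves would then just be ordinary class and
method moves riding the same stream.)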

Basically, I don't think we can really consider getting rid of the  
update stream until #6 (full dependencies) is done.  And even then we  
may keep it.  Or not, who knows? ;)

My other thought is that we should make the initial partitioning as  
coarse as possible.  "Graphics" is one package, "Morphic" another,  
"Kernel", "Tools", etc.  This will make it easier to detangle, if  
you're only worrying about large pieces... don't worry about breaking  
the large pieces into smaller ones until later.  At this point, no one  
really cares that much about being able to unload small sections of  
Morphic such as Morphic-PDA, we just want to be able to separate  
Morphic itself from the rest of the system.  (Well, separating EToys  
from Morphic would be a nice second step... get Juan Vuletich working  
on that one.)

The partitioning could roughly correspond to the first section of each  
class category.  This is how MC/PackageInfo seems to work by default,  
anyway.  I just tried creating a "Morphic" package in Monticello, and  
it uses all of the classes in the 'Morphic-*' categories for its code,  
plus there are already some *morphic extensions in the base image which  
show up.  Saving the Morphic package results in a 2.1MB .mcz file, by  
the way.
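
(If you want to poke at this from a workspace without going through MC,
something like the following shows what a coarse package picks up -
again assuming the PackageInfo protocol as I remember it:

   | pkg |
   pkg := PackageInfo named: 'Morphic'.
   pkg systemCategories.       "the matching 'Morphic-*' class categories"
   pkg classes size.           "the classes in those categories"
   pkg extensionMethods size.  "the *morphic methods in outside classes"

The same code is what ends up in the .mcz when you save the package.)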

(Then there's the whole issue that PackageInfo uses the class category
and method category strings to define the code a package contains,
which is kind of a hack, but it works and is compatible with existing
tools.  It wouldn't be hard to convert PackageInfo-based
packages to a more "real" modules system later, after the detangling is  
done... it's the detangling that's the hard part.  I'm not sure we want  
to wait around for a more proper modules system before we begin  
partitioning & detangling.  I guess one problem with this is that you  
may end up changing the class categories of a lot of classes to make  
this work.  For example, "Collections-*" might have to be renamed to  
"Kernel-Collections-*" if we want the Collections-* classes to be in  
the Kernel package, which is where most of them would need to be.  Can  
the package names be unrelated to the class category names?)
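
(If it did come to renaming, at least the mechanical part looks easy
enough - an untested sketch with purely hypothetical target names:

   "fold the Collections class categories under Kernel"
   (SystemOrganization categories
        select: [:cat | cat beginsWith: 'Collections'])
             do: [:cat |
                  SystemOrganization renameCategory: cat
                       toBe: ('Kernel-', cat) asSymbol].

Not something to run before we agree on the coarse partitioning, of
course.)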

Ok, those are my thoughts for now.  When do we start? ;)

(And I know the above is only 80% thought through, but announcing a  
plan and starting on it is a sure way to get feedback!)

- Doug



