Untangling the code (was: Get this party started...)

Sat Mar 5 18:32:19 UTC 2005

One of the things you mention is the process of splitting up the
existing code so that it can be managed and loaded in chunks. This is
exactly the process I had in mind when I wrote SpaghettiTracer, since
then renamed MudPie.

First, some generalities just to put us all on the same page; then I'll
describe MudPie; then some of my thoughts about what would help us
make some progress on this. By progress, I mean progress toward the
following specific goal:
*over time, the code in the released squeak-dev image becomes
divisible into smaller packages*

Like Avi, I will completely ignore the code container technology for
now, and I will also ignore whether and how code is actually moved
into packages.

Problem:
Squeak includes lots of code, and in terms of dependencies, not all
code is born equal.
A. There is some Kernel stuff, which must be there for anything to
run. It includes, for example, all the classes that the VM knows
about. Everything depends on it. The Kernel shouldn't depend on
anything else.
B. There are libraries that we are used to, for example Morphic, which
depend on the Kernel and which applications depend on.
C. Applications like Celeste depend on many different libraries.

The most obvious rule about dependencies is that they should only run
in the direction A<-B<-C (C depends on B, B depends on A), and never
the other way.
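
The layering rule can be checked mechanically. Below is a minimal
Python sketch, assuming a hypothetical map from each class category to
the categories it references, plus a hand-assigned layer per category
(0 = Kernel, 1 = library, 2 = application). The category names and
edges are illustrative only, not real Squeak data. Any edge that points
from a lower layer to a higher one is an upward dependency.

```python
# Hypothetical layer assignments: 0 = Kernel (A), 1 = library (B), 2 = application (C).
LAYERS = {"Kernel": 0, "Morphic": 1, "Celeste": 2}

# Hypothetical dependency map: category -> categories it references.
DEPENDS_ON = {
    "Kernel": {"Morphic"},             # upward: Kernel uses a library
    "Morphic": {"Kernel", "Celeste"},  # upward: a library uses an application
    "Celeste": {"Morphic", "Kernel"},  # fine: application uses lower layers
}


def upward_dependencies(depends_on, layers):
    """Return sorted (user, used) pairs where a category uses a higher layer.

    Under the A<-B<-C rule a category may only depend on categories in the
    same or a lower layer, so every pair returned here is a violation.
    """
    violations = []
    for user, used_set in depends_on.items():
        for used in used_set:
            if layers[used] > layers[user]:
                violations.append((user, used))
    return sorted(violations)


for user, used in upward_dependencies(DEPENDS_ON, LAYERS):
    print(f"{user} should not depend on {used}")
```

Note that this only sees violations between layers someone has already
assigned; it says nothing about cycles inside a layer.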

We can't break Squeak into lots of little packages, because this is
not the case. In fact, Kernel classes depend on Morphic, and Morphic
intimately knows some things that most people would not call
libraries. We have cycles. This means that removing an application can
break Morphic.

MudPie:
What MudPie does is identify these cycles. It also allows us to write
SUnit tests that say "X and Y should not be in the same cycle", so
that once something like that is fixed, it stays fixed.
At the moment MudPie works, IIRC, but has no UI at all. It could use
many kinds of loving, in fact, including a rewrite, but it can do what
I described as is (modulo changes in PackageInfo since it was
written).
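
MudPie itself is Smalltalk and is not reproduced here. As a
language-neutral illustration of the idea only, the Python sketch below
finds dependency cycles as the strongly connected components (Tarjan's
algorithm) of a hypothetical category graph, and expresses "X and Y
should not be in the same cycle" as a check that raises AssertionError,
the way a failing SUnit assertion would. The graph and function names
are assumptions, not MudPie's API or its actual algorithm.

```python
# Hypothetical category dependency graph: category -> categories it uses.
GRAPH = {
    "Kernel": ["Morphic"],
    "Morphic": ["Kernel", "Celeste"],
    "Celeste": ["Morphic", "Kernel"],
    "Network": ["Kernel"],
}


def strongly_connected_components(graph):
    """Tarjan's algorithm: return the SCCs of `graph` as a list of sets.

    Recursive, which is fine for small graphs.
    """
    index = {}    # discovery order of each node
    lowlink = {}  # smallest index reachable from the node's DFS subtree
    stack, on_stack = [], set()
    components = []
    counter = 0

    def visit(node):
        nonlocal counter
        index[node] = lowlink[node] = counter
        counter += 1
        stack.append(node)
        on_stack.add(node)
        for succ in graph.get(node, []):
            if succ not in index:
                visit(succ)
                lowlink[node] = min(lowlink[node], lowlink[succ])
            elif succ in on_stack:
                lowlink[node] = min(lowlink[node], index[succ])
        if lowlink[node] == index[node]:
            # node is the root of an SCC: pop it and everything above it.
            component = set()
            while True:
                member = stack.pop()
                on_stack.discard(member)
                component.add(member)
                if member == node:
                    break
            components.append(component)

    for node in graph:
        if node not in index:
            visit(node)
    return components


def cycles(graph):
    """Only the components that are real cycles (two or more members)."""
    return [c for c in strongly_connected_components(graph) if len(c) > 1]


def assert_not_in_same_cycle(graph, x, y):
    """Raise AssertionError, like a failing SUnit test, if x and y share a cycle."""
    for component in cycles(graph):
        if {x, y} <= component:
            raise AssertionError(f"{x} and {y} are in the same cycle: {sorted(component)}")
```

Once a cycle is broken, a check like
`assert_not_in_same_cycle(GRAPH, "Kernel", "Morphic")` would keep it
broken; on the sample graph it currently fails.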

At a higher level, MudPie lets us get an answer, at any moment, to:
1. "what can I break off and package now", and
2. "what is preventing me from breaking off and packaging <XYZ>".
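
Both questions can be phrased as reachability in the dependency graph.
A minimal Python sketch, assuming a hypothetical category graph and a
simplifying rule: a category can be broken off now if it does not
depend on itself through any chain, and what prevents breaking off XYZ
is the set of categories that XYZ depends on and that also depend back
on XYZ (the rest of its cycle). The function names are illustrative,
not MudPie's interface.

```python
# Hypothetical category dependency graph: category -> categories it uses.
GRAPH = {
    "Kernel": {"Morphic"},
    "Morphic": {"Kernel"},
    "Celeste": {"Morphic", "Kernel"},
    "Network": {"Kernel"},
}


def reachable_from(graph, start):
    """All categories that `start` depends on, directly or transitively."""
    seen, todo = set(), list(graph.get(start, ()))
    while todo:
        node = todo.pop()
        if node not in seen:
            seen.add(node)
            todo.extend(graph.get(node, ()))
    return seen


def breakable_now(graph):
    """Categories that do not depend on themselves through any chain."""
    return sorted(c for c in graph if c not in reachable_from(graph, c))


def blockers(graph, target):
    """Categories in a cycle with `target`: it uses them and they use it."""
    return sorted(
        other
        for other in reachable_from(graph, target)
        if other != target and target in reachable_from(graph, other)
    )


print(breakable_now(GRAPH))        # ['Celeste', 'Network']
print(blockers(GRAPH, "Morphic"))  # ['Kernel']
```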

For something that is clearly an application, like Celeste, MudPie
isn't really needed. We know it's okay for it to depend on lots of
stuff, and since it is an application, nothing should depend on it, so
we can just ask "what depends on Celeste?", and PackageInfo does that
nowadays. The same goes for the Kernel, in the other direction: since
it should depend on nothing else, we can just ask what it depends on.
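
These simpler queries need no cycle analysis, only the dependency map
and its inverse. A minimal Python sketch over a hypothetical category
graph (this is not PackageInfo's API): for an application we expect
nothing to depend on it, and for the Kernel we expect it to depend on
nothing outside itself.

```python
from collections import defaultdict

# Hypothetical category dependency graph: category -> categories it uses.
GRAPH = {
    "Kernel": {"Morphic"},
    "Morphic": {"Kernel"},
    "Celeste": {"Morphic", "Kernel"},
}


def invert(graph):
    """Map each category to the set of categories that depend on it."""
    users = defaultdict(set)
    for user, used_set in graph.items():
        for used in used_set:
            users[used].add(user)
    return users


def what_depends_on(graph, category):
    """Direct dependents; should be empty for an application like Celeste."""
    return sorted(invert(graph).get(category, set()))


def what_it_depends_on(graph, category):
    """Direct dependencies; should be empty for the Kernel."""
    return sorted(graph.get(category, set()))


print(what_depends_on(GRAPH, "Celeste"))    # []
print(what_it_depends_on(GRAPH, "Kernel"))  # ['Morphic']
```

On this sample data the Celeste answer is clean, while the Kernel
answer points at a dependency that should not be there.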

I think MudPie becomes useful when you want to:
1. Assert/see something about the global structure of the code.
2. Deal with the middle level, for example untangling libraries that
are mixed into the whole system, like Morphic.

Elements of a solution:
1. Making the situation visible.
Right now it is hard to measure progress. People "know that Squeak has
too much in it", but that is very vague. We can easily get the answer
to the question "What code outside the Kernel does the Kernel depend
on, transitively?". This one question cheaply gives us lots of
visibility into what is wrong. Count the classes in that code, and we
have a metric, and a goal: 0. Later on, if we want more details,
MudPie can give lots of them (how much of a suite of "modularity
tests" is green, for example). Anyway, whatever metrics we choose to
code up should be widely available, and should not require every
interested person to learn to write some magic. We need a UI, or
better, a web UI, for this information, maybe with history.
2. Some big refactorings.
Lately I've seen calls for making Squeak's UI frameworks pluggable
components, so that, for example, inform: does the right thing whether
Morphic, Tweak, or Seaside is the current UI. This kind of refactoring
would break the upward dependencies that create most of the cycles.
BTW, note that using MudPie doesn't require any further declarations
or new constructs, so the actual work of untangling code is nothing
more than refactoring (often just renaming method categories and
moving classes between class categories). As the packaging team, we
should identify these refactorings, cooperate with the people doing
them to make sure the chosen solution actually breaks the cycles, and
help get them into the image (reviews and so forth).
3. Maximize benefits: tools
Regardless of whether the code is packaged separately, having the code
dependencies form a DAG (directed acyclic graph) lets the tools be
smarter in various ways. For example, I wrote a little something for
the StarBrowser that topologically sorts the class categories
according to their dependencies, so that applications end up above
libraries, which end up above the Kernel; this gives the user some
meaningful clues about what the code is for. Telling the user which
SUnit tests to fix first (those in the lowest categories, since
everything above depends on them) is also an obvious application.
4. Maximize benefits: community
It seems to me that untangling the code will give us more
opportunities to swap code among the various Squeak subcommunities.
I'm just saying that we should be alert to this, help it along, and
let it drive us.
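
The metric proposed under "Making the situation visible" can be
sketched as a transitive closure followed by a count. Below is a
minimal Python version, assuming hypothetical inputs: a category
dependency graph, the set of categories considered Kernel, and a class
count per category. None of these names or numbers come from a real
image, and counting whole categories is a coarse approximation of
counting the classes actually reached.

```python
# Hypothetical inputs; a real tool would extract these from the image.
GRAPH = {
    "Kernel-Objects": {"Kernel-Numbers", "Morphic-Basic"},
    "Kernel-Numbers": {"Kernel-Objects"},
    "Morphic-Basic": {"Kernel-Objects", "Graphics-Text"},
    "Graphics-Text": {"Kernel-Objects"},
    "Celeste": {"Morphic-Basic"},
}
KERNEL = {"Kernel-Objects", "Kernel-Numbers"}
CLASS_COUNT = {
    "Kernel-Objects": 40, "Kernel-Numbers": 25,
    "Morphic-Basic": 60, "Graphics-Text": 15, "Celeste": 30,
}


def transitive_dependencies(graph, roots):
    """Everything reachable from `roots` by following dependencies."""
    seen, todo = set(), list(roots)
    while todo:
        node = todo.pop()
        for dep in graph.get(node, ()):
            if dep not in seen:
                seen.add(dep)
                todo.append(dep)
    return seen


def kernel_leakage(graph, kernel, class_count):
    """Number of non-Kernel classes the Kernel depends on; the goal is 0."""
    outside = transitive_dependencies(graph, kernel) - kernel
    return sum(class_count.get(category, 0) for category in outside)


print(kernel_leakage(GRAPH, KERNEL, CLASS_COUNT))  # 75
```

Here Celeste does not count, because the Kernel does not reach it;
Morphic-Basic and Graphics-Text (60 + 15 classes) do.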
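
The pluggable-UI refactoring under "Some big refactorings" is an
instance of dependency inversion: kernel code talks to an abstract UI
hook, and each UI framework registers itself with that hook, so the
dependency points from the UI down to the kernel rather than up. A
minimal Python sketch; the class and function names (UIHook,
set_current_ui, RecordingUI) are hypothetical and do not describe any
existing Squeak API.

```python
class UIHook:
    """Kernel-side abstract interface; the kernel knows only this class."""

    def inform(self, message):
        raise NotImplementedError


_current_ui = None  # set by whichever UI framework is active


def set_current_ui(ui):
    """Called by a UI framework (Morphic, Tweak, Seaside...) at startup."""
    global _current_ui
    _current_ui = ui


def inform(message):
    """Kernel code calls this without knowing which UI is installed."""
    if _current_ui is None:
        print(message)  # headless fallback so the kernel still works alone
    else:
        _current_ui.inform(message)


# --- UI-framework side: depends on the kernel, never the other way round ---
class RecordingUI(UIHook):
    """Stand-in for a real UI; it just records what it was asked to show."""

    def __init__(self):
        self.shown = []

    def inform(self, message):
        self.shown.append(message)


ui = RecordingUI()
set_current_ui(ui)
inform("Save complete")
print(ui.shown)  # ['Save complete']
```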
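
The category ordering described under "Maximize benefits: tools" can be
sketched with Kahn's topological sort. This is not the StarBrowser code
mentioned above, just one way to get the described order on an assumed,
already-untangled category graph: categories that nothing depends on
(applications) come out first, and the Kernel comes out last. If the
graph still has a cycle, no such order exists, and the sketch reports
the categories it could not place.

```python
from collections import deque

# Hypothetical, already-untangled category graph: category -> categories it uses.
GRAPH = {
    "Kernel": set(),
    "Graphics": {"Kernel"},
    "Morphic": {"Graphics", "Kernel"},
    "Celeste": {"Morphic", "Kernel"},
}


def layered_order(graph):
    """Kahn's algorithm, emitting each category before everything it uses.

    Applications therefore come first and the Kernel last, which is the
    top-to-bottom order a browser would display.
    """
    # How many categories still use each category.
    users = {category: 0 for category in graph}
    for used_set in graph.values():
        for used in used_set:
            users[used] = users.get(used, 0) + 1

    ready = deque(sorted(c for c, n in users.items() if n == 0))
    order = []
    while ready:
        category = ready.popleft()
        order.append(category)
        for used in sorted(graph.get(category, ())):
            users[used] -= 1
            if users[used] == 0:
                ready.append(used)

    if len(order) != len(users):
        stuck = sorted(set(users) - set(order))
        raise ValueError(f"dependency cycle; cannot order: {stuck}")
    return order


print(layered_order(GRAPH))  # ['Celeste', 'Morphic', 'Graphics', 'Kernel']
```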

Looking forward to hearing your thoughts on this,

-- 
Daniel Vainsencher