package universes and filters question

Mon Aug 9 21:49:02 UTC 2004

Hi Lex and all!

Note: Tomorrow we are leaving for China so my mind is "elsewhere" but I
wanted to wrap up these issues so that I can go "offline" for the next
14 days and not worry. :)

lex at cc.gatech.edu wrote:
> I don't mean to split things off.  I actually proposed these ideas as
> future directions for SqueakMap, and only implemented them when you
> argued that they are bad ideas.  Most any future direction is fine with
> me, at this point, so long as we please use simple working solutions
> until something better comes along.

Simplicity has always been something I strive for in SM. As you recall
SM1 didn't even have releases - and that was "too simple". But it was a
good way to get things started so I don't regret it.

> Some possible directions I'd be very happy with:
> 
> 	1. SM could incorporate the universes model somehow.  There is just
> one important idea that keeps this from going smoothly; see below.

Number 1 here would be the best IMHO. :) I promise to look long and hard
at your proposal (I have already done so, but I want to be sure we *all*
understand the pros and cons here) and I also promise to try to "meet
the requirements" as long as I see interest in them from the
community.

For example - the idea of being able to set up private "submaps" (or
whatever) or even public ones is a feature that I *want* to offer - I
just *also* want to avoid all the negatives of such a model, see below.

Perhaps a good way to go forward here is to start talking about use
cases? What are the use cases we want to support?

> 	2. You or someone else could pick up the Universes code; it doesn't
> need to be me at all, and in fact, I'd rather it were not me.
> 	
> 	3. SM and Universes could have crosslinks between them so that it's
> easy to post to both of them simultaneously.  This would be a handy
> addition to SqueakSource, for example.
> 
> In the meantime, I certainly suggest that people keep posting stuff on
> SqueakMap no matter how much they use Universes.  They are complementary
> tools.

Well, I would say they are competing tools. I may be wrong.

> I also intend to create a 3.7 stable universe in the next few weeks,
> because this is something Universes would seem to work well at, and it's
> something that will give people a good idea of how Universes works.

Ok, just as long as I know what you intend to do. I interpret this as a
kind of "fork" and well, I will just have to make sure SM can stand up
to the challenge.

> For the development stream, the best strategy is harder to see.  I'd
> guess that dependencies are a major thing to think about, but the
> conclusion you draw depends on how your and Stephan's dependencies come
> along.  Let's leave aside that discussion for a future email, because
> there's something more important I'd like to talk about.

Sure. In short - I have my plan clear in my head and partially working
in code. Stephan's plan adds more mechanisms "on top" (as I see it) and
thus, they seem worthwhile to explore. Whether my simpler planned model
will suffice or not, we will see.

> Please consider carefully this one issue: does *everything*, down to
> every last minor variation of every package anyone posts for any purpose
> whatsoever, really need to be in the same index of packages?  As a mind
> experiment for what this map will be like, imagine from the Linux world
> that someone made a big index holding every package in every version of
> Slackware, Debian, Redhat, Gentoo, and OpenBSD.  My mental picture of
> this has a lot of minor variations of most packages.  There will be a
> dozen or more gcc-2.95's.

The comparison isn't "fair" IMHO. I could argue the other way and say
"Do all the 12000 packages (or however many there are now) in Debian
really need to be in one repository?".

What is large and what is small? What are the factors that would define
the "borders" of SM? Well, hard to say. Now, as a Squeaker I am not
interested in a VA package (and likewise a Debianee isn't interested in
a Redhat-specific package), etc. etc.

Btw - "same index of packages" above doesn't really say anything about
the architecture of it. For example - DNS could be argued to be a single
index - right? But of course technically it is highly distributed - but
for the user it acts as a single logical entity.

> As another mind exercise, suppose we made a universe that includes the
> current frontier of SqueakMap's packages.  This would not include every
> package, but it would include one or two versions of quite a lot of them.
> In particular, it would include everything that one can currently
> reasonably expect to be installable into a 3.7 image.   Every 3-12
> months we could fork off a stable universe that only receives bug fixes
> and which has any horribly broken stuff removed from it; the unstable
> universe would continue on and would still have all of the same packages
> in it.
>
> Doesn't the first scenario sound a little hard to manage?  And doesn't

Not really, it has worked great the last two years. :)

> the second one sound just fine?  In the second scenario, users in both
> stable and unstable images would see precisely the set of packages that
> are reasonably maintained and installable.  Hackers that additionally
> want to see unmaintained packages can still do it, though they must work
> a little harder and use some sort of "multiverse browser" or access a
> catalog of everything.

This kind of reasoning doesn't help me. The first scenario you wrote was
"designed" to sound scary :) so it didn't work as a comparison. The second
"scenario" doesn't sound hard at all to maintain within a single logical
map (still disregarding any architecture questions).

The interesting thing here is that what we normally do in OO is build
ourselves a good model of the domain, and through mechanisms like object
identity etc. we thus get a normalized, non-redundant model.

To me it is obvious that there should just be a SINGLE SMPackage called
SharedStreams - because there *is* only one such logical package. Then
it has multiple releases - and those have attributes. Simple, easy.

I can mark one release as beta. I can describe its dependencies. All
straightforward.

And the universe you describe above just turns out to be a *view* (=sub
selection of the model based on various criteria). So isn't it actually
*easier* to maintain one model together (the Squeak community isn't
large enough to maintain many IMHO) and then leave it up to each *user*
to select his/her "universe"?

It would be simple to filter for all packages marked as beta or better,
for my Squeak version, with at least one published release available,
etc. Tada.
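The "universe as a view" idea can be sketched in a few lines. This is a minimal Python sketch under assumed, hypothetical names (`Package`, `Release`, `universe` and the maturity levels are illustrative only, not SqueakMap's actual classes or API): there is one `Package` object per logical package, each holding several releases, and a "universe" is simply a query over that single model.

```python
from dataclasses import dataclass, field

# Hypothetical maturity ordering, lowest to highest (not SqueakMap's real list).
MATURITY = ["alpha", "beta", "stable", "mature"]


@dataclass
class Release:
    version: str
    maturity: str               # one of MATURITY
    squeak_versions: frozenset  # Squeak versions the release is known to work in
    published: bool = True


@dataclass
class Package:
    name: str                   # one object per logical package, e.g. SharedStreams
    releases: list = field(default_factory=list)


def universe(packages, squeak_version, min_maturity="beta"):
    """A "universe" as a view: packages with at least one published release
    that is at least `min_maturity` and works in `squeak_version`."""
    threshold = MATURITY.index(min_maturity)
    view = {}
    for pkg in packages:
        matching = [
            r for r in pkg.releases
            if r.published
            and MATURITY.index(r.maturity) >= threshold
            and squeak_version in r.squeak_versions
        ]
        if matching:
            view[pkg.name] = matching
    return view


catalog = [
    Package("SharedStreams", [
        Release("1.0", "beta", frozenset({"3.6", "3.7"})),
        Release("1.1", "alpha", frozenset({"3.7"})),
    ]),
    Package("OldThing", [Release("0.1", "alpha", frozenset({"3.4"}))]),
]

beta_for_37 = universe(catalog, "3.7")  # only SharedStreams 1.0 qualifies
```

In this sketch every view is computed from the same normalized model, so a "3.7 beta-or-better" universe and an "everything" view cannot drift apart; each user just picks different criteria.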

> I ask about this one issue because if you let go of the idea of having
> absolutely everything in one index, then the rest of the universes
> approach seems to mesh very nicely with where SqueakMap seems to be
> going.  It becomes straightforward to let maps be merged into larger
> maps, and to let non-central maps be locally administered with their own
> accounts and policies.  We could then have simple dependencies, stable
> releases, mixin servers, private servers, and local update policies, all
> without requiring any further cleverness.

Lex - you make it sound sooo easy! But it isn't. I do want to
investigate how my "tree model" perhaps could be turned into a model
where maps can be mixed more freely BUT... it comes with lots of
effects:

If you look at the current SM model there are various crosslinks in the
model, just a few examples:

1. The categories.
2. The co-maintainers of a package.
3. The publisher (which maintainer published the release) of a release.

...and coming:

4. Resources attachable to other SMObjects, especially:
5. Configurations, which are resources owned by a maintainer other than
the owner of the release they are attached to, and which refer to other
releases owned by other maintainers.

...and perhaps a few I have forgotten.

Now... if the above crosslinks weren't there at all - if the SM model
was just a bunch of independent object "trees" - then merging them would
be trivial, because the information would be "independent".

Now, that is not true, and I have said this over and over. Julian told me
that, ok, but the "foreign" links - can't they simply be encoded as
UUIDs so that they can be ripped out, put on another server, and then
merged back together, and then they get resolved? Hehe... sure. And what
happens when the thing it refers to isn't there anymore? etc. etc.
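To illustrate why merging is harder than it sounds, here is a minimal Python sketch, assuming a hypothetical representation in which each map is a dict from UUID to object and cross-links are stored as UUIDs (`MapObject` and `merge_maps` are illustrative names, not SqueakMap code). The merge itself is easy; what remains are exactly the cases raised above: the same object edited on two servers, and references to objects that no longer exist anywhere.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class MapObject:
    uid: uuid.UUID
    name: str
    refs: list = field(default_factory=list)  # cross-links stored as UUIDs


def merge_maps(*maps):
    """Merge maps given as dicts of UUID -> MapObject.

    Returns (merged, conflicts, dangling):
      conflicts: UUIDs found in several maps with different content
                 (the first copy seen is kept)
      dangling:  (referrer, missing) UUID pairs whose target is in no map
    """
    merged, conflicts = {}, set()
    for m in maps:
        for uid, obj in m.items():
            if uid not in merged:
                merged[uid] = obj
            elif merged[uid] != obj:
                conflicts.add(uid)  # edited in two places: which copy wins?
    dangling = [
        (uid, ref)
        for uid, obj in merged.items()
        for ref in obj.refs
        if ref not in merged
    ]
    return merged, conflicts, dangling


category = MapObject(uuid.uuid4(), "Category: Networking")
package = MapObject(uuid.uuid4(), "SharedStreams", [category.uid])
removed = uuid.uuid4()  # e.g. a release deleted on some other server
config = MapObject(uuid.uuid4(), "Configuration", [package.uid, removed])

map_a = {category.uid: category, package.uid: package}
map_b = {
    package.uid: MapObject(package.uid, "SharedStreams (edited)", [category.uid]),
    config.uid: config,
}
merged, conflicts, dangling = merge_maps(map_a, map_b)
```

The sketch only detects these situations; deciding who wins a conflict, or what to do with a dangling link, is the policy work that real merge logic would still have to supply.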

In short - I am saying for the 11th time: Splitting an object model over
multiple servers and then somehow magically being able to modify them in
a distributed fashion and just merging them back together... it is a
hard problem.

I am not saying it is impossible - because that would be stretching it.
But it takes a lot of code and it will create lots of situations with
"dangling pointers", copies out of synch etc. etc. A lot of decisions to
be made.

Now, instead of going over and over like a broken record, this is what I
plan to do:

0. Work out (and deploy) the dependency model so that the model can
"land" properly. It affects all this a LOT.

1. I have an idea on how to be able to distribute the map on multiple
servers with an update/commit model on them and make that work with a
reasonable effort. It involves giving each SMObject its own transaction
counter etc.

2. We could possibly drop the simplification I made with a tree of
servers (only one parent) and try to make it work with multiple parents.
It will involve lots of resolve/merge logic I guess.
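Point 1 only says that each SMObject gets its own transaction counter. One possible reading, sketched here in Python as optimistic concurrency control, is that a client syncs only the objects whose counter has advanced, and that a commit is rejected if it was based on an outdated counter. The names `MapServer`, `updates_since`, `commit` and `StaleCommit` are hypothetical and not part of SqueakMap.

```python
from dataclasses import dataclass


@dataclass
class Versioned:
    data: dict
    counter: int = 0  # hypothetical per-object transaction counter


class StaleCommit(Exception):
    """Raised when a change was made against an outdated copy of an object."""


class MapServer:
    def __init__(self):
        self.objects = {}  # object id -> Versioned

    def updates_since(self, known):
        """Return only the objects whose counter is newer than the client's copy.

        `known` maps object id -> counter the client last saw (missing = 0).
        """
        return {
            oid: obj
            for oid, obj in self.objects.items()
            if obj.counter > known.get(oid, 0)
        }

    def commit(self, oid, data, based_on):
        """Accept a change only if it was based on the current counter."""
        current = self.objects.get(oid, Versioned({}, 0))
        if based_on != current.counter:
            raise StaleCommit(f"{oid}: based on {based_on}, server is at {current.counter}")
        self.objects[oid] = Versioned(dict(data), current.counter + 1)
        return current.counter + 1


server = MapServer()
server.commit("SharedStreams", {"summary": "v1"}, based_on=0)
server.commit("SharedStreams", {"summary": "v2"}, based_on=1)

rejected = False
try:
    # A second client still holding counter 1 tries to commit.
    server.commit("SharedStreams", {"summary": "other edit"}, based_on=1)
except StaleCommit:
    rejected = True  # the client must update and resolve before retrying
```

With a single parent server this check may be enough; with multiple parents (point 2), a rejected commit would presumably have to go through the resolve/merge step rather than simply being retried.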

But I am very afraid of the effects when people start setting up their
own little maps all over the place with lots of duplication, redundancy,
synch problems, servers being down/up, servers simply being unknown
etc... well. Obviously this doesn't scare you at all, and I don't
understand why. These are the things I can already hear:

"Hey! Where did you find that? Oh, I didn't know about *that* server....
Hmmm, it isn't up now, do you have a copy you can email me?"

"Hi! I just posted my little Application on my own map *here*. Bye!"

"Hmmm, does anyone have a list of all known maps at this point in time?
Sure, here is my list, but I heard that Ned has a bigger list with more
stuff on it."

"How many packages are there for Squeak? Well, we don't really know. Ok,
but can someone tell me where to look to find ZZZ? Sure, you can look
here, and here, and here, and perhaps over there..."

I mean - come on! Am I the ONLY ONE afraid of these scenarios? Am I the
ONLY one who remembers how it was before SM? Am I the only Debian user
who has been hunting for .deb packages on web sites, looking around for
entries to put in my sources.list, wondering where to find package XXX
that actually works on Debian?

All in all it is very, very simple. We need ONE model. We need ONE map.
No, it doesn't have to be centralized, it can work just like DNS or
whatever. Yes, you can have your own little server, just like with DNS.

Yes, it is a pain to write the code so that SM gives us all this.

> Finally, let me be very clear on one other thing: all of the efforts on
> things like dependencies, interfaces, compatible updates, branching
> versions, and simultaneous loading of multiple versions, are still very
> important no matter what we do in the short term regarding package
> catalogs.  The present discussion is just about what the next step or
> two should be.
> 
> 
> In summary, I would be happy to have one tool or toolset we all agree on
> using.  But while the really great Grand Unified Tools are still being
> developed, shouldn't we use simple existing solutions and make do as
> well as possible?

Of course we should. I just don't see why you portray yourself as the
proponent of something "simple" and "existing" when in fact SM is the
thing that exists and is simple. :) Really.

> Regards,
> 
> Lex Spoon

regards, Göran

PS. Obviously we didn't get much closer to each other in these last
postings. I am aware that there are needs to be met, like
private maps and a working dependency model etc. I am focusing on
dependencies first - because that is what adds most to the *model*.