An idea, crazy or not? (Re: Very bad about Squeak in blogosfere)

Mon Aug 13 08:47:04 UTC 2007

Hi all!

(long post, sorry!)

Andreas Raab <andreas.raab at gmx.de> wrote:
> Colin Putney wrote:
> > It's stated a bit harshly, but yeah, that sounds basically accurate. The 
> > amazing thing is that, in spite of all that, Squeak is still such a 
> > wonderful platform to work with. I do use Squeak in production, and 
> > there are very few things I would trade it for.
> 
> Well, yes, but you can't deny that the guy's got a point. The 
> frustration he's expressing is something that everyone has felt over the 
> years. And while there are various plain invalid points in his post 
> (like the fact that Squeak has bugs - I'm *shocked* to hear that of 
> course and would have never started three products if I'd known that ;-) 
> the main emerging point is valid: The lack of quality and maintenance. 
> The problems he cites are all known, some of them even have fixes but 
> there isn't enough traction in the community to make this all come 
...
> together. And of course the forks don't exactly help because we still 
> haven't figured out how to share code across the forks and consequently 

And Andreas continues to describe the problem that I have been thinking
a bit about during my vacation.

Last night I wrote it up and intended to bounce it around "privately"
first on selected people, but reading this thread it is so inviting to
share the idea so what the hell :) Forgive me if this post is long - but
I want to paint my picture as best I can.

IMO these are our major problems today (we have lots of minor ones too
and the problems intermix):

1. More or less anarchy when it comes to leadership of Squeak-dev. The
board is not taking charge, for a whole range of possible reasons. One
obvious reason is that the guys elected are really good developers who
tend to be very busy. Another reason may be that we have different
expectations on what they should/could be doing. IMPORTANT: I am *not*
placing any blame though - we are all in this together.

2. We are now "de facto" living in a highly forked world (Croquet,
Sophie, Squeak-dev, Spoon, Squeakland, OLPC, yaddayadda...) but we don't
have tools that support such a world!

3. The core Squeak is not evolving properly mainly due to the fact that
most of the core codebase is "abandoned" - or in other words, no one
feels responsible and/or authoritative enough to bring it forward.

In short we have "leadership paralysis", a "forked world" and a largely
"abandoned core". Other problems I missed? Of course! ...but let's ignore
those for this post.

Considering a few measures then:

- Make a new fork with a strong leadership? Hmmm, could help with #1 and
#3 above but does nothing to improve #2 and would probably be tons of
work to succeed, and I personally don't have the required time to spend.
Plus I am a hard core "squeak-dev" member - I don't *want* to fork. :)

- Reshape the organisation of Squeak-dev? Nah, I am sick and tired of
pulling such work - I have done just too much of that stuff already over
the years. It would possibly help with #1 and #3 (strong leadership) but
again would not help at all with #2.

So ok, forking is out and reshaping the organisation from the inside is
out, even though I really would want much more initiative from the board
in the future - no doubt about that.

Hmmm. But what could be done then? Well, after seeing over and over
again how really good TOOLS can change the way we interact in our
community (Monticello, SqueakMap, SqueakSource, Universes etc) I tried
to imagine a new infrastructure that would help with the three problems
above, in a *natural* way.

Let's look at #1 - the anarchy problem. If we don't try to solve it but
instead try to *live with it* - what could that mean? The piece that
suffers mostly from the anarchy is of course the base image. All our
external packages prosper just fine anyway (kinda). So could we put some
kind of "supertool" in place to maintain the base image that could be
made to work *without* strong leadership? I think we can.

What about #2 - the forking problem? Well, imagine this "supertool"
being written in such a way that it can be used in all major forks - if
it's good and easy to adapt/install it would most probably also *be*
used. Monticello has already shown this can be done.

So instead of trying *not* to fork, we try to make it very easy to fork
- or rather *easy to live* in a forked world of Squeak
images/communities.

But #3 - the rotting core problem? Again, if it was very easy to fix
stuff in the core, publish these changes without having to ask anyone
for permission and very easy to cherry-pick such fixes from other people
- then hopefully the core would again start moving forward. As Andreas
notes - fixes are sitting inside Croquet and my bet is that they are
mainly sitting there because it is a bit "too much work" to push them
out. I am aware of fixes sitting in Gjallar (at least a few) just
because I didn't feel I had time to fix/prepare them sufficiently etc.

So my "daft" idea is:

Let's pull together a good team and make a Supertool that is written to
be used in all forks that... well, what will it actually *do*? :)

1. Introduce a new kind of source unit - let's call it a Delta. The guys
working with MC2/Monticello may have lots of noteworthy stuff to share
about how this kind of thing could look, but I simply want a "changeset"
for the 21st century. A patch. But a smarter one! The philosophy here
being that changesets are nice given their simplicity and malleability -
but they are not so nice in many other respects. But the concept of
communicating with other developers using the natural "work unit" - a
"commit" - is quite nice. In MC at least I tend to make too large
snapshots because let's face it - MC is too slow for small commits -
given how it works. We just don't suffer because it is so darn slick at
merging :)

2. Create the notion of "delta streams". Yes, we had the update stream
earlier - a sequential flow of changesets. It was kinda nifty, but
imagine having tons of these streams originating both from larger
cooperative projects (Croquet, Sophie, Seaside, Gjallar etc) and from
individual developers ("Andreas Raab's kernel fixes", "Juan's
refactorings of Morphic" etc). A Delta could appear in several streams,
so Croquet could have a single stream for each release AND multiple
streams for different kinds of Deltas, like "base image fixes for
Hedgehog" etc.

3. Make an efficient but simple storage model for deltas and delta
streams. Here I am thinking KISS. If a delta could be represented as a
single gzipped file (enabling dumb servers just like MC does) and a
stream can be represented by file naming conventions (by a number
suffix) - then a stream is just a directory or a zip of a bunch of delta
files. Yes, very similar to gzipped changesets and the update stream. :)
This enables us to use all "vanilla" tools available for dealing with
files (http servers, ftp, rsync, email etc).

4. Make a simple but efficient transport. Let's say we set up a public
server that syncs such directories of delta files (streams) from tons of
places. And then offers it to all of us as a simple rsync. Each
developer (or team) could then use rsync to mirror that mega tree of
delta files onto our laptops/servers and the tool inside Squeak would
just need to bother with reading the local filesystem - which makes it
much more portable across Squeak dialects (and possibly even Smalltalk
dialects - but let's not go there just yet). This also means we don't
suffer from servers being down.

5. Make a simple model of Delta and DeltaStream in Squeak and let it
mirror the filesystem. Add a simple API to the model à la Keith's
Installer. Or hey, just enable Installer to work with this model.

6. Make a great Morphic tool to work with them. Perhaps we could even
rework the changesorter family of tools, thus superseding changesets?
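
To make the storage idea in item 3 a bit more concrete, here is a
minimal sketch in Python (not Smalltalk, purely for illustration). The
file naming pattern "<stream>.<number>.delta.gz" and all function names
are my own assumptions, not an agreed format. It only shows that a
stream can be "just a directory" of gzipped files ordered by a number
suffix.

```python
import gzip
import re
import tempfile
from pathlib import Path

# Hypothetical naming convention: <stream-name>.<number>.delta.gz
DELTA_FILE = re.compile(r"^(?P<stream>.+)\.(?P<number>\d+)\.delta\.gz$")


def write_delta(stream_dir, number, payload):
    """Store one delta as a single gzipped file inside the stream directory."""
    stream_dir.mkdir(parents=True, exist_ok=True)
    path = stream_dir / f"{stream_dir.name}.{number}.delta.gz"
    with gzip.open(path, "wt", encoding="utf-8") as f:
        f.write(payload)
    return path


def read_stream(stream_dir):
    """Return (number, payload) pairs in numeric order, ignoring other files."""
    deltas = []
    for path in stream_dir.iterdir():
        match = DELTA_FILE.match(path.name)
        if match:
            with gzip.open(path, "rt", encoding="utf-8") as f:
                deltas.append((int(match.group("number")), f.read()))
    return sorted(deltas)


def demo():
    """Write two deltas out of order and read them back in stream order."""
    with tempfile.TemporaryDirectory() as tmp:
        stream = Path(tmp) / "kernel-fixes"
        write_delta(stream, 10, "second")
        write_delta(stream, 2, "first")
        return read_stream(stream)
```

Since the tool only ever reads a local directory, any transport that
can copy files would do for getting the deltas there.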

Ok, imagine this supertool in your image. Imagine that all public
streams are listed directly in the tool and new ones just "pop up" as
they are added. Imagine that you can add your own private streams just
like you can add repositories to sources.list in apt-get (you Debian
people know what I mean). You can have purely private local streams on
your harddrive or streams on a shared fileserver in your company/dev
group, etc.

Now you have this huge flood of Deltas in hundreds of streams at your
fingertips. How would you use it?

1. Subscribe to some major streams and configure it to automatically
load deltas whenever they appear when you press "update".

2. Set some rules that govern if the tool should just try to load - or
ask for confirmation based on characteristics of the delta.

3. Autosubscribe to categories of streams. For example, if a new bug-fix
stream appears for a package you use a lot - you might want to get
autosubscribed to it - or again, have the tool ask you.

4. Of course, easily publish your own streams. We are talking single
button, no getting permissions etc.

5. Push fixes to other people's packages as deltas onto a personal
public fixes-stream. This means it does not matter what package you are
bug fixing - you can *always* push the fix to your personal public
fixes-stream and don't need to bother with getting permissions on
SqueakSource or wherever it came from. This is a very IMPORTANT feature
and should hopefully put an end to "lost fixes" or "fixes sitting inside
images".

We have too many fixes out there that just never get published because
of the "hassle", and sure, someone says "it is very easy to upload to
Mantis!" - but fixes on Mantis set expectations on quality,
documentation, follow up etc. A personal fixes stream like above sets no
such expectations - it is there for the taking - but that is all.

6. Pull deltas. Selective cherry picking etc.
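
The rules mentioned in item 2 could be very simple. A rough sketch in
Python, where the DeltaSummary fields and the three outcomes ("load",
"ask", "ignore") are hypothetical names I made up for illustration:

```python
from dataclasses import dataclass


@dataclass
class DeltaSummary:
    """Hypothetical characteristics of a delta that a rule can look at."""
    stream: str
    has_doits: bool


def decide(summary, subscribed_streams):
    """Return 'load', 'ask' or 'ignore' for a newly arrived delta."""
    if summary.stream not in subscribed_streams:
        return "ignore"   # not subscribed: leave it for cherry picking
    if summary.has_doits:
        return "ask"      # doits are risky, so let the developer confirm
    return "load"         # plain code changes on a subscribed stream
```

Real rules would of course look at more than these two fields.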

Important features of Deltas:

- Atomic load of a Delta. Either it loads cleanly or it does not load at
all, and it ensures this by checking FIRST that everything is in place
for it to be able to apply cleanly. This should prevent "failed loads"
and broken images. The actual low level atomicity is another story but
SystemEditor from MC2 perhaps? I dunno.

- Revertable, if *possible*. A delta can be analyzed to see if it is
revertable. If it includes doits or class initializations it is in
theory not revertable, though it may be in practice! It can also be
marked as DefinitelyNotRevertable.

- A Delta is declarative. Another class should be responsible for
actually applying it.

- When a Delta is applied to an image we generate a reverse Delta which
(when applied) will reverse the effects. Since the image may be in
different states this reverse Delta needs to be constructed when the
Delta is applied!
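
Why the reverse Delta has to be built at apply time can be shown with a
tiny sketch in Python. Here the "image" is just a dict from (class name,
selector) to method source, and an action is a (key, new source) pair -
both are simplifications of mine, not a proposed format:

```python
def apply_with_reverse(image, actions):
    """Apply (key, new_source) actions to the image dict.

    new_source None means "remove the method". Returns the reverse
    actions, built from what the image actually held at apply time.
    """
    reverse = []
    for key, new_source in actions:
        previous = image.get(key)       # None if the method did not exist
        reverse.append((key, previous))
        if new_source is None:
            image.pop(key, None)
        else:
            image[key] = new_source
    reverse.reverse()                   # undo in the opposite order
    return reverse
```

Applying the returned list with the same function puts the image back
the way it was, whatever state it happened to be in before.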

A Delta contains:

- A Preamble similar to ChangeSets with info fields
	- Developer (name, id, signature etc whatever)
	- Original stream (URL)
	- UUID
	- DefinitelyNotRevertable flag
	- Test doit (a non destructive, non interactive doit that should throw
an exception in order to prevent loading!)
- An ordered sequence of Actions, see below.

Examples of different types of Actions:

- Change method (class name, old method src, new method src)
- Add method (class name, method src, category name)
- Remove method (class name, method src, category name)
- Categorize method (class name, method name, old category name, new
category name)

- Create class (class name, super class name, definition, category)
- Delete class (class name)
- Change superclass of class (class name, old super class name, new
super class name)
- Rename class (old class name, new class name)
- Categorize class (class name, old category name, new category name)
- Change definition (class name, new definition, old definition)
- Class comment change (new comment, old comment)

- Class initialization
- Doit (marked as revertable or not by author!)

Note how these Actions contain "more" info than a ChangeSet - they
contain information about what was there "before" in the image the
Delta originates from! The idea is to make Deltas "rich" with info so
that we can apply them with more smarts.
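
As a rough picture of the model, here is a sketch in Python (again just
for illustration, the real thing would live in Squeak). Only three of
the Action types are shown, and all class, field and method names are
my assumptions. The revertability check is the naive "in theory" one
described above.

```python
from dataclasses import dataclass, field
from uuid import uuid4


@dataclass
class ChangeMethod:
    class_name: str
    old_source: str   # the "before" state in the originating image
    new_source: str


@dataclass
class AddMethod:
    class_name: str
    source: str
    category: str


@dataclass
class Doit:
    source: str
    revertable: bool = False   # marked by the author


@dataclass
class Delta:
    developer: str
    original_stream: str                    # URL of the originating stream
    definitely_not_revertable: bool = False
    test_doit: str = ""                     # should raise to prevent loading
    actions: list = field(default_factory=list)
    uuid: str = field(default_factory=lambda: str(uuid4()))

    def looks_revertable(self):
        """Naive analysis: every action must be revertable by itself."""
        if self.definitely_not_revertable:
            return False
        # Actions without a 'revertable' attribute carry their own
        # "before" state, so they are treated as revertable here.
        return all(getattr(a, "revertable", True) for a in self.actions)
```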

When applied to an image the Delta is copied to a directory with the
same name as the image followed by "-applied-deltas". It is also logged
in the changes file. If the Delta can not be cleanly reverted (based
solely on its own contents) we generate reverse Deltas and store them in
the same directory prefixed with "reverse-".
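
A small sketch of that naming scheme in Python. Whether the directory
name is built from "squeak" or from "squeak.image" is not decided
above; this sketch assumes the name without the extension, and the
function names are made up:

```python
from pathlib import PurePosixPath


def applied_dir(image_path):
    """Directory next to the image, named <image name>-applied-deltas."""
    image = PurePosixPath(image_path)
    # Assumption: drop the ".image" extension before adding the suffix.
    return image.with_name(image.stem + "-applied-deltas")


def reverse_name(delta_filename):
    """File name under which the reverse Delta is stored."""
    return "reverse-" + delta_filename
```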

Applying a Delta:

1. Verify that it can be applied (or signal CanNotApplyException). Do
this by analyzing Actions and running any Test doit to see whether it
throws an Exception.
	- It can not be applied *cleanly* if something is "different" from
expected.
	- It can be applied *dirty* if all changes can be applied in "some
fashion". For example, a delete operation and the thing to delete is
already deleted. The reverse Delta will then show this so that a revert
will not put the thing back in.
	- It can be applied *partially* if just a subset of Actions is applied
(by choice or since some just can't be applied). The reverse Delta will
then show what was applied or not.

2. Verify that it can be reverted cleanly (or signal
CanNotRevertCleanlyException). Do this by analyzing Actions.

3. Apply and generate the reverse Delta if needed.

4. Copy delta and any reverse delta to applied dir.

5. Log to changes file.

6. Report packages touched (for running unit tests afterward!)
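
Steps 1 and 3 might be sketched like this in Python. The image is
modelled as a dict from a method name to its source, and an action is a
(kind, key, old source, new source) tuple - both are simplifications of
mine. The sketch takes the strict "cop out" route: any unexpected
"before" state raises, except the already-deleted case, which counts as
a dirty apply. Atomicity is faked by working on a copy.

```python
class CanNotApplyException(Exception):
    pass


def apply_delta(image, actions):
    """Apply all actions or none. Returns 'clean' or 'dirty'."""
    scratch = dict(image)   # work on a copy, so a failure leaves image as is
    verdict = "clean"
    for kind, key, old, new in actions:
        current = scratch.get(key)
        if kind == "remove":
            if current is None:
                verdict = "dirty"          # already deleted, nothing to do
            elif current != old:
                raise CanNotApplyException(key)
            scratch.pop(key, None)
        elif kind == "add":
            if current is not None:
                raise CanNotApplyException(key)
            scratch[key] = new
        elif kind == "change":
            if current != old:
                raise CanNotApplyException(key)
            scratch[key] = new
        else:
            raise ValueError("unknown action kind: %r" % (kind,))
    # Everything checked out, so commit the result in one go.
    image.clear()
    image.update(scratch)
    return verdict
```

A smarter version would ask the developer instead of raising when the
"before" state differs.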

What about merging then? Well, the concept here is to be much more
"loose" when it comes to merging compared to say MC. It was inspired by
something I read about git (Linus' SCM tool) - it doesn't try to be
smart - it just tries to do the right thing when it is obvious. In other
situations it just asks the developer. I think I like this approach. The
Delta model should be easy enough for all developers to understand. It
is "just" a changeset/patch after all.

So if I am trying to apply a Delta (or several) and the "before state"
is not as the Delta expects - we can either cop out, or let it be
"smart", or just ask the user to decide what to do. BUT... I am not an
SCM implementor - the MC/MC2 guys can clearly explain to me why/how this
will not work or amend the idea so that it can work. I gladly admit my
ignorance. ;)

...ok, enough blabbering. :) Does this sound plausible, useful or just
plain dumb? Is MC2 already this and much more? 

ciao, Göran