[squeak-dev] Deltastreams update

Göran Krampe goran at krampe.se
Thu Mar 12 10:28:58 UTC 2009


Hi folks!

Wanted to give interested people (if there are any!) a heads up on 
Deltastreams development.

History
=======

I started the Deltastreams "project" a few years back. There is quite a 
bit of info on the wiki and even a movie from OOPSLA where I present it 
and demo it. The idea of Deltas and streams of them is an evolution of 
the old changeset and update stream. The concept also borrows from 
experience in using MC (which is not a direct competitor) and other 
distributed SCMs outside of Squeak. Think of a Delta as a "super 
changeset". The streams part has not been coded yet.

Work
====
After a while Matthew Fulmer started helping me with the code and he has 
done a LOT on the code base, including lots and lots more tests and lots 
of fixes in SystemEditor (from Colin Putney, used in MC2 I think) which 
we depend on. In fact, the Deltastreams codebase was probably the first 
to really stress SystemEditor. Matthew also created ICS - an advanced 
file format for Deltas. Matthew has lately been working in MC a lot, 
which gives him a unique perspective that I don't have. Matthew is also 
involved a lot in Croquet - which is one primary potential fork to use 
DS with.

I have started working on it again in the last few days and the "itch" 
is back for real. :) I am focusing on the "replace changesets" part and 
the next step for me is probably to make a "dual change sorter"-like UI 
and a new file format (see below). And make the tests green. And also 
make the ICS format work. :)


Code today
==========
Deltastreams is hosted on SS (SqueakSource). I currently develop it in 
3.10.2; the dependencies are SystemEditor and InterleavedChangeset 
(ICS). Both of them could be replaced with other packages taking those 
roles ("tool to atomically apply code changes to a live image" and "file 
format for Deltas", respectively). We want the code to have very few 
dependencies and to work in "all" Squeaks.

Status
======
We have lots of broken tests right now, and I intend to make it all 
GREEN and keep it that way. We have been sloppy and have added lots of 
tests without implementing them - this tactic works for a while but when 
the code base gets complex they really need to be GREEN. Otherwise you 
lose the ability to see if you actually broke something :).

The good part is that there are about 420 tests, and lots of aspects of 
Deltas are thoroughly covered. Logging, applying and reverting Deltas 
(code mechanisms) are 99% working. Currently I think the only bit 
missing is category reorganization.

The ICS file format is partially working. I haven't gotten into the code 
base fully yet, but the format is very "clever", which may be its main 
problem. It tries to do a really cool trick - being compatible with 
Changesets! Or in other words, the same file contains both a binary 
representation of a Delta that Deltastreams code uses AND a changeset 
representation that old images can use. This means that an ICS file can 
be filed into an old image without ANY modification to that image. It 
then simply looks like a changeset.

There is a UI built by Matthew that works on SystemEditor "models", but 
I know too little of its status right now. I intend to build another, 
complementary UI working much more like the "dual change sorter".


A new format
============

ICS is cool. :) But... sorry Matthew, I think I will spend some time on 
another format for Deltas too. One that is NOT backwards compatible in 
that way. This is an area I really want some feedback on! Both on making 
another format available and what that format would be. :)

I would like this "native" Delta format to be:

- Human readable, just like a cs. We just gzip them and make up some 
nice extension like .dz or something. :)
- Editable in a text editor. This means it cannot be too complex.
- Easy to extend. This means the base syntax should leave room for new 
elements and "relaxed parsing" that can ignore unknown elements.
- Very easy to parse. This means it needs to be simple, simple, simple. 
I don't want to depend on YAXO or a similarly large package for parsing.
- Not "compiler driven". I want the format to be safe and fast to load. 
This means the regular Smalltalk Compiler is out of the picture.

My current idea of a format that I think covers the above is:

JSON

...possibly using netstrings for source code (thus not strictly JSON).

JSON offers a very readable "XML-ish" generic format that is very easy 
to parse and produce. It can be easily edited in a text editor if 
needed. It is compact. If used correctly it should be easy to extend.

One substantial part of the file will be Smalltalk source code. I am not 
keen on having to do character-by-character escaping to comply with JSON 
Strings though... thus - netstrings. A netstring is a trivial construct: 
<length-in-ascii> ":" <binary-data> ","

For example:

30:Sentence of thirty characters.,

That would then be used for the source code. The advantage is that the 
source goes in verbatim, with no escaping at all. Is this worth 
"breaking" JSON? Hmmm, thinking more about it I think we need to "break 
it" anyway, because a JSON String can't contain a raw CR (it would have 
to be escaped as \r). :)
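As a rough illustration of how little code the netstring construct 
needs, here is a minimal sketch of an encoder and decoder. It is in 
Python rather than Smalltalk just so it can be run anywhere; the 
function names are invented for this example. The last line shows, for 
comparison, what a JSON string does to the same source.

```python
import json


def netstring_encode(data: bytes) -> bytes:
    """Wrap raw bytes as <length-in-ascii> ":" <data> ","."""
    return str(len(data)).encode("ascii") + b":" + data + b","


def netstring_decode(buf: bytes, pos: int = 0):
    """Read one netstring starting at pos; return (payload, next_pos)."""
    colon = buf.index(b":", pos)
    digits = buf[pos:colon]
    if not digits.isdigit():
        raise ValueError("netstring length must be ASCII digits")
    start = colon + 1
    end = start + int(digits)
    if buf[end:end + 1] != b",":
        raise ValueError("netstring is missing its trailing comma")
    return buf[start:end], end + 1


# Source text with a CR, a tab and double quotes goes in unchanged.
source = b'foo\r\t"quoted" ^ 42'
encoded = netstring_encode(source)
payload, next_pos = netstring_decode(encoded)

# The same text as a JSON string: CR, tab and quotes all get escaped.
as_json = json.dumps(source.decode("ascii"))
```

The decoder never has to look inside the payload; it reads the length, 
jumps ahead and checks for the comma. The price is that editing the 
source by hand in a text editor means updating the length as well.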

Ok, sorry for the long post.

regards, Göran



