DeltaStreams file-out format and class model

Matthew Fulmer tapplek at gmail.com
Wed Oct 10 07:11:49 UTC 2007


On Wed, Oct 10, 2007 at 06:15:51AM +0100, Keith Hodges wrote:
> > the current DeltaStreams file-out format is monolithic, and
> > can be only loaded/saved as one chunk. It is a gzipped
> > DataStream of the Delta model.
> 
> Datastream based formats like this are inherently inflexible.
> If you want to use this format due to its simplicity and
> universal availability, then you either need to provide for
> future expansion or have some notion of format version.
> Supporting all possible versions could then be difficult if
> this depends upon class definitions, since it is presently
> difficult to have more than one definition of a class in the
> image at the same time.

Indeed it is quite inflexible. I plan to change it.

> I think that the easiest way of providing for future expansion
> is to have a slot reserved for a 'properties' dictionary in
> the base class DSChange. This is the approach I have taken for
> releasing Monticello from its fixed class layout, except of
> course it is somewhat more difficult to add after the fact.
> Additional instvar requirements can be placed in there rather
> than changing the class format.

DSDelta already has this; DSChange, however, does not. DSChange
should have it too.

> Another possibility would be to simply load all data elements
> into a Dictionary and file that out instead. 

Could you elaborate?

> > An idea is to base it on a logging framework, where
> > composite
> My logging framework is designed to be the coder's interface
> between the placing of a debugging statement in their code
> and the choice of back-end log-to-disk framework, of which the
> squeak-dev universe offers three different ones.
> 
> So as it stands I don't think my logging stuff is the right
> tool for this, but a variant could be. There is no reason why
> actual useful bits of data/code cannot be sent to logs. E.g.
> in Perl it is common practice to use Data::Dumper to write out
> complete data structures to server logs. The form it is written
> in can be eval-ed to restore the data structure.
> > changes could be rendered as:
> >
> > open composite change.  add change.  add change.  close
> > composite change.
> >
> > Much like xml/sexp formats. Not sure if the chunk format
> > could do this
> >
> > This would enable streaming loading and saving of deltas,
> > rather than the all-at-once load/save as is done now
> 
> > Keith Hodges replied:
> >   
> >> please please use the chunk format
> >>     
> Some people love it some people hate it.
> 
> I love it; it is ultimately flexible, and that's what I like
> about it. The simplicity-to-power ratio is potentially very good.
> 
> If I recall correctly the default behaviour begins such that
> the first chunk is read and evaluated by the compiler, the
> result being a reader which (by convention) reads the next
> chunk and so on. Typically when a reader finds an empty chunk
> it returns,  resetting to the initial reader which restarts
> the process.

My understanding of the chunk format is from
http://wiki.squeak.org/squeak/1105
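For readers who haven't seen it, the framing itself is simple: chunks
are terminated by a single '!', and a literal '!' inside a chunk is
escaped by doubling it. A minimal sketch of just the framing (in
Python rather than Smalltalk, purely for illustration; this is not
any existing Squeak API):

```python
def read_chunks(text):
    """Split a chunk-format stream into its chunks.

    A single '!' terminates a chunk; a doubled '!!' inside a chunk
    stands for a literal '!'.  By the convention described above,
    an empty chunk signals a reader to return control.
    """
    chunks = []
    buf = []
    i = 0
    while i < len(text):
        c = text[i]
        if c == '!':
            if i + 1 < len(text) and text[i + 1] == '!':
                buf.append('!')   # escaped bang: keep one '!'
                i += 2
                continue
            chunks.append(''.join(buf).strip())
            buf = []
            i += 1
        else:
            buf.append(c)
            i += 1
    return chunks

# read_chunks("foo!bar!!baz!") -> ['foo', 'bar!baz']
```

The reader/doIt layering (evaluating a chunk to get the next reader)
sits on top of this framing and is where all the flexibility lives.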

> Those who want to make models out of the data records don't
> like it because it is flexible enough to include anything, so
> the content cannot necessarily be guaranteed to be readable by
> anything other than the Compiler.
> 
> Given the flexibility of the chunk reading idea, I am
> surprised that we have not seen much innovation around it in
> improving fileOuts etc.
> 
> One advantage being that chunks can do anything, so you could
> record and file out an executable representation of any action,
> even such things as "pasting an image into the environment",
> since chunks can include encoded binary data if preceded by
> the appropriate decoding reader.

Indeed. I did this in my second attempt at a simple Delta
fileout format. I put a static decoder chunk at the beginning of
the file, followed by the gzipped datastream. I dropped it when
I noticed that the chunk reader was just noise around the actual
content, which was in the gzipped datastream.

My first attempt was an expression that, when evaluated, yielded
the Delta. I found out that the compiler and image die very
ungracefully when asked to evaluate a single 3000-line
expression.

Here are some things I don't see how to do with the chunk
format:

1. Read chunks in reverse order. This is absolutely essential
   when reverting a delta.
2. Pass arguments to a chunk file. For example, how could I
   ensure that the Compiler, while parsing a chunk file, sends
   all commands through a certain visitor, depending on whether
   I want to 
   1. Find all conflicting deltas
   2. apply non-conflicting deltas
   3. collect all deltas of one package and do something with
      them
3. Define a chunk hierarchy, such as one chunk containing and
   being able to manipulate a delimited set of the next several
   chunks. This would be very useful in storing composite
   changes, and in delimiting the individual Deltas in a
   DeltaStream. This may be doable by returning a chunk reader
   from a reader chunk, but I don't know if there would be a way
   for a recursed chunk reader to recognize the end of its
   substream.
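On point 3, one conceivable convention is exactly the one hinted at:
a composite-opening chunk installs a sub-reader, and the sub-reader
treats an empty chunk as its own terminator, so nesting falls out of
the existing reset-on-empty-chunk rule. A hypothetical sketch over an
already-framed chunk list (the 'open:' marker and the empty-string
close marker are inventions for this sketch, not part of any Squeak
format):

```python
def parse_nested(chunks):
    """Build a tree from a flat list of chunks.

    A chunk starting with 'open:' begins a composite change; an
    empty chunk closes the innermost open composite.  Both markers
    are hypothetical, chosen only to illustrate the recursion.
    """
    stack = [[]]
    for chunk in chunks:
        if chunk.startswith('open:'):
            node = [chunk[len('open:'):]]  # composite labelled by its name
            stack[-1].append(node)
            stack.append(node)
        elif chunk == '':
            if len(stack) == 1:
                raise ValueError('close marker with no open composite')
            stack.pop()
        else:
            stack[-1].append(chunk)
    if len(stack) != 1:
        raise ValueError('unclosed composite')
    return stack[0]

# parse_nested(['a', 'open:grp', 'b', 'c', '', 'd'])
#   -> ['a', ['grp', 'b', 'c'], 'd']
```

Whether a real recursed chunk reader could be made to hand control
back this cleanly is exactly the open question above.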

> > First, I am open to any suggestion about a better change
> > model.  Having 34 subclasses of DSChange be the model does
> > seem messy to me.  I know there should be something better,
> > but I haven't
> >   
> I guess that is an inevitable outcome of modelling in an
> environment where one models with classes and instances.

Let's hope not!

> > thought of it yet, except that it would probably be vaguely
> >   
> > pier-like
> >   
> I am not sure I understand that statement.

DSChange and friends mix together operation (add remove
change move), context (class, method, class organization, system
organization), and subject (ivar, method source, comment,
timestamp, category, etc.) at the class definition level.
Examples:

DSMethodAdded (operation: add; context: aClass; subject: aMethod)
DSMethodRemoved (operation: remove; context: aClass; subject: aMethod)
DSMethodSourceChange (operation: change; context: aMethod; subject: source, timestamp)

On the other hand, Pier separates these concepts a bit.
Operations are the task of a few very generic PRCommands
(PRAddCommand, PREditCommand, PRRemoveCommand, PRMoveCommand).

Commands operate, as I understand it, on PRStructures, which are
both a Context (a PRPath, which looks up a context), and a
subject. I am vague on the details, but it is a praiseworthy
model, since it receives a lot of praise :).
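Concretely, a model factored along those lines could carry the three
axes as data on a single change record instead of 34 leaf classes. A
hypothetical sketch (the `Change` record and its field names are
inventions for illustration, not the DSChange or Pier APIs):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Change:
    operation: str   # 'add' | 'remove' | 'change' | 'move'
    context: str     # e.g. a class or method reference
    subject: dict    # e.g. {'method': ...} or {'source': ..., 'timestamp': ...}

# The three DSChange examples above, re-expressed as data
# (method names and timestamps are made up):
method_added   = Change('add',    'MyClass', {'method': 'printOn:'})
method_removed = Change('remove', 'MyClass', {'method': 'printOn:'})
source_changed = Change('change', 'MyClass>>printOn:',
                        {'source': 'printOn: aStream ...',
                         'timestamp': '2007-10-10'})

# One payoff: reverting becomes a data transformation rather than
# a lookup over dozens of classes (subjects would also need their
# old/new values swapped in a full treatment):
INVERSE = {'add': 'remove', 'remove': 'add',
           'change': 'change', 'move': 'move'}

def invert(change):
    """Flip a change's operation, keeping context and subject."""
    return Change(INVERSE[change.operation], change.context, change.subject)
```

This is only a sketch of the factoring, not a claim about how Pier
or DeltaStreams actually implement it.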

> > Some questions: What is a "defined operation on a
> > SystemEditor" other than a class model?
> >   
> It is simply generated source code which performs an
> operation. In this case the receiver is a model of the
> Smalltalk environment.
> 
> So what DS models as "Add an instance var 'newVar' to Class
> MyClass" can be persisted as.
> 
> (CurrentSystemEditor value at: #MyClass) addInstVarName:
> 'newVar'.
> 
> One problem with chunks is that remapping Globals is not
> straight forward.  However, I am a fan of ProcessSpecific
> variables and I think that they could help in this, since
> "CurrentSystemEditor value" would be determined at runtime,
> and could have a different value in each process that is using
> it.

I don't know what you mean by "remapping Globals".

> > Would this be like the chunk format you speak of? 
> > | editor classEditor |
> > editor := SystemEditor new !
> > classEditor := editor at: Object !
> > classEditor compile: 'methodZ ^ self' classified: #'junk methods'
> I imagine you would need...
> 
> A header chunk, to set up the SystemEditors, one for the
> forward direction one for the reverse (although I suspect it
> may be possible to have one do both.)
> 
> ! CurrentSystemEditor value: SystemEditor new. ! !
> 
> Action Chunks.
> 
> ! CurrentSystemEditor value addInstVarName: 'a' ! !
> "individual statements" ! CurrentSystemEditor value
> inverseEditor removeInstVarName: 'a' ! !
> 
> Although many may not agree with me I think there is a lot of
> potential for innovation using the chunk format, and it has
> the advantage that most people have the tools to read it
> already.

All applications using the chunk format, so far, have run into
the problem that if you need to do something not done by the
code in the chunk stream, you need to abandon the chunk format
and resort to manually parsing it to get back to the objects you
started with, or something more manipulable. For instance, one
can apply change sets using the built-in chunk reader, but
to open a change list or change browser, one must heuristically
parse the file (see ChangeList's 'scanning' protocol, for instance).

This may be a limit of the file-out format, though, and not of
the underlying chunk format.

A declarative model and matched visitor is currently the kernel
of DeltaStreams, so I don't see the chunk format as working for
the current model. If the model were more like the Pier model, I
think the chunk format might be better suited, as the Pier model
is not so declarative, imho.

-- 
Matthew Fulmer -- http://mtfulmer.wordpress.com/
Help improve Squeak Documentation: http://wiki.squeak.org/squeak/808


