DeltaStreams file-out format and class model

Wed Oct 10 05:15:51 UTC 2007

Hello Matthew,
> This is an IRC discussion I am moving to the mailing list
>
> the current DeltaStreams file-out format is monolithic, and can be only loaded/saved as one chunk. It is a gzipped DataStream of the Delta model.
This is essentially the same idea as Monticello: classes modelling each type of change, albeit at a higher level of granularity than DS, saved/loaded as a DataStream.

When loading, Monticello takes the monolithic file-out of the model and does some analysis to establish the safest load order and to identify what is not needed. Monticello is quite smart, since it only loads what has changed.

DataStream-based formats like this are inherently inflexible. If you want to use this format for its simplicity and universal availability, then you need either to provide for future expansion or to have some notion of a format version. Supporting all possible versions could then be difficult if this depends upon class definitions, since it is presently difficult to have more than one definition of a class in the image at the same time.

I think that the easiest way of providing for future expansion is to have a slot reserved for a 'properties' dictionary in the base class DSChange. This is the approach I have taken for releasing Monticello from its fixed class layout, except of course it is somewhat more difficult to add after the fact. Additional instvar requirements can be placed in there rather than changing the class format.
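
To make the idea concrete, here is a minimal sketch in file-in form. DSChangeSketch, its category and the selector names are hypothetical, chosen for illustration only; this is not the actual DSChange code. The single 'properties' slot starts out nil and is only turned into a Dictionary on first write, so instances saved before a new attribute was invented still load and simply answer the default.

```smalltalk
Object subclass: #DSChangeSketch
    instanceVariableNames: 'properties'
    classVariableNames: ''
    poolDictionaries: ''
    category: 'Sketch-DeltaStreams'!

!DSChangeSketch methodsFor: 'properties'!
propertyAt: aKey ifAbsent: aBlock
    "Answer the stored value, or the value of aBlock when nothing is stored."
    ^ properties isNil
        ifTrue: [aBlock value]
        ifFalse: [properties at: aKey ifAbsent: aBlock]!

propertyAt: aKey put: aValue
    "Create the dictionary lazily, so instances with a nil slot keep working."
    properties isNil ifTrue: [properties := Dictionary new].
    ^ properties at: aKey put: aValue! !
```

A later version could then store, say, a timestamp with "aChange propertyAt: #timestamp put: aValue" without touching the class layout.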

Another possibility would be to simply load all data elements into a Dictionary and file that out instead. 
> An idea is to base it on a logging framework. where composite
My logging framework is designed to be the coder's interface between the placing of a debugging statement in their code and the choice of back-end log-to-disk framework, of which the squeak-dev universe offers 3 different ones.

So as it stands I don't think my logging stuff is the right tool for this, but a variant could be. There is no reason why actual useful bits of data/code cannot be sent to logs, e.g. in Perl it is common practice to use Data::Dumper to write out complete data structures to server logs. The form it is written in can be eval-ed to restore the data structure.
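
The same round trip can be tried in a Squeak workspace. This is a minimal sketch, assuming the stock storeString and "Compiler evaluate:" behaviour; the dictionary contents are made up for illustration.

```smalltalk
| original text restored |
"Some data worth keeping."
original := Dictionary new.
original
    at: #selector put: #methodZ;
    at: #category put: 'junk methods'.

"storeString answers Smalltalk source that rebuilds the receiver."
text := original storeString.

"Evaluating that source gives back an equivalent structure."
restored := Compiler evaluate: text.

"Compare the rebuilt structure with the original."
restored = original
```

The string held in text is what would be written to the log or file.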
> changes could be rendered as:
>
> open composite change.
> add change.
> add change.
> close composite change.
>
> Much like xml/sexp formats. Not sure if the chunk format could
> do this
>
> This would enable streaming loading and saving of deltas, rather
> than the all-at-once load/save as is done now

> Keith Hodges replied:
>   
>> please please use the chunk format
>>     
Some people love it, some people hate it.

I love it. It is ultimately flexible, that's what I like about it. The
simplicity to power ratio is potentially very good.

If I recall correctly, the default behaviour is that the first chunk is
read and evaluated by the compiler, the result being a reader which (by
convention) reads the next chunk, and so on. Typically when a reader
finds an empty chunk it returns, resetting to the initial reader, which
restarts the process.
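
As a rough illustration of that sequence, here is a small hand-written file-in, assuming the standard Squeak file-in behaviour. ChunkDemo and its category are hypothetical names. The first chunk is a plain expression. The line that starts with ! is evaluated to produce a reader, which then treats each following chunk as method source until it meets the empty chunk at "! !". After that the last chunk is read as a plain expression again.

```smalltalk
Object subclass: #ChunkDemo
    instanceVariableNames: ''
    classVariableNames: ''
    poolDictionaries: ''
    category: 'Sketch-Chunks'!

!ChunkDemo methodsFor: 'demo'!
greeting
    "This chunk is handed to the reader, which compiles it as a method."
    ^ 'hello'!

farewell
    ^ 'goodbye'! !

Transcript show: ChunkDemo new greeting; cr!
```

Anything that answers a suitable reader object could stand in place of "ChunkDemo methodsFor: 'demo'", which is where the flexibility comes from.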

Those who want to make models out of the data records don't like it
because it is flexible enough to include anything, so the content cannot
necessarily be guaranteed to be readable by anything other than the
Compiler.

Given the flexibility of the chunk reading idea, I am surprised that we
have not seen much innovation around it in improving fileOuts etc.

One advantage is that chunks can do anything, so you could record and
file out an executable representation of any action, even such things as
"pasting an image into the environment", since chunks can include encoded
binary data if preceded by the appropriate decoding reader.
>> I myself didnt think that DS needed a class model
>> just heve defined operations on a System Editor
>> foled out in chunk format
>> filed*
>> i.e.,,, change method
>> chunk of code which assigns the method change to aystemEditor A 
>> and a second chunk of code which assigns the inverse to system editor B
>> thats you delta stream
>>     
>
> First, I am open to any suggestion about a better change model.
> Having 34 subclasses of DSChange be the model does seem messy to
> me.  I know there should be something better, but I haven't
>   
I guess that is an inevitable outcome of modelling in an environment
where one models with classes and instances.
> thought of it yet, except that it would probably be vaguely
>   
> pier-like
>   
I am not sure I understand that statement.
> Some questions:
> What is a "defined operation on a SystemEditor" other than a class model?
>   
It is simply generated source code which performs an operation. In this
case the receiver is a model of the Smalltalk environment.

So what DS models as "Add an instance var 'newVar' to Class MyClass" can
be persisted as:

(CurrentSystemEditor value at: #MyClass) addInstVarName: 'newVar'.

One problem with chunks is that remapping globals is not straightforward.
However, I am a fan of ProcessSpecific variables and I think that they
could help in this, since "CurrentSystemEditor value" would be determined
at runtime, and could have a different value in each process that is
using it.
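
To show the lookup I have in mind, here is a minimal sketch that keys the current editor on the running process. It is not the ProcessSpecific implementation: CurrentSystemEditorSketch, its category and the Editors class variable are hypothetical names, and the table is not protected against concurrent access.

```smalltalk
Object subclass: #CurrentSystemEditorSketch
    instanceVariableNames: ''
    classVariableNames: 'Editors'
    poolDictionaries: ''
    category: 'Sketch-DeltaStreams'!

!CurrentSystemEditorSketch class methodsFor: 'accessing'!
editors
    "One table for all processes; weak keys let finished processes be collected."
    ^ Editors ifNil: [Editors := WeakIdentityKeyDictionary new]!

value
    "Answer the editor bound in the running process, or nil if none was bound."
    ^ self editors at: Processor activeProcess ifAbsent: [nil]!

value: anEditor
    "Bind anEditor for the running process only."
    ^ self editors at: Processor activeProcess put: anEditor! !
```

Two processes filing in the same chunks would each send "value:" once at the start and from then on see only their own editor, so the chunk text itself never has to name a particular global.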
> What do you mean by "the chunk format" or "like the chunk format"?
> Do you mean "able to be used as a CompiledMethod sourcePointer"?
>   
See above.
> Or do you mean "some format that is a smalltalk expression that generates something"?
>
>   
I do, but the chunk format is not limited to that.
> I don't really understand the chunk format; by having old and
> new version, any fileout of Deltas would not look like a
> file-out or change set, even if it did use something parsable by
> the chunk reader.  The chunk format also has the
>   
Indeed, it would look like a delta stream.
> complication/liberty of custom stream parsers
>
> Would this be like the chunk format you speak of?
> | editor classEditor|
> editor := SystemEditor new !
> classEditor := editor at: Object !
> classEditor compile: "methodZ ^ self" classified: #'junk methods'
I imagine you would need...

A header chunk, to set up the SystemEditors, one for the forward
direction one for the reverse (although I suspect it may be possible to
have one do both.)

! CurrentSystemEditor value: SystemEditor new. ! !

Action Chunks.

! CurrentSystemEditor value addInstVarName: 'a' ! ! "individual statements"
! CurrentSystemEditor value inverseEditor removeInstVarName: 'a' ! !

Although many may not agree with me, I think there is a lot of potential
for innovation using the chunk format, and it has the advantage that
most people have the tools to read it already.

regards

Keith