Current project serialization format?


Andreas Raab

1 Dec 2009

7:59 p.m.

Quick question: What is the current project serialization format used in Etoys? Historically, image segments were used but I know that Yoshiki looked at alternatives and I don't know what the current state of serialization in Etoys is. Specifically I am interested in:

- Does an Etoys project file contain any sort of Manifest that states version dependencies? (i.e., says which version of Etoys it was created with)
- Is the current Etoys project serialization format a low-level format (like ReferenceStream, ImageSegments etc) or a high-level format (more abstract description; not directly serializing bits of the underlying object)
- Assuming it isn't image segment based, how are scripts and script references implemented? Same as previously (file out for scripts, class vars or globals refs for references) or differently?

Thanks for any info. I haven't looked at project serialization in many moons so I'm definitely not up to speed here.

Cheers, - Andreas


Yoshiki Ohshima

2 Dec

4:31 a.m.

Andreas,

At Tue, 01 Dec 2009 10:59:53 -0800, Andreas Raab wrote:

...

Quick question: What is the current project serialization format used in Etoys? Historically, image segments were used but I know that Yoshiki looked at alternatives and I don't know what the current state of serialization in Etoys is.

For regular projects, we still use good old ImageSegments. The alternative is an S-expression representation (dubbed "SISS", homage to SIXX) that is used for the QuickGuides contents.

The reason for using SISS for QuickGuides is that ImageSegment is faster for bigger projects than SISS, but for smaller projects SISS is faster and (tends to be) smaller.

Also with SISS, there is somewhat better control over the Player classes (it knows which classes are from it), so loading and unloading a Guide book is easy.

...

Specifically I am interested in:

Does an Etoys project file contain any sort of Manifest that states version dependencies? (i.e., says which version of Etoys it was created with)

Yes. There is a file called manifest in the zip and it looks like:

Squeak-Version: etoys4.0
Squeak-LatestUpdate: 2325
File-Name-Encoding: utf-8
Project-Language: en
URI: http://squeakland.org/etoys/yoshiki-3431931444
user: yoshiki
Project-Format: ImageSegment
projectkeywords: fractal, chaos
projectcategory: 553
projectname: Mandelbrot
projectdescription: Mandelbrot set drawing
projectauthor:
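A minimal sketch of how such a manifest could be read, assuming it holds one `Key: value` pair per line (the function name `parse_manifest` is made up for illustration and is not part of Etoys):

```python
def parse_manifest(text):
    """Parse 'Key: value' lines into a dict; values may be empty."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first colon only, so URLs in values stay intact.
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return fields


MANIFEST = """\
Squeak-Version: etoys4.0
Squeak-LatestUpdate: 2325
File-Name-Encoding: utf-8
Project-Language: en
URI: http://squeakland.org/etoys/yoshiki-3431931444
user: yoshiki
Project-Format: ImageSegment
projectkeywords: fractal, chaos
projectcategory: 553
projectname: Mandelbrot
projectdescription: Mandelbrot set drawing
projectauthor:
"""

manifest = parse_manifest(MANIFEST)
print(manifest["Squeak-Version"], manifest["Project-Format"])
```

Splitting on the first colon only is what keeps the `URI` value whole; the empty `projectauthor` field simply maps to an empty string.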

...

Is the current Etoys project serialization format a low-level format (like ReferenceStream, ImageSegments etc) or a high-level format (more abstract description; not directly serializing bits of the underlying object)

Since the default is ImageSegment, the answer is "a low-level format". Even though SISS is (more or less) text, the field descriptions are still tied to the actual instance variables of these objects by default. So migrating objects to another system or adding a skin to the UI is not made entirely easy (though it is possible). One good thing about the SISS version is that an "identical" project usually produces exactly the same result, so tracking changes was easier.

There was some attempt to provide a higher-level format in SISS. It involved things like this: when there is no TransformationMorph, the renderedMorph's bounds are in global coordinates, but once a TransformationMorph is added they become local. A higher-level description should deal uniformly with one of them (probably local), but it never really got to the point where one could save and load a project. In Etoys, any UI object is potentially scriptable and reshapable (so far; people may want to restrict that), so that is another stumbling block.

For a completely different project that VPRI is doing internally, I store UI objects in SISS using a higher-level description of UI objects, and it has already gone through some major object reshaping successfully.

...

Assuming it isn't image segment based how are scripts and script references implemented? Same as previously (file out for scripts, class vars or globals refs for references) or differently?

In the ImageSegment format, at least the references are made project-local; the dictionary from player reference names to actual objects is kept in a property of the PasteUpMorph, and the script compiler looks up objects from there. But otherwise it is similar to before.
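To illustrate the idea of project-local references, here is a rough Python sketch. The `Project` class and its `resolve` method are hypothetical stand-ins; in Etoys the table lives in a property of the PasteUpMorph and the lookup is done by the script compiler.

```python
class Project:
    """Holds a project-local table from player reference names to objects."""

    def __init__(self, name):
        self.name = name
        self.references = {}

    def resolve(self, ref_name):
        # Look names up in this project only, never in a global table.
        try:
            return self.references[ref_name]
        except KeyError:
            raise LookupError(
                f"no player named {ref_name!r} in project {self.name!r}"
            ) from None


# Two projects can each use the name "Car" without colliding.
first = Project("first")
second = Project("second")
first.references["Car"] = object()
second.references["Car"] = object()

print(first.resolve("Car") is second.resolve("Car"))
```

Because each lookup goes through its own project, the same reference name can safely mean different objects in different projects.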

In SISS, a script is saved as a "parse tree" made of send nodes, repeat nodes, variable nodes, etc. It roughly looks like:

(script :name "script1" (send :selector "color:sees:" (variable "self") (Color "....") (Color "....")))

or such. The Player class names and internal names of Players are abstracted away.
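As a rough illustration of how such a tree could be read back, here is a small S-expression reader in Python. It is only a sketch: the real SISS grammar is not described in this thread, the example is written with `"self"` as a closed string, and the `("str", value)` tagging is an arbitrary choice made here.

```python
import re

# Tokens: parentheses, double-quoted strings, or bare symbols/keywords.
TOKEN = re.compile(r'\(|\)|"[^"]*"|[^\s()"]+')


def read_sexpr(text):
    """Parse one S-expression into nested Python lists.

    Quoted strings become ("str", value) tuples so they can be told
    apart from bare symbols such as `send` or `:selector`.
    """
    tokens = TOKEN.findall(text)
    pos = 0

    def read():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            items = []
            while tokens[pos] != ")":
                items.append(read())
            pos += 1  # consume the closing ")"
            return items
        if tok.startswith('"'):
            return ("str", tok[1:-1])
        return tok

    return read()


tree = read_sexpr(
    '(script :name "script1" (send :selector "color:sees:" '
    '(variable "self") (Color "....") (Color "....")))'
)
print(tree[0], tree[3][0])
```

The result is a plain nested list, so a loader could walk it and rebuild send, variable, and other nodes without invoking a compiler.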

...

Thanks for any info. I haven't looked at project serialization in many moons so I'm definitely not up to speed here.

Well, I still regard you as the expert when it comes to project saving in various forms....

-- Yoshiki

Andreas Raab

6:26 a.m.

Yoshiki Ohshima wrote:

...

At Tue, 01 Dec 2009 10:59:53 -0800, Andreas Raab wrote:

...
Quick question: What is the current project serialization format used in Etoys? Historically, image segments were used but I know that Yoshiki looked at alternatives and I don't know what the current state of serialization in Etoys is.

For regular projects, we still use good old ImageSegments. The alternative is an S-expression representation (dubbed "SISS", homage to SIXX) that is used for the QuickGuides contents.

The reason for using SISS for QuickGuides is that ImageSegment is faster for bigger projects than SISS, but for smaller projects SISS is faster and (tends to be) smaller.

But robustness in the face of a changing environment wasn't the main driving force? This is a bit where I'm headed - I'm interested in updating projects with a storage mechanism that can work robustly in the face of a constantly changing environment. I had hoped that SISS might be a good starting point for that. BTW, is SISS explicitly key-value based or just a sequence of objects with meaning implied by sequence?

...

...
Specifically I am interested in:

Does an Etoys project file contain any sort of Manifest that states version dependencies? (i.e., says which version of Etoys it was created with)

Yes. There is a file called manifest in the zip and it looks like:

Squeak-Version: etoys4.0
Squeak-LatestUpdate: 2325
File-Name-Encoding: utf-8
Project-Language: en
URI: http://squeakland.org/etoys/yoshiki-3431931444
user: yoshiki
Project-Format: ImageSegment
projectkeywords: fractal, chaos
projectcategory: 553
projectname: Mandelbrot
projectdescription: Mandelbrot set drawing
projectauthor:

Ah, very good. This is an excellent start. What I really want to do is to provide enough information so that a project loader can tell whether it will be able to load a project and if not, why, instead of our old friend "Reading an instance of GobblyGook. Which modern class should it translate to?" which is my second-favorite question to ask users. (of course my absolute favorite is when project saving invites you to "stop and take a look" at some blocks it encountered - both are such completely and utterly incomprehensible questions to users that it's pointless to ask them in the first place)

In any case, being able to have a common manifest that we can use to identify version and possibly other dependencies will be great.
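A sketch of what such a pre-load check might look like, in Python. Everything here is an assumption made for illustration: the function name `check_loadable`, the rule that a project saved at a higher update number than the running image may not load, and the set of supported formats. Only the manifest keys come from the example above.

```python
def check_loadable(manifest, supported_formats, image_update):
    """Return a list of plain-language reasons why a project cannot load.

    An empty list means no problem was detected from the manifest alone.
    """
    problems = []

    fmt = manifest.get("Project-Format")
    if fmt not in supported_formats:
        problems.append(f"unsupported project format: {fmt}")

    # Assumed policy: projects saved by a newer image may not load.
    saved_update = int(manifest.get("Squeak-LatestUpdate", "0"))
    if saved_update > image_update:
        problems.append(
            f"project was saved at update {saved_update}, "
            f"but this image is at update {image_update}"
        )
    return problems


manifest = {
    "Squeak-Version": "etoys4.0",
    "Squeak-LatestUpdate": "2325",
    "Project-Format": "ImageSegment",
}
for reason in check_loadable(manifest, {"ImageSegment"}, 2300):
    print(reason)
```

The point of returning reasons instead of a boolean is that the loader can show the user why loading will fail before any object is read.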

...

There was a bit of attempt to provide a higher-level format in SISS. It involved like when there is no TransformationMorph, its renderedMorph's bounds is in global coordinates, but once TransformationMorph is added the it becomes in local. A higher level description should be uniformly dealing with one of them (probably local), but it wasn't really to a point where one can save and load a project. For Etoys, any UI object is potentially scriptable and reshapable(so far, people may want to restrict it), so it is another stumbling block for it.

It's not all that difficult; but my main interest here was really to find out what kind of work had already been done for defining serialized abstractions for common Object/Morph types (which is admittedly a daunting task when you start from scratch) but...

...

For a completely different project VPRI are doing internally, I store UI objects in SISS in more higher level description of UI objects and already went through some major object shaping successfully.

... this sounds as if most of that hasn't been done in Etoys.

...

...

Assuming it isn't image segment based how are scripts and script references implemented? Same as previously (file out for scripts, class vars or globals refs for references) or differently?

In the ImageSegment, at least the references are made project local; the dictionary of player reference name to actual object is in the property of PasteUpMorph and the script compiler looks up objects from there. But otherwise it is similar to before.

How interesting! This is almost as if a project had a namespace associated with it, which is an idea that I might like to explore a little more.

...

In SISS, a script is saved as a "parse tree" out of send node, repeat node, variable node, etc. It roughly look like:

(script :name "script1" (send :selector "color:sees:" (variable "self") (Color "....") (Color "....")))

or such. The Player class names and internal names of Players are abstracted away.

Oh, but if references are looked up via the current project, why is this necessary for the players? Shouldn't they resolve unambiguously? Also, why a parse tree instead of source code? Is there extra information that's not available via parsing source? Or is it to avoid a compiler dependency?

...

...
Thanks for any info. I haven't looked at project serialization in many moons so I'm definitely not up to speed here.

Well, I regard that you are still the expert when it comes to project saving in various forms....

This is all great info. I definitely have some digging to do in the current etoys image ;-)

Cheers, - Andreas

K. K. Subramaniam

8:44 a.m.

On Wednesday 02 December 2009 10:56:57 am Andreas Raab wrote:

...

Also, why a parse tree instead of source code? Is there extra information that's not available via parsing source? Or is it to avoid a compiler dependency?

Perhaps because scripts in Etoys are held as a sequence of tile morphs (visual rendering of parsed statements). The tiles can be rendered in source code form but cannot be reverted if the code is edited. One can also 'escape' into text mode to type in any arbitrary Squeak method.

Subbu

Yoshiki Ohshima

8:08 p.m.

At Tue, 01 Dec 2009 21:26:57 -0800, Andreas Raab wrote:

...

Yoshiki Ohshima wrote:

...
At Tue, 01 Dec 2009 10:59:53 -0800, Andreas Raab wrote:

...
Quick question: What is the current project serialization format used in Etoys? Historically, image segments were used but I know that Yoshiki looked at alternatives and I don't know what the current state of serialization in Etoys is.

For regular projects, we still use good old ImageSegments. The alternative is an S-expression representation (dubbed "SISS", homage to SIXX) that is used for the QuickGuides contents.

The reason for using SISS for QuickGuides is that ImageSegment is faster for bigger projects than SISS, but for smaller projects SISS is faster and (tends to be) smaller.

But robustness in the face of a changing environment wasn't the main driving force?

Sure. One possibility was to use a version of such a format as a vehicle to migrate the implementation to somewhere else. But as you figured out, no good definition of such a robust format has been designed yet.

...

This is a bit where I'm headed - I'm interested in updating projects with a storage mechanism that can work robustly in the face of a constantly changing environment. I had hoped that SISS might be a good starting point for that. BTW, is SISS explicitly key-value based or just a sequence of objects with meaning implied by sequence?

I "think" that neither the order of these attributes nor the order of sub-expressions matters, except in the parse tree for a script. (But it has not been thoroughly tested in that regard.)

The idea is to make it isomorphic to XML; it just happens to be rendered with round parentheses.
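To make the XML correspondence concrete, here is a hedged Python sketch that maps a nested list (tag first, then `:key` / value pairs and child lists) onto `xml.etree.ElementTree` elements. The list layout and the `to_xml` name are assumptions for illustration, not the actual SISS data model.

```python
import xml.etree.ElementTree as ET


def to_xml(node):
    """Convert [tag, ":key", "value", ..., child, ...] into an XML element."""
    tag, rest = node[0], node[1:]
    elem = ET.Element(tag)
    i = 0
    while i < len(rest):
        item = rest[i]
        if isinstance(item, str) and item.startswith(":"):
            # A keyword is followed by its value: becomes an XML attribute.
            elem.set(item[1:], rest[i + 1])
            i += 2
        elif isinstance(item, list):
            elem.append(to_xml(item))  # nested expression: child element
            i += 1
        else:
            elem.text = (elem.text or "") + str(item)  # bare value: text
            i += 1
    return elem


tree = ["script", ":name", "script1",
        ["send", ":selector", "color:sees:", ["variable", "self"]]]
doc = to_xml(tree)
print(ET.tostring(doc, encoding="unicode"))
```

Since attributes end up in a dictionary keyed by name, their order in the S-expression does not affect the result, which matches the key-value reading described above.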

...

...
...
Specifically I am interested in:

Does an Etoys project file contain any sort of Manifest that states version dependencies? (i.e., says which version of Etoys it was created with)

Yes. There is a file called manifest in the zip and it looks like:

Squeak-Version: etoys4.0
Squeak-LatestUpdate: 2325
File-Name-Encoding: utf-8
Project-Language: en
URI: http://squeakland.org/etoys/yoshiki-3431931444
user: yoshiki
Project-Format: ImageSegment
projectkeywords: fractal, chaos
projectcategory: 553
projectname: Mandelbrot
projectdescription: Mandelbrot set drawing
projectauthor:

Ah, very good. This is an excellent start. What I really want to do is to provide enough information so that a project loader can tell whether it will be able to load a project and if not, why, instead of our old friend "Reading an instance of GobblyGook. Which modern class should it translate to?" which is my second-favorite questions to ask users. (of course my absolute favorite is when project saving invites you to "stop and take a look" at some blocks it encountered - both are completely and utterly incomprehensible questions to users that it's pointless to ask them in the first place)

Hehe, I agree.

...

In any case, being able to have a common manifest that we can use to identify version and possibly other dependencies will be great.

...
There was a bit of attempt to provide a higher-level format in SISS. It involved like when there is no TransformationMorph, its renderedMorph's bounds is in global coordinates, but once TransformationMorph is added the it becomes in local. A higher level description should be uniformly dealing with one of them (probably local), but it wasn't really to a point where one can save and load a project. For Etoys, any UI object is potentially scriptable and reshapable(so far, people may want to restrict it), so it is another stumbling block for it.

It's not all that difficult; but my main interest here was really to find out what kind of work had already been done for defining serialized abstractions for common Object/Morph types (which is admittedly a daunting task when you start from scratch) but...

...
For a completely different project VPRI are doing internally, I store UI objects in SISS in more higher level description of UI objects and already went through some major object shaping successfully.

... this sounds as if most of that hasn't been done in Etoys.

That is right.

...

...
In SISS, a script is saved as a "parse tree" out of send node, repeat node, variable node, etc. It roughly look like:

(script :name "script1" (send :selector "color:sees:" (variable "self") (Color "....") (Color "....")))

or such. The Player class names and internal names of Players are abstracted away.

Oh, but if references are looked up via the current project, why is this necessary for the players? Shouldn't they resolve unambigously? Also, why a parse tree instead of source code? Is there extra information that's not available via parsing source? Or is it to avoid a compiler dependency?

I don't quite get the first question (as I didn't explain it clearly). One thing here (I think) is that the Player class name is still global, so, say, loading the same project twice and then exporting both would result in different output because of it. Another is that when loading QuickGuide content into a user project, there may be name collisions.

As for using the parse tree, it is partly to avoid a compiler dependency (for bringing the code into a different language) and partly to avoid parsing overhead.

It is possible to go both ways between the S-expression representation of code and either tiles or textual code. So it is possible to go from an "arbitrarily edited" (with some side conditions) textual script to a tile script. And to keep open the possibility of switching to a different language, keeping a method as a tree structure makes sense.

...

...
...
Thanks for any info. I haven't looked at project serialization in many moons so I'm definitely not up to speed here.

Well, I regard that you are still the expert when it comes to project saving in various forms....

This is all great info. I definitely have some digging to do in the current etoys image ;-)

Don't get disgusted, please ^^;

-- Yoshiki

Ted Kaehler

9:13 p.m.

Andreas, Yoshiki, As the author of the current ImageSegment-based project storing (with Dan Ingalls), and as the author of the two infamous error messages when loading an old project, I need to point out a few things.

...

What I really want to do is to provide enough information so that a project loader can tell whether it will be able to load a project and if not, why, instead of our old friend "Reading an instance of GobblyGook. Which modern class should it translate to?" which is my second-favorite questions to ask users. (of course my absolute favorite is when project saving invites you to "stop and take a look" at some blocks it encountered - both are completely and utterly incomprehensible questions to users that it's pointless to ask them in the first place)

(I certainly hope that those two infamous error messages are not my legacy to history... )

The only important question is, "Would you rather have an error message when you are trying to save a project, or when you are trying to load it?" The frustration and anger of not being able to save the work that you just did is unspeakable. On the other hand, not being able to load an ancient project just means that you need to go get a few definitions before you can proceed. There is no comparison of the urgency of saving vs loading.

Fixed-format systems like SISS, and many XML-based systems, cannot cover everything. You can always create some data structure or code in your project that cannot be written out. There is no universal format. A universal format is not possible. The only thing we can do is provide enough levels of indirection so that anything can be expressed in a saved project. (Maybe a universal format is possible, and maybe we should invent it.)

I am glad that I sit next to Yoshiki, because when I can't write out my window in our new LObject world, I just turn to Yoshiki and ask him to fix SISS. I highly recommend this way of working to all future project creators. A bit impractical, though.

The current ImageSegment-with-address-space project saving has a much wider range of what it can save. It can save any data structure, period. It keeps the names of instance variables in the entire tree of superclasses. A project can survive a huge set of changes in instance variables in classes.
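Why stored instance variable names help can be shown with a small Python sketch. This is not the actual ImageSegment algorithm, only an illustration of matching slots by name; the function `migrate` and the sample variable names are invented here.

```python
def migrate(saved_names, saved_values, current_names, default=None):
    """Rebuild an object's slots by matching instance variable names.

    Variables removed from the class are dropped; newly added ones get
    `default`; reordered ones are moved to their new position.
    """
    saved = dict(zip(saved_names, saved_values))
    return [saved.get(name, default) for name in current_names]


# The class lost `color`, gained `borderWidth`, and reordered the rest.
saved_names = ["bounds", "color", "owner"]
saved_values = [(0, 0, 10, 10), "red", None]
current_names = ["owner", "bounds", "borderWidth"]

print(migrate(saved_names, saved_values, current_names, default=0))
```

A format that stored only slot positions could not do this, because it would have no way to tell that `bounds` moved from the first slot to the second.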

Yes, there is a problem with rare "naughty blocks" (a highly technical term of art), but I will fix that if there is demand. Please note that SISS can't store ANY blocks at all.

One goal of SISS-style formats is that a human can read it. In an emergency, a person can look in the file and see what objects are there. Having readable text in the file also helps other people write translators to other systems. HyperCard's file format is ASCII and is human-readable. Bill Atkinson did this deliberately, and used it to debug code and repair stacks by hand. I watched him do this, and my conclusion is that it was a total waste of time! Every minute he spent poking around in the innards of a stack was a waste. It is much better to make a binary format that works completely, and never look at it again. I followed this technique with the ImageSegment-based saving, and it has worked well.

If we insist on a text-based file format, parsing will play a vital role. I suggest that we get Alex Warth involved.

Once again, let me ask, "Would you rather have an error message when you are trying to save a project, or when you are trying to load it?"

--Ted.

-- Ted Kaehler
http://tedkaehler.weather-dimensions.com/us/ted/index.html (home)
3261 Montecito Drive, Las Vegas, NV 89120. voice (702) 456-7930
Q: How do you warm up a room that has just been painted? A: Give it a second coat. (from National Geographic Kids, via my nine year old son)
Q: Why is a dog like a man? A: Because he wears a coat and pants. (the favorite joke of my grandfather's friend, except he got mixed up and said trousers instead of pants, which made it even funnier.)

Yoshiki Ohshima

3 Dec

2:28 a.m.

Hi, Ted,

(If there is a subsequent message, we should take it off line as it is getting out of the etoys scope...)

It is a trade-off, and I don't think I can convince you to change your mind overnight ^^; Still, I'd think you agree that being able to say "what it logically means", instead of "how it is represented", opens up more possibilities.

At Wed, 2 Dec 2009 12:13:59 -0800, Ted Kaehler wrote:

...

Fixed-format systems like SISS, and many XML-based systems, cannot cover everything. You can always create some data structure or code in your project that cannot be written out. There is no universal format. A universal format is not possible. The only thing we can do is provide enough levels of indirection so that anything can be expressed in a saved project. (Maybe a universal format is possible, and maybe we should invent it.)

SISS also can store all instance variables and all bits, and that is what we basically do for now.

Whether binary or text (it really doesn't matter; what matters is the abstraction level, meaning how much it depends on the physical representation of objects), and whatever the level of abstraction, the interpretation of these externalized bits is provided by the outside environment. So yes, a completely universal format that describes self-contained semantics is not what we are after.

And it is also true that for Etoys such a restriction conflicts with the idea itself. (Although with ImageSegment, some things are deemed outside of the project being saved and are left out of the externalized file, like a script for a viewer category and the Thread navigation morph. Saving the image itself is an option for some cases, a different project, etc.)

Yes, so what you suggested (get a compatible image), or what Andreas suggested a long time ago (download the compatible environment from the net and run the project in that environment), may be one way. But when people want it, manifesting the logical meaning is cleaner, even with some restrictions.

...

Yes, there is a problem with rare "naughty blocks" (a highly technical term of art), but I will fix that if there is demand. Please note that SISS can't store ANY blocks at all.

Hehe, actually it can. What it cannot deal with is a block with free variables; i.e., if a block can be printed as text without losing information, it can be recreated upon loading.
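The distinction can be illustrated with a loose Python analogy (Python closures stand in for Smalltalk blocks here, which is an approximation, and `has_free_variables` is a helper invented for this sketch). A function whose text refers only to its own arguments can be rebuilt from that text; one that captures an outer variable cannot.

```python
def has_free_variables(fn):
    """True if the function closes over variables from an enclosing scope."""
    return len(fn.__code__.co_freevars) > 0


def make_scaler(factor):
    # The returned lambda captures `factor`, so its text alone is not enough.
    return lambda x: x * factor


source = "lambda x: x * 2"   # self-contained: no captured variables
double = eval(source)        # "loading" recreates it from the text alone
triple = make_scaler(3)      # depends on a value outside its own text

print(has_free_variables(double), has_free_variables(triple))
```

A saver following this rule would write out `double` as text and refuse (or need extra machinery) for `triple`.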

...

One goal of SISS-style formats is that a human can read it. In an emergency, a person can look in the file and see what objects are there. Having readable text in the file also helps other people write translators to other systems. Hypercard's file format is ascii and is human-readable. Bill Atkinson did this deliberately, and used it to debug code and repair stacks by hand. I watched him do this, and my conclusion is that it was a total waste of time! Every minute he spent poking around in the innards of a stack was a waste. It is much better to make a binary format that works completely, and never look at it again. I followed this technique with the ImageSegement based saving, and it has worked well.

Another goal is to be able to take a "diff" between versions and see what has changed. I haven't really needed to edit the generated result to fix something, but it certainly helps with debugging until a thing works completely.
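As an illustration of the diff use case, the following Python sketch compares two textual saves with the standard `difflib` module. The two snippets are invented stand-ins shaped like the script example earlier in the thread, not real SISS output.

```python
import difflib

# Two hypothetical saves of the same project; only the script name differs.
before = '''(script :name "script1"
  (send :selector "color:sees:"))'''
after = '''(script :name "script2"
  (send :selector "color:sees:"))'''

diff = list(
    difflib.unified_diff(
        before.splitlines(),
        after.splitlines(),
        fromfile="before",
        tofile="after",
        lineterm="",
    )
)
print("\n".join(diff))
```

This only works well if saving the same project twice yields the same text, which is why deterministic output matters for a text-based format.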

BTW, I'm not sure that many of the types of errors we had in that internal project are inherent to the way we did it.

...

Once again, let me ask, "Would you rather have an error message when you are trying to save a project, or when you are trying to load it?"

In either case, information may still be lost when cropping out a part of the data structure. There is a line drawn up front, and if the user is completely free to do anything, there is always the possibility of crossing that boundary.

So, we can make it so that it does not complain upon saving, but which approach makes it easier to guarantee that loading will work is still debatable...

-- Yoshiki

Gerardo Richarte

2 Dec

8:51 p.m.

Yoshiki Ohshima wrote:

...

For regular projects, we still use good old ImageSegments. The alternative is an S-expression representation (dubbed "SISS", homage to SIXX) that is used for the QuickGuides contents.

hi!

related to this. Is it possible (and common) today for projects to include behaviour in the form of serialized CompiledMethods or anything similar?

gera

Yoshiki Ohshima

10:43 p.m.

Hi, Gera,

At Wed, 02 Dec 2009 16:51:38 -0300, Gerardo Richarte wrote:

...

Yoshiki Ohshima wrote:

...
For regular projects, we still use good old ImageSegments. The alternative is an S-expression representation (dubbed "SISS", homage to SIXX) that is used for the QuickGuides contents.

hi!

related to this. Is it possible (and common) today for projects to include behaviour in the form of serialized CompiledMethods or anything similar?

Currently, the user-defined methods for uniclass objects are CompiledMethods, and other code addition/changes are stored as text as a changeset. Yes, so it is common.

With a little bit of modification to the serialization logic, you can include CompiledMethod objects for regular (non-uniclass) classes in the externalized binary. At one time we experimented with this idea to make a faster-loading extension to Etoys.

FWIW, Andreas had a mechanism for loading code quickly (I now realize that I don't have a clear idea what it was), and Jecel has many (or one?) proposals for a different object memory organization....

-- Yoshiki

etoys-dev@lists.squeakfoundation.org

participants (5)

Andreas Raab
Gerardo Richarte
K. K. Subramaniam
Ted Kaehler
Yoshiki Ohshima