Hi Jim,
What I understand from looking at the code in DataStream, ReferenceStream, and SmartRefStream is that they usually write the object graph depth-first and also read and materialize it depth-first. For example, when the stream reads an object with three instance variables, it creates the object with #basicNew, then reads the object of the first instance variable using #next, which may recursively read many more objects with #next depending on the depth of the tree, then eventually comes back to read the object of the second instance variable, and so on. Note that the topology of the object graph is not necessarily the same as the topology of the Morph hierarchy tree; in particular, the two differ once you add your "siblings".
When ReferenceStream and SmartRefStream encounter an object that was already written, they only write a reference to it. Normally such references only point "back" in the stream, not "forward". However, the class comment of ReferenceStream says that it supports "weak references", where the object is only included in the stream if it is also put "non-weakly" into the stream from somewhere else. For example, the owner of a morph is written weakly. In this weak case, a reference might also point forward in the stream if the referenced object had not yet been encountered during the traversal. So in the absence of weak references, I think the .morph file would be read straight from start to end, while with weak references the stream position might jump around a few times. When the reader reaches the position of an object that was already read in advance because of a forward reference, it skips that object. So overall no part of the file is read twice.
(Note that these weak references are different from the weak references that the garbage collector knows about. Whether a reference is weak for the data streams depends on whether the object is written with nextPut: or nextPutWeak:. For VM/GC weak references, it depends on the kind of class, e.g. WeakArray.)
So, assuming you have a morph tree like this:
A--B--C
 --D--E--F
 --G--H
where B, D and G are submorphs of A, and C, E and H are siblings through your own implementation (a siblings instance variable in a subclass of Morph), I would assume that the morphs are serialized in the file in the order A, B, C, E, F, H, D, G, and that the materialization order would be A, B, C, E, D, F, H, G. The difference between the two orders comes from the "weak" owner references.
The order of the instance variables is owner, submorphs, siblings (ignoring all the others). This determines the order of processing, with the special case that the owner is put into the stream weakly.
The first object put into the stream is A. The owner of A is put weakly and deferred (but we already know that it will not be put from anywhere else, so it will stay nil in the stream). When B and C are written, their owners have already been encountered by the traversal, so the owners are just written as back references. E is first encountered as a sibling of C, before the traversal of the submorphs of A has even returned from B. So D, the owner of E, has not been encountered yet, and because the owner is put weakly, writing D is deferred. But D is put into the stream later via the submorphs of A, at which point the weak reference from E is filled in. In the end, E's owner reference points forward to D in the stream. The same happens with H and its owner G.
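To double-check the write order above, here is a small toy model in Python. It is my own sketch, not code from DataStream or ReferenceStream: the Morph class, build, put and put_weak are hypothetical stand-ins for your Morph subclass, #nextPut: and #nextPutWeak:, and the model records only the morphs, leaving out the submorphs/siblings collection objects and all other instance variables that are really written.

```python
# Toy model of depth-first writing with weak owner references.
# Hypothetical names: Morph stands in for your Morph subclass,
# put() for #nextPut: and put_weak() for #nextPutWeak:.


class Morph:
    def __init__(self, name):
        self.name = name
        self.owner = None
        self.submorphs = []
        self.siblings = []  # the extra instance variable of your subclass

    def add(self, child):
        child.owner = self
        self.submorphs.append(child)


def build(mutual_siblings):
    """The example tree: A--B--C, A--D--E--F, A--G--H; C, E, H are siblings."""
    a, b, c, d, e, f, g, h = (Morph(n) for n in "ABCDEFGH")
    for child in (b, d, g):
        a.add(child)
    b.add(c)
    d.add(e)
    e.add(f)
    g.add(h)
    c.siblings = [e, h]
    if mutual_siblings:
        e.siblings = [c, h]
        h.siblings = [c, e]
    return a


def write_order(root):
    """Return the morph names in stream order and the weak owner
    references that end up pointing forward in the stream."""
    written = []  # morphs in the order they are put into the stream
    deferred = []  # (holder, owner) weak references not yet resolvable

    def put(obj):
        if obj in written:  # already written: only a back reference
            return
        written.append(obj)
        put_weak(obj, obj.owner)  # inst var order: owner (weak) ...
        for sub in obj.submorphs:  # ... then submorphs ...
            put(sub)
        for sib in obj.siblings:  # ... then siblings
            put(sib)

    def put_weak(holder, target):
        # A weak put never traverses into the target. If the target has not
        # been written yet, the reference is deferred; it stays nil unless
        # the target is later put non-weakly from somewhere else.
        if target is not None and target not in written:
            deferred.append((holder, target))

    put(root)
    forward = [(h.name, t.name) for h, t in deferred if t in written]
    return [m.name for m in written], forward


order, forward = write_order(build(mutual_siblings=False))
print(" ".join(order))  # A B C E F H D G
print(forward)  # [('E', 'D'), ('H', 'G')]
```

With mutual siblings (build(mutual_siblings=True)) the model yields the same stream order and the same two forward references, because E's extra siblings C and H are reached from E before the traversal gets back to C.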
While reading, the stream immediately materializes every object it encounters as an empty shell and then fills in its variables one after another:
1. Descend into the submorphs of A depth-first. (Strictly speaking, it meets the collection in submorphs first, but proceeds to its elements immediately. The owner references to A and B are back references and are filled in before proceeding to the submorphs, without recursing into A and B again.)
2. Go into the siblings of C, meeting E.
3. Proceed to D as the owner of E. Since A already exists, it is filled in as the owner of D. E is also already in the making and is filled in as the sole submorph of D.
4. Descend into the submorphs of E, meeting F.
4b. If E also has C and H as siblings, fill in a back reference to the existing C.
5. Continue with the siblings of C or E, meeting H in either case.
6. Proceed to G as the owner of H.
7. Since none of the other morphs has submorphs or siblings that were not already encountered, return to the submorphs of A and continue there. D and G have already been materialized, too, so they are skipped. If there were another submorph after G, it would be materialized now.
Each morph receives #comeFullyUpOnReload: after all of its instance variables have been read. So the order of these sends would be:
a) if E and H do not have any siblings even though C has them: D, F, E, G, H, C, B, A.
b) if C, E and H are fully mutual siblings: D, F, G, H, E, C, B, A.
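The reading side can be modeled the same way. Again this is a toy sketch in Python under my own assumptions, not the actual ReferenceStream code: following a reference to an object that has not been materialized yet stands for either reading it inline or jumping ahead to a forward reference, a weak owner is only followed if that owner is in the stream at all, and comeFullyUpOnReload: is modeled as "append to fully_up once all fields are done". The names read_order and strongly_reachable are hypothetical.

```python
# Toy model of depth-first reading: shells first (#basicNew), fields in
# order owner, submorphs, siblings, #comeFullyUpOnReload: when done.


class Morph:
    def __init__(self, name):
        self.name = name
        self.owner = None
        self.submorphs = []
        self.siblings = []

    def add(self, child):
        child.owner = self
        self.submorphs.append(child)


def build(mutual_siblings):
    """The example tree: A--B--C, A--D--E--F, A--G--H; C, E, H are siblings."""
    a, b, c, d, e, f, g, h = (Morph(n) for n in "ABCDEFGH")
    for child in (b, d, g):
        a.add(child)
    b.add(c)
    d.add(e)
    e.add(f)
    g.add(h)
    c.siblings = [e, h]
    if mutual_siblings:
        e.siblings = [c, h]
        h.siblings = [c, e]
    return a


def strongly_reachable(root):
    """Morphs that get written at all: reachable without following owner."""
    seen, todo = set(), [root]
    while todo:
        m = todo.pop()
        if m not in seen:
            seen.add(m)
            todo.extend(m.submorphs + m.siblings)
    return seen


def read_order(root):
    in_stream = strongly_reachable(root)
    created, fully_up = [], []

    def next_object(obj):
        if obj is None or obj in created:  # nil, back ref, or read ahead
            return
        created.append(obj)  # empty shell, as with #basicNew
        if obj.owner in in_stream:  # weak owner: resolved only if written
            next_object(obj.owner)  # may jump forward in the stream
        for ref in obj.submorphs + obj.siblings:
            next_object(ref)
        fully_up.append(obj)  # all inst vars read: #comeFullyUpOnReload:

    next_object(root)
    return [m.name for m in created], [m.name for m in fully_up]


for mutual in (False, True):
    created, fully_up = read_order(build(mutual_siblings=mutual))
    print(" ".join(created), "|", " ".join(fully_up))
# A B C E D F H G | D F E G H C B A
# A B C E D F H G | D F G H E C B A
```

The first printed line corresponds to case a) and the second to case b); in both, D and G are created early through the forward owner references, which is why they are skipped when the loop over A's submorphs reaches them.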
Hope this helps, and that I got it right.
Kind regards, Jakob
On Tue, 25 Jul 2023 at 14:12, Jim Rosenberg <jr@amanue.com> wrote:
Thank you very VERY much: this helps a lot.
One point on which I am still in a bit of a muddle: I have some morph classes of my own where a morph may be related to a "nearby" morph in what amounts to a sibling relationship, without any form of parent-child relationship. An instance variable might track this, listing all its siblings. (It's a bit more complicated than this, but this gives the basic idea.) Among a set of siblings, when restored from file, one of them has to be restored "first". How does SmartRefStream "look ahead" to objects "later" in the stream? Does it make multiple passes through the whole .morph file?
-Thanks, Jim
On Mon, 24 Jul 2023 23:28:13 +0200 Jakob Reschke jakres+squeak@gmail.com wrote:
Hi Jim,
On Thu, 13 Jul 2023 at 17:28, Jim Rosenberg <jr@amanue.com> wrote:
For instance: One of the things that has to happen when a morph is loaded from file is that various instance variables have to have their value restored. I might or might not have code in a new or initialize method which gives those variables a value. I guess I've been assuming that the load morph process will "supersede" my morph creation code in giving loaded morph instance variables the right value. Now I'm thinking that assumption may not be valid, and if I have (say) an initialize instance method, that might get executed *after* load morph instance value restoration happens. Comments please?
All instance variables are saved to the file, and they are all restored from the file. #initialize is not run after loading; in fact, it is not sent at all to instances restored from a file. The "new" objects are created by sending #basicNew to the class rather than #new (which would send #initialize).
Related question: Is there an intent in the save/load morph to/from file process to preserve submorph order? Or do I have to have my own code in my subclasses to "guarantee" this? (I've been just assuming submorph order will be preserved.)
The submorphs collection and its original order should be restored from the file, just like any other instance variable.
Is there a way my code can "know" whether a morph is being created from the load morph from file process, rather than interactively?
When objects are loaded from a DataStream (and the files of the save & load feature are DataStreams), they receive the message #comeFullyUpOnReload: after all their variables have been restored. (Conversely, objects receive #objectForDataStream: before they are saved. This allows objects to store themselves as "something else", such as a DiskProxy that is just a reference to a singleton object in the system.)
Moreover, when instance variables are added to or removed from a class, you can convert old instances loaded from files (SmartRefStreams) by implementing a migration method with the selector #convertToCurrentVersion:refStream:.
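As a rough analogy in Python (not SmartRefStream's actual code), loading creates the object without running its initializer, copies the stored variables by name, and only then gives the object a chance to migrate. Everything here is hypothetical: load_instance, convert_to_current_version and the old "peer" variable are made up for illustration, and I am assuming that same-named variables are copied and that newly added ones start out as nil; please check the method comment of Object>>convertToCurrentVersion:refStream: in your image.

```python
# Toy analogy of loading an instance whose class changed shape since the
# file was written. All names are hypothetical, not Squeak's.


class MyMorph:
    # The instance variables the *current* version of the class defines.
    inst_var_names = ("owner", "submorphs", "siblings")

    def __init__(self):
        # Stands in for #initialize; load_instance never runs it.
        self.owner = None
        self.submorphs = []
        self.siblings = []

    def convert_to_current_version(self, var_dict):
        # Stands in for #convertToCurrentVersion:refStream:. Hypothetical
        # migration: an older version kept a single 'peer' instead of a
        # 'siblings' collection; the old value is still in var_dict.
        if self.siblings is None:
            peer = var_dict.get("peer")
            self.siblings = [peer] if peer is not None else []


def load_instance(cls, var_dict):
    """var_dict maps the instance variable names stored in the file to values."""
    obj = cls.__new__(cls)  # bare shell without __init__, like #basicNew
    for name in cls.inst_var_names:
        # Same-named variables are copied; variables added since the file
        # was written have no stored value and start out as None (nil).
        setattr(obj, name, var_dict.get(name))
    if set(var_dict) != set(cls.inst_var_names):  # class shape changed
        obj.convert_to_current_version(var_dict)
    return obj


old = load_instance(MyMorph, {"owner": None, "submorphs": [], "peer": "E"})
print(old.siblings)  # ['E']
```

An instance saved with the current shape (owner, submorphs, siblings) is loaded as is, and the migration hook is not called for it in this toy.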
Kind regards, Jakob