[Vm-dev] [squeak-dev] re: Image Segment semantics and weakness
David T. Lewis
lewis at mail.msen.com
Thu Oct 23 00:52:17 UTC 2014
Thanks for this background, it is very helpful and interesting.
I would also like to put in a good word for Fuel. It is well designed, well
documented, and well supported on Squeak and Pharo. Very high quality work.
I use Fuel in RemoteTask (in package CommandShell) for inter-image communication.
ReferenceStream also works, and both are supported in RemoteTask. But if you
want to have a serializer that you can read and understand, I'd say that Fuel
is hard to beat.
I am not advocating anything with respect to image segments, project saving,
and so forth, I'm just saying that Fuel is a very good thing. It works well
in Squeak, and I suspect that many folks may not be aware of this.
On Wed, Oct 22, 2014 at 12:53:15PM -0700, Eliot Miranda wrote:
> Hi Stephane, Hi All,
> let me talk a little about the ParcPlace experience, which led to David
> Leibs' parcels, whose architecture Fuel uses.
> In the late 80's/90's Peter Deutsch wrote BOSS (Binary Object Storage
> System), a traditional interpretive pickling system defined by a little
> bytecoded language. Think of a bytecode as something like "What follows is
> an object definition, which is its id followed by size info followed by the
> definitions or ids of its sub-parts, including its class", or "What follows
> is the id of an already defined object". So the loading interpreter looks
> at the next byte in the stream and that tells it what to do. So the
> storage is a recursive definition of a graph, much like a recursive grammar
> for a programming language.
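A minimal sketch of the kind of interpretive loader described above, in Python rather than Smalltalk. The opcode values, the `Node` class and the byte layout are all hypothetical; BOSS's real encoding is not given in this thread. It only illustrates "look at the next byte and let it decide what to do".

```python
import io
import struct

# Hypothetical one-byte opcodes (assumptions, not BOSS's real encoding).
OP_DEFINE = 0x01  # id, class name, field count, then that many nested items
OP_REF = 0x02     # id of an object that was already defined
OP_INT = 0x03     # a literal integer leaf


class Node:
    """Stand-in for an arbitrary object with a class name and fields."""

    def __init__(self, class_name):
        self.class_name = class_name
        self.fields = []


def dump(obj, out, ids):
    """Write obj and everything reachable from it, depth first."""
    if isinstance(obj, int):
        out.write(struct.pack(">Bq", OP_INT, obj))
    elif id(obj) in ids:
        out.write(struct.pack(">BI", OP_REF, ids[id(obj)]))
    else:
        ids[id(obj)] = len(ids)
        name = obj.class_name.encode("utf-8")
        out.write(struct.pack(">BIB", OP_DEFINE, ids[id(obj)], len(name)))
        out.write(name)
        out.write(struct.pack(">I", len(obj.fields)))
        for field in obj.fields:
            dump(field, out, ids)


def load(inp, table):
    """Read one item: the next byte says what follows."""
    (op,) = struct.unpack(">B", inp.read(1))
    if op == OP_INT:
        return struct.unpack(">q", inp.read(8))[0]
    if op == OP_REF:
        return table[struct.unpack(">I", inp.read(4))[0]]
    if op == OP_DEFINE:
        obj_id, name_len = struct.unpack(">IB", inp.read(5))
        node = Node(inp.read(name_len).decode("utf-8"))
        table[obj_id] = node  # registered before its fields exist
        (count,) = struct.unpack(">I", inp.read(4))
        for _ in range(count):
            node.fields.append(load(inp, table))
        return node
    raise ValueError(f"unknown opcode {op}")


# Round trip a small graph with a shared object and a cycle.
point = Node("Point")
point.fields = [3, 4]
pair = Node("Pair")
pair.fields = [point, point, pair]
buf = io.BytesIO()
dump(pair, buf, {})
buf.seek(0)
copy = load(buf, {})
```

Note that `load` has to register a node in `table` before its fields have been read, so a back reference can reach an object that is still only partly built.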
> This approach is slow (it's a bytecode interpreter) and fragile (structures
> in the process of being built aren't valid yet, imagine trying to take the
> hash of a Set that is only half-way through being materialized). But this
> architecture was very common at the time (I wrote something very similar).
> The advantage BOSS had was a clumsy hack for versioning. One could specify
> blocks that were supplied with the version and state of older objects, and
> these blocks could effect shape change etc to bring loaded instances up to date.
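The versioning hook described above could look roughly like the following sketch, with Python functions standing in for Smalltalk blocks. The registry, the decorator, the `Account` class and its `VERSION` attribute are all hypothetical names made up for illustration; this is not BOSS's actual API.

```python
# Hypothetical registry: (class name, stored version) -> converter function.
CONVERTERS = {}


def converter(class_name, version):
    """Register a function that upgrades state saved by an older version."""
    def register(fn):
        CONVERTERS[(class_name, version)] = fn
        return fn
    return register


class Account:
    VERSION = 2  # current shape: balance kept as integer cents

    def __init__(self, owner, balance_cents):
        self.owner = owner
        self.balance_cents = balance_cents


@converter("Account", 1)
def account_from_v1(state):
    # Assumed old shape: version 1 stored a float `balance` in currency units.
    return Account(state["owner"], round(state["balance"] * 100))


def load_instance(class_name, version, state, classes):
    """Build an instance, going through a converter if the version is old."""
    cls = classes[class_name]
    if version == cls.VERSION:
        obj = cls.__new__(cls)
        obj.__dict__.update(state)
        return obj
    # The converter is handed the old state and returns a current instance.
    return CONVERTERS[(class_name, version)](state)


old = load_instance("Account", 1, {"owner": "ann", "balance": 12.5},
                    {"Account": Account})
new = load_instance("Account", 2, {"owner": "bob", "balance_cents": 300},
                    {"Account": Account})
```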
> David Leibs had an epiphany as, in the early 90's, ParcPlace was trying to
> decompose the VW image (chainsaw was the code name of the VW 2.5 release).
> If one groups instances by class, one can instantiate in bulk, creating all
> the instances of a particular class in one go, followed by all the
> instances of a different class, etc. Then the arc information (the
> pointers to objects to be stored in the loaded objects' inst vars) can
> follow the instance information. So now the file looks like header, names
> of classes that are referenced (not defined), definitions of classes,
> definitions of instances (essentially class id, count pairs), arc
> information. And materializing means finding the classes in the image,
> creating the classes in the file, creating the instances, stitching the
> graph together, and then performing any post-load actions (rehashing
> instances, etc).
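The layout above can be sketched as follows, in Python rather than Smalltalk. This is an illustration under assumptions, not the Parcels or Fuel file format: the dictionary keys, the `("ref", n)` / `("lit", v)` arc encoding and the example classes are invented here, and the "definitions of classes" section is left out (classes are only looked up by name).

```python
from collections import defaultdict


class Point:
    def __init__(self, x=None, y=None):
        self.x, self.y = x, y


class Line:
    def __init__(self, start=None, end=None):
        self.start, self.end = start, end


def reachable(roots):
    """Collect every non-literal object reachable from roots."""
    seen, stack, order = set(), list(roots), []
    while stack:
        obj = stack.pop()
        if isinstance(obj, (int, str, type(None))) or id(obj) in seen:
            continue
        seen.add(id(obj))
        order.append(obj)
        stack.extend(vars(obj).values())
    return order


def serialize(roots):
    """Nodes first, as (class name, count) clusters; arcs afterwards."""
    by_class = defaultdict(list)
    for obj in reachable(roots):
        by_class[type(obj).__name__].append(obj)
    index, clusters = {}, []
    for name in sorted(by_class):
        clusters.append((name, len(by_class[name])))
        for obj in by_class[name]:
            index[id(obj)] = len(index)  # numbered cluster by cluster
    arcs = []
    for name in sorted(by_class):
        for obj in by_class[name]:
            for slot, value in vars(obj).items():
                if id(value) in index:
                    target = ("ref", index[id(value)])
                else:
                    target = ("lit", value)
                arcs.append((index[id(obj)], slot, target))
    return {"clusters": clusters, "arcs": arcs,
            "roots": [index[id(r)] for r in roots]}


def materialize(data, classes):
    # 1. Find each class, then create all of its instances in one go.
    instances = []
    for name, count in data["clusters"]:
        cls = classes[name]
        instances.extend(cls.__new__(cls) for _ in range(count))
    # 2. Stitch the graph together from the arc section.
    for owner, slot, (kind, value) in data["arcs"]:
        target = instances[value] if kind == "ref" else value
        setattr(instances[owner], slot, target)
    # 3. Post-load actions (rehashing and so on) would run here.
    return [instances[i] for i in data["roots"]]


p, q = Point(1, 2), Point(3, 4)
data = serialize([Line(p, q), Line(q, p)])
loaded = materialize(data, {"Point": Point, "Line": Line})
```

Because every instance exists before any arc is stored, no slot is ever filled with a reference to an object that has not been allocated yet, and post-load actions only run on a fully connected graph.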
> Within months we merged with Digitalk (to form DarcPlace-Dodgytalk) and
> were introduced to TeamV's loading model which was very much like
> ImageSegments, being based on the VM's object format. Because an
> ImageSegment also has imports (references to classes and globals taken from
> the host system, not defined in the file) performance doesn't just depend
> on loading the segment into memory. It also depends on how long it takes
> to search the system to find imports, etc. In practice we found that a)
> Parcels were 4 times faster than BOSS, and b) they were no slower than
> Digitalk's image segments. But being independent of the VM's heap format
> Parcels had BOSS's flexibility and could support shape change on load,
> something ImageSegments *cannot do*. I went on to extend parcels with
> support for shape change, plus support for partial loading of code, but I
> won't describe that here. Too detailed, even though it's very important.
> Mariano spent time talking with me and Fuel's basic architecture is that of
> parcels, but reimplemented to be nicer, more flexible etc. But essentially
> Parcels and Fuel are at their core David Leibs' invention. He came up with
> the ideas of a) grouping objects by class and b) separating the arcs from
> the nodes.
> Now, where ImageSegments are faster than Parcels is *not* loading. Our
> experience with VW vs TeamV showed us that. But they are faster in
> collecting the graph of objects to be included. ImageSegments are dead
> simple. So IMO the right architecture is to use Parcels' segregation, and
> Parcels' "abstract" format (independent of the heap object format) with
> ImageSegment's computation of the object graph. Igor Stasenko has
> suggested providing the tracing part of ImageSegments (Dan Ingalls' cool
> invention of mark the segment root objects, then mark the heap, leaving the
> objects to be stored unmarked in the shadow of the marked segment roots) as
> a separate primitive. Then this can be quickly partitioned by class and
> then written by Smalltalk code.
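The tracing step described above can be sketched on a toy heap, here a Python dict mapping each object name to the names it points to. The function name, the graph representation and the example names are assumptions for illustration; the real thing is a VM primitive working on mark bits in object headers.

```python
def trace_segment(graph, heap_roots, segment_roots):
    """Return the objects that would go into the segment.

    graph maps each object to the list of objects it points to.
    """
    seg_roots = set(segment_roots)

    # Step 1: mark the segment roots.
    marked = set(seg_roots)

    # Step 2: ordinary mark phase from the heap roots. It stops at anything
    # already marked, so it never walks through a segment root.
    stack = list(heap_roots)
    while stack:
        node = stack.pop()
        if node in marked:
            continue
        marked.add(node)
        stack.extend(graph[node])

    # Step 3: whatever is reachable from the segment roots and still
    # unmarked lies in their "shadow" and belongs to the segment.
    segment, seen = [], set()
    stack = list(segment_roots)
    while stack:
        node = stack.pop()
        if node in seen or (node in marked and node not in seg_roots):
            continue
        seen.add(node)
        segment.append(node)
        stack.extend(graph[node])
    return segment


heap = {
    "world": ["project", "shared"],
    "project": ["morph", "shared"],
    "morph": ["label"],
    "label": [],
    "shared": [],
}
inside = trace_segment(heap, ["world"], ["project"])

# Same heap, plus one outside pointer straight at an interior object.
heap_with_stray = dict(heap, world=["project", "shared", "label"])
inside_with_stray = trace_segment(heap_with_stray, ["world"], ["project"])
```

In this sketch `shared` stays outside the segment because the rest of the heap also reaches it. In the second call the single extra pointer from `world` to `label` is enough to mark `label` and keep it out of the segment as well.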
> The loader can then materialize objects using Smalltalk code, can deal with
> shape change, and not be significantly slower than image segments.
> Crucially this means that one has a portable, long-lived object storage
> format; freeing the VM to evolve its object format without breaking image
> segments with every change to the object format.
> I'd be happy to help people working on Fuel by providing that primitive for
> anyone who wants to try and reimplement the ImageSegment functionality
> (project saving, class faulting, etc) above Fuel.
> On Wed, Oct 22, 2014 at 11:56 AM, Stéphane Ducasse <
> stephane.ducasse at inria.fr> wrote:
> > What I can tell you is that the instability raised by having just one single
> > pointer, not in the root objects, pointing to an element in the segment,
> > and the implication of this pointer on reloaded segments (yes, I do not want
> > to have two objects in memory after loading), makes sure that we will not
> > use the IS primitive in Pharo in any future. For us this is a non-feature.
> > IS was a nice trick, but since having a pointer to an object is so cheap
> > and is the basis of our computational model, this leads to unpredictable
> > side effects. We saw that when Mariano worked on it during the first year
> > of his PhD (which is a kind of LOOM revisit).
> > Stef