[Vm-dev] [squeak-dev] re: Image Segment semantics and weakness

Thu Oct 23 03:17:07 UTC 2014

Hi David,

On Oct 22, 2014, at 5:52 PM, "David T. Lewis" <lewis at mail.msen.com> wrote:

> 
> Eliot,
> 
> Thanks for this background, it is very helpful and interesting.
> 
> I would also like to put in a good word for Fuel. It is well designed, well
> documented, and well supported on Squeak and Pharo. Very high quality work.
> 
> I use Fuel in RemoteTask (in package CommandShell) for inter-image communication.
> ReferenceStream also works, and both are supported in RemoteTask. But if you
> want to have a serializer that you can read and understand, I'd say that Fuel
> is hard to beat.
> 
> I am not advocating anything with respect to image segments, project saving,
> and so forth, I'm just saying that Fuel is a very good thing. It works well
> in Squeak, and I suspect that many folks may not be aware of this.

Oh I agree.  If only ImageSegments weren't used... :-).  We use an early version of Fuel at Cadence, which is essential to our system.  We haven't upgraded as it "just works".

> 
> Dave
> 
> On Wed, Oct 22, 2014 at 12:53:15PM -0700, Eliot Miranda wrote:
>> 
>> Hi Stephane, Hi All,
>> 
>>    let me talk a little about the ParcPlace experience, which led to David
>> Leibs' parcels, whose architecture Fuel uses.
>> 
>> In the late '80s and '90s Peter Deutsch wrote BOSS (Binary Object Storage
>> System), a traditional interpretive pickling system defined by a little
>> bytecoded language. Think of a bytecode as something like "What follows is
>> an object definition, which is its id followed by size info followed by the
>> definitions or ids of its sub-parts, including its class", or "What follows
>> is the id of an already defined object".  So the loading interpreter looks
>> at the next byte in the stream and that tells it what to do.  So the
>> storage is a recursive definition of a graph, much like a recursive grammar
>> for a programming language.
>> 
>> This approach is slow (it's a bytecode interpreter) and fragile (structures
>> in the process of being built aren't valid yet, imagine trying to take the
>> hash of a Set that is only half-way through being materialized).  But this
>> architecture was very common at the time (I wrote something very similar).
>> The advantage BOSS had was a clumsy hack for versioning.  One could specify
>> blocks that were supplied with the version and state of older objects, and
>> these blocks could effect shape change etc to bring loaded instances
>> up-to-date.
>> 
>> David Leibs had an epiphany when, in the early '90s, ParcPlace was trying to
>> decompose the VW image (chainsaw was the code name of the VW 2.5 release).
>> If one groups instances by class, one can instantiate in bulk, creating all
>> the instances of a particular class in one go, followed by all the
>> instances of a different class, etc.  Then the arc information (the
>> pointers to objects to be stored in the loaded objects' inst vars) can
>> follow the instance information.  So now the file looks like header, names
>> of classes that are referenced (not defined), definitions of classes,
>> definitions of instances (essentially class id, count pairs), arc
>> information.  And materializing means finding the classes in the image,
>> creating the classes in the file, creating the instances, stitching the
>> graph together, and then performing any post-load actions (rehashing
>> instances, etc).
>> 
>> Within months we merged with Digitalk (to form DarcPlace-Dodgytalk) and
>> were introduced to TeamV's loading model, which was very much like
>> ImageSegments, being based on the VM's object format.  Because an
>> ImageSegment also has imports (references to classes and globals taken from
>> the host system, not defined in the file) performance doesn't just depend
>> on loading the segment into memory.  It also depends on how long it takes
>> to search the system to find imports, etc.  In practice we found that a)
>> Parcels were 4 times faster than BOSS, and b) they were no slower than
>> Digitalk's image segments.  But being independent of the VM's heap format
>> Parcels had BOSS's flexibility and could support shape change on load,
>> something ImageSegments *cannot do*.  I went on to extend parcels with
>> support for shape change, plus support for partial loading of code, but I
>> won't describe that here.  Too detailed, even though it's very important.
>> 
>> Mariano spent time talking with me and Fuel's basic architecture is that of
>> parcels, but reimplemented to be nicer, more flexible etc.  But essentially
>> Parcels and Fuel are at their core David Leibs' invention.  He came up with
>> the ideas of a) grouping objects by class and b) separating the arcs from
>> the nodes.
>> 
>> 
>> Now, where ImageSegments are faster than Parcels is *not* loading.  Our
>> experience with VW vs TeamV showed us that.  But they are faster in
>> collecting the graph of objects to be included.  ImageSegments are dead
>> simple.  So IMO the right architecture is to use Parcels' segregation, and
>> Parcels' "abstract" format (independent of the heap object format) with
>> ImageSegment's computation of the object graph.  Igor Stasenko has
>> suggested providing the tracing part of ImageSegments (Dan Ingalls' cool
>> invention of mark the segment root objects, then mark the heap, leaving the
>> objects to be stored unmarked in the shadow of the marked segment roots) as
>> a separate primitive.  Then this can be quickly partitioned by class and
>> then written by Smalltalk code.
>> 
>> The loader can then materialize objects using Smalltalk code, can deal with
>> shape change, and not be significantly slower than image segments.
>> Crucially this means that one has a portable, long-lived object storage
>> format; freeing the VM to evolve its object format without breaking image
>> segments with every change to the object format.
>> 
>> I'd be happy to help people working on Fuel by providing that primitive for
>> anyone who wants to try and reimplement the ImageSegment functionality
>> (project saving, class faulting, etc) above Fuel.
>> 
>> 
>> On Wed, Oct 22, 2014 at 11:56 AM, Stéphane Ducasse <
>> stephane.ducasse at inria.fr> wrote:
>> 
>>> What I can tell you is that the instability caused by just a single
>>> pointer from outside the root objects to an element in the segment, and
>>> the implications of that pointer for reloaded segments (yes, I do not
>>> want two copies of an object in memory after loading), ensure that we
>>> will never use the IS primitive in Pharo. For us this is a non-feature.
>>> 
>>> IS was a nice trick, but since having a pointer to an object is so cheap
>>> and is the basis of our computational model, this leads to unpredictable
>>> side effects. We saw that when Mariano worked during the first year of
>>> his PhD (which is a kind of LOOM revisit).
>>> 
>>> Stef
>> 
>> 
>> 
>> -- 
>> best,
>> Eliot
>
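A minimal sketch of the interpretive, BOSS-style pickling described in the quoted message, written in Python purely for illustration (the systems discussed are Smalltalk). The stream is a recursive definition of the graph: each opcode says either "here is a new object, its class and size, followed by its parts" or "here is the id of an object already defined". The `Node` class, the opcode values, and the tuple encoding are hypothetical and are not BOSS's actual format.

```python
class Node:
    """Hypothetical stand-in for a Smalltalk object: a class name plus slots."""
    def __init__(self, cls, slots=None):
        self.cls = cls
        self.slots = slots if slots is not None else []

# Hypothetical opcodes, loosely modelled on the two examples in the message.
DEFINE, REF, INT = 0, 1, 2

def boss_dump(obj, ids=None, out=None):
    """Write obj as a recursive definition: a DEFINE header, then each part."""
    ids = {} if ids is None else ids
    out = [] if out is None else out
    if isinstance(obj, int):
        out.append((INT, obj))
    elif id(obj) in ids:
        out.append((REF, ids[id(obj)]))           # already defined: emit its id
    else:
        ids[id(obj)] = len(ids)                   # ids assigned in pre-order
        out.append((DEFINE, obj.cls, len(obj.slots)))
        for slot in obj.slots:
            boss_dump(slot, ids, out)
    return out

def boss_load(stream):
    """Interpret the stream one opcode at a time, recursing for sub-parts."""
    pos = 0
    table = []

    def next_obj():
        nonlocal pos
        op = stream[pos]
        pos += 1
        if op[0] == INT:
            return op[1]
        if op[0] == REF:
            return table[op[1]]
        _, cls, n = op
        obj = Node(cls)
        table.append(obj)        # visible to later REFs while still half-built
        for _ in range(n):
            obj.slots.append(next_obj())
        return obj

    return next_obj()

# Usage: a two-object cycle survives the round trip.
a = Node("Pair")
b = Node("Pair", [a, 7])
a.slots = [b, 3]
stream = boss_dump(a)
loaded = boss_load(stream)
```

Because `boss_load` registers each object before reading its slots, anything that inspected it mid-load (hashing a half-filled Set, say) would see an incomplete object, which is the fragility the message attributes to this design.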
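A minimal sketch, again in Python for illustration only, of the two ideas the quoted message credits to David Leibs: group objects by class so that all instances of one class can be created in one go, and separate the arcs (slot contents) from the nodes (class, count pairs). The `Node` class, the dictionary layout, the sorted class order, and the `"ref"`/`"int"` tags are hypothetical assumptions, not the Parcel or Fuel file format.

```python
from collections import defaultdict

class Node:
    """Hypothetical stand-in for a Smalltalk object: a class name plus slots."""
    def __init__(self, cls, slots=None):
        self.cls = cls
        self.slots = slots if slots is not None else []

def parcel_dump(root):
    """Trace the graph, group instances by class, then emit nodes and arcs."""
    # 1. Find every object reachable from root (iterative DFS; ints are immediates).
    seen, order, stack = set(), [], [root]
    while stack:
        obj = stack.pop()
        if isinstance(obj, int) or id(obj) in seen:
            continue
        seen.add(id(obj))
        order.append(obj)
        stack.extend(obj.slots)
    # 2. Group by class; object ids are assigned class by class.
    by_class = defaultdict(list)
    for obj in order:
        by_class[obj.cls].append(obj)
    classes = sorted(by_class)
    index = {}
    for cls in classes:
        for obj in by_class[cls]:
            index[id(obj)] = len(index)
    # 3. Nodes: (class, count) pairs.  Arcs: each object's slots, in id order.
    nodes = [(cls, len(by_class[cls])) for cls in classes]
    arcs = [[("int", s) if isinstance(s, int) else ("ref", index[id(s)])
             for s in obj.slots]
            for cls in classes for obj in by_class[cls]]
    return {"nodes": nodes, "arcs": arcs, "root": index[id(root)]}

def parcel_load(parcel):
    """Instantiate in bulk per class, then stitch the arcs in a second pass."""
    objs = []
    for cls, count in parcel["nodes"]:
        objs.extend(Node(cls) for _ in range(count))   # all instances of cls at once
    for obj, slots in zip(objs, parcel["arcs"]):
        obj.slots = [v if kind == "int" else objs[v] for kind, v in slots]
    # Post-load actions (rehashing etc.) would run here, on a complete graph.
    return objs[parcel["root"]]

# Usage: a cycle running through two classes.
a = Node("Pair")
d = Node("Dict", [a, 5])
b = Node("Pair", [d, 1])
a.slots = [b, 2]
parcel = parcel_dump(a)
loaded = parcel_load(parcel)
```

Since every object exists before any arc is stitched in, no half-built object is ever exposed, and post-load work such as rehashing sees the whole graph at once.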
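A minimal sketch of the ImageSegment tracing trick as the quoted message describes it: mark the segment roots first, then mark the rest of the heap from the system roots. The trace stops at the already-marked roots, so whatever is reachable only through them stays unmarked "in the shadow". The result is then partitioned by class, which is what the suggested primitive would hand to Smalltalk code. Python sets of `id()` values stand in for the VM's mark bits, and `trace_segment`, `partition_by_class`, and `Node` are hypothetical names; in the real system this is a VM primitive, not code like this.

```python
from collections import defaultdict

class Node:
    """Hypothetical stand-in for a Smalltalk object: a class name plus slots."""
    def __init__(self, cls, slots=None):
        self.cls = cls
        self.slots = slots if slots is not None else []

def trace_segment(system_roots, segment_roots):
    """Return (segment, out_pointers) using the mark-the-roots-first trick."""
    # 1. Pre-mark the segment roots (a set of id() stands in for mark bits).
    marked = {id(r) for r in segment_roots}
    # 2. Mark the heap from the system roots; the trace stops at marked objects,
    #    so anything reachable only via a segment root stays unmarked.
    stack = list(system_roots)
    while stack:
        obj = stack.pop()
        if isinstance(obj, int) or id(obj) in marked:
            continue
        marked.add(id(obj))
        stack.extend(obj.slots)
    # 3. Walk from the segment roots: unmarked objects belong to the segment,
    #    marked ones are also referenced from outside and become out-pointers.
    segment = list(segment_roots)
    out_pointers = []
    seen = {id(r) for r in segment_roots}
    stack = [s for r in segment_roots for s in r.slots]
    while stack:
        obj = stack.pop()
        if isinstance(obj, int) or id(obj) in seen:
            continue
        seen.add(id(obj))
        if id(obj) in marked:
            out_pointers.append(obj)
        else:
            segment.append(obj)
            stack.extend(obj.slots)
    return segment, out_pointers

def partition_by_class(objs):
    """Group the traced objects by class, ready to be written out class by class."""
    groups = defaultdict(list)
    for obj in objs:
        groups[obj.cls].append(obj)
    return dict(groups)

# Usage: 'shared' is also reachable from the globals, so it is not stored.
leaf = Node("Pair")
inner = Node("Pair", [leaf, 4])
shared = Node("String")
root = Node("Project", [inner, shared])
system = Node("SystemDictionary", [root, shared])
segment, out_pointers = trace_segment([system], [root])
groups = partition_by_class(segment)
```

The `shared` object illustrates the situation Stéphane's message is concerned with: because something outside the segment also points at it, it gets marked and ends up as an out-pointer rather than inside the segment.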