[Vm-dev] [squeak-dev] re: Image Segment semantics and weakness

Wed Oct 22 21:59:26 UTC 2014

I wonder how the GemStone guys deal with this.

On 10/22/14 13:45 , stepharo wrote:
>
>>
>> Hi Stephane, Hi All,
>>
>>     let me talk a little about the ParcPlace experience, which led to
>> David Leibs' parcels, whose architecture Fuel uses.
>>
>> In the late 80's/early 90's Peter Deutsch wrote BOSS (Binary Object Storage
>> System), a traditional interpretive pickling system defined by a
>> little bytecoded language. Think of a bytecode as something like "What
>> follows is an object definition, which is its id followed by size info
>> followed by the definitions or ids of its sub-parts, including its
>> class", or "What follows is the id of an already defined object".  So
>> the loading interpreter looks at the next byte in the stream and that
>> tells it what to do.  So the storage is a recursive definition of a
>> graph, much like a recursive grammar for a programming language.
>>
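
To make the "little bytecoded language" concrete, here is a minimal Python sketch of an interpretive pickler of that general shape. The opcode names and the Node class are hypothetical, invented for illustration; BOSS's real encoding is not described in this thread.

```python
# Hypothetical opcodes: "definition follows", "id of an already defined
# object follows", and an immediate integer.
OP_DEFINE, OP_REF, OP_INT = 0, 1, 2

class Node:
    """Toy object: a class name plus a list of fields (Nodes or ints)."""
    def __init__(self, cls_name, fields):
        self.cls_name = cls_name
        self.fields = fields

def dump(obj, out, ids):
    """Write obj as a recursive definition; repeated objects become refs."""
    if isinstance(obj, int):
        out += [OP_INT, obj]
    elif id(obj) in ids:
        out += [OP_REF, ids[id(obj)]]
    else:
        ids[id(obj)] = len(ids)
        out += [OP_DEFINE, ids[id(obj)], obj.cls_name, len(obj.fields)]
        for field in obj.fields:
            dump(field, out, ids)
    return out

def load(stream, table):
    """The next item in the stream tells the loader what to do."""
    op = next(stream)
    if op == OP_INT:
        return next(stream)
    if op == OP_REF:
        return table[next(stream)]
    oid, cls_name, size = next(stream), next(stream), next(stream)
    node = Node(cls_name, [])
    table[oid] = node  # visible to others while its fields are still missing
    for _ in range(size):
        node.fields.append(load(stream, table))
    return node

# Usage: a self-referencing object survives a round trip.
pair = Node("Pair", [1, None])
pair.fields[1] = pair
copy = load(iter(dump(pair, [], {})), {})
```

Note that load has to register each object before its fields are filled in, so that cycles can resolve. That is exactly the window in which a half-built object is reachable.
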
>> This approach is slow (it's a bytecode interpreter) and fragile
>> (structures in the process of being built aren't valid yet, imagine
>> trying to take the hash of a Set that is only half-way through being
>> materialized).  But this architecture was very common at the time (I
>> wrote something very similar). The advantage BOSS had was a clumsy
>> hack for versioning.  One could specify blocks that were supplied with
>> the version and state of older objects, and these blocks could effect
>> shape change etc to bring loaded instances up-to-date.
>>
>> David Leibs had an epiphany as, in the early 90's, ParcPlace was trying
>> to decompose the VW image (chainsaw was the code name of the VW 2.5
>> release).  If one groups instances by class, one can instantiate in
>> bulk, creating all the instances of a particular class in one go,
>> followed by all the instances of a different class, etc.  Then the arc
>> information (the pointers to objects to be stored in the loaded
>> objects' inst vars) can follow the instance information.  So now the
>> file looks like header, names of classes that are referenced (not
>> defined), definitions of classes, definitions of instances
>> (essentially class id, count pairs), arc information.  And
>> materializing means finding the classes in the image, creating the
>> classes in the file, creating the instances, stitching the graph
>> together, and then performing any post-load actions (rehashing
>> instances, etc).
>>
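
The layout described above (instances grouped by class as name/count pairs, arcs stored separately, then bulk instantiation followed by stitching) can be sketched in Python. This is only an illustration under my own assumptions: the dictionary keys, the Point and Line classes, and the "int"/"ref" tags are hypothetical, and class definitions and post-load actions such as rehashing are left out.

```python
class Point:
    pass

class Line:
    pass

def trace(root):
    """Collect every object reachable from root (ints are immediate values)."""
    seen, order, stack = set(), [], [root]
    while stack:
        obj = stack.pop()
        if isinstance(obj, int) or id(obj) in seen:
            continue
        seen.add(id(obj))
        order.append(obj)
        stack.extend(vars(obj).values())
    return order

def serialize(root):
    objs = trace(root)
    objs.sort(key=lambda o: type(o).__name__)  # group instances by class
    index = {id(o): i for i, o in enumerate(objs)}
    counts = []  # (class name, count) pairs, in file order
    for o in objs:
        name = type(o).__name__
        if counts and counts[-1][0] == name:
            counts[-1][1] += 1
        else:
            counts.append([name, 1])

    def encode(value):
        return ("int", value) if isinstance(value, int) else ("ref", index[id(value)])

    # Arc information follows the instance information, one entry per object.
    arcs = [{k: encode(v) for k, v in vars(o).items()} for o in objs]
    return {"counts": counts, "arcs": arcs, "root": index[id(root)]}

def materialize(parcel, classes):
    # 1. Find the classes in the host and create all instances in bulk.
    objs = []
    for name, n in parcel["counts"]:
        cls = classes[name]
        objs.extend(cls.__new__(cls) for _ in range(n))
    # 2. Stitch the graph together; no object is used before this is done.
    for obj, fields in zip(objs, parcel["arcs"]):
        for key, (kind, value) in fields.items():
            setattr(obj, key, value if kind == "int" else objs[value])
    return objs[parcel["root"]]

# Usage: a small cyclic graph.
p1, p2, line = Point(), Point(), Line()
p1.x, p1.y, p2.x, p2.y = 1, 2, 3, 4
line.a, line.b, p1.owner = p1, p2, line
parcel = serialize(line)
copy = materialize(parcel, {"Point": Point, "Line": Line})
```

Because the format names classes and fields instead of copying heap words, the loader is free to map an old field layout onto a new one at the stitching step.
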
>> Within months we merged with Digitalk (to form DarcPlace-Dodgytalk)
>> and were introduced to TeamV's loading model which was very much like
>> ImageSegments, being based on the VM's object format.  Because an
>> ImageSegment also has imports (references to classes and globals taken
>> from the host system, not defined in the file) performance doesn't
>> just depend on loading the segment into memory.  It also depends on
>> how long it takes to search the system to find imports, etc.  In
>> practice we found that a) Parcels were 4 times faster than BOSS, and
>> b) they were no slower than Digitalk's image segments.  But being
>> independent of the VM's heap format Parcels had BOSS's flexibility and
>> could support shape change on load, something ImageSegments *cannot
>> do*.  I went on to extend parcels with support for shape change, plus
>> support for partial loading of code, but I won't describe that here.
>> Too detailed, even though it's very important.
>>
>> Mariano spent time talking with me and Fuel's basic architecture is
>> that of parcels, but reimplemented to be nicer, more flexible etc.
>> But essentially Parcels and Fuel are at their core David Leibs'
>> invention.  He came up with the ideas of a) grouping objects by class
>> and b) separating the arcs from the nodes.
>
> Indeed it was never our intention to say that it was our idea. I still
> remember the first time I loaded RB in VW30.... 2 s while normally loading
> code was taking the time to cook pasta. I remember that I was still
> waiting but the code was already loaded. It was a cool feeling.
> So I always wanted to experiment with that and one day Mariano came and
> needed a fast loader and Martin was working on ... a pickle format...
> What a coincidence :)
>> Now, where ImageSegments are faster than Parcels is *not* loading.
>> Our experience with VW vs TeamV showed us that.  But they are faster
>> in collecting the graph of objects to be included.  ImageSegments are
>> dead simple.  So IMO the right architecture is to use Parcels'
>> segregation, and Parcels' "abstract" format (independent of the heap
>> object format) with ImageSegment's computation of the object graph.
>> Igor Stasenko has suggested providing the tracing part of
>> ImageSegments (Dan Ingalls' cool invention of marking the segment root
>> objects, then marking the heap, leaving the objects to be stored unmarked
>> in the shadow of the marked segment roots) as a separate primitive.
>> Then this can be quickly partitioned by class and then written by
>> Smalltalk code.
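
A toy Python model of the tracing trick just described may help. It assumes a heap given as a dictionary from object name to the names it points to, and it assumes the heap holds no garbage (otherwise unreachable objects would also be left unmarked). The function name and the heap representation are mine, not the VM's.

```python
def segment_members(heap, system_roots, segment_roots):
    """Return the objects that would go into the segment."""
    marked = set(segment_roots)          # 1. mark the segment roots first
    stack = [r for r in system_roots if r not in marked]
    marked.update(stack)
    while stack:                         # 2. ordinary mark phase over the heap
        for child in heap[stack.pop()]:
            if child not in marked:      # marking stops at the pre-marked roots
                marked.add(child)
                stack.append(child)
    # 3. whatever stayed unmarked is reachable only through the segment roots
    return set(segment_roots) | {o for o in heap if o not in marked}

heap = {
    "sys": ["proj", "shared", "ext"],
    "proj": ["a", "shared"],
    "a": ["b"],
    "b": [],
    "shared": [],
    "ext": [],
}
inside = segment_members(heap, ["sys"], ["proj"])

# One extra pointer from outside the roots changes what ends up in the segment.
heap_with_stray = dict(heap, ext=["b"])
inside_with_stray = segment_members(heap_with_stray, ["sys"], ["proj"])
```

In the first heap the segment is proj, a and b. In the second, the single pointer from ext to b keeps b out of the segment.
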
> Maybe. For me, if the use of IS is structured (i.e. you control the fact
> that there will be no pointer into the graph from elements that are not in
> the roots),
> then you may have a stable system on reload; otherwise you will have to decide
> what to do on reload, and this can be a real pain.
>
>> The loader can then materialize objects using Smalltalk code, can deal
>> with shape change, and not be significantly slower than image
>> segments.  Crucially this means that one has a portable, long-lived
>> object storage format; freeing the VM to evolve its object format
>> without breaking image segments with every change to the object format.
> Oh yes! This was what was also worrying to me.
>
>> I'd be happy to help people working on Fuel by providing that
>> primitive for anyone who wants to try and reimplement the ImageSegment
>> functionality (project saving, class faulting, etc) above Fuel.
>
> We do not have the resources for that now and will probably have fewer in
> the future because the cost of student internships doubled :(
>
> Stef
>>
>>
>> On Wed, Oct 22, 2014 at 11:56 AM, Stéphane Ducasse
>> <stephane.ducasse at inria.fr> wrote:
>>
>>     What I can tell you is that the instability caused by having just one
>>     single pointer, not in the root objects,
>>     pointing to an element in the segment, and the implication of this
>>     pointer for reloaded segments (yes, I do not want to have two
>>     objects in memory after loading), makes sure that we will not use
>>     the IS primitive in Pharo in any future. For us this is a non-feature.
>>
>>     IS was a nice trick, but since having a pointer to an object is so
>>     cheap and is the basis of our computational model,
>>     this leads to unpredictable side effects. We saw that when
>>     Mariano worked on this during the first year of his PhD (which is a kind
>>     of LOOM revisit).
>>
>>     Stef
>>
>>
>>
>>
>> --
>> best,
>> Eliot
>