[Vm-dev] [squeak-dev] re: Image Segment semantics and weakness

Mariano Martinez Peck marianopeck at gmail.com
Sun Oct 26 15:41:07 UTC 2014

On Sun, Oct 26, 2014 at 12:35 PM, Bert Freudenberg <bert at freudenbergs.de> wrote:

> That is not a limitation of ImageSegments per se, but just how they are
> used in Etoys.
I agree to some extent. But what if the serializer were able to serialize
classes as well? Fuel is able to serialize classes, traits, closures,
compiled methods, etc. Of course there are scenarios where this becomes
complicated, but for the average case it works.

> - Bert -
> On 26.10.2014, at 05:30, karl ramberg <karlramberg at gmail.com> wrote:
> One aspect of Etoys projects is that they cannot extend the system. It
> works nicely if you just use Etoys tile scripting. But if you introduce a
> new class in a project, loading the project in a system that does not have
> that class will fail. So the use of projects as a distribution system for
> applications will be limited to a certain version of images.
> Karl
> On Wed, Oct 22, 2014 at 9:53 PM, Eliot Miranda <eliot.miranda at gmail.com>
> wrote:
>> Hi Stephane, Hi All,
>>     let me talk a little about the ParcPlace experience, which led to
>> David Leibs' parcels, whose architecture Fuel uses.
>> In the late 80's or early 90's Peter Deutsch wrote BOSS (Binary Object Storage
>> System), a traditional interpretive pickling system defined by a little
>> bytecoded language. Think of a bytecode as something like "What follows is
>> an object definition, which is its id followed by size info followed by the
>> definitions or ids of its sub-parts, including its class", or "What follows
>> is the id of an already defined object".  So the loading interpreter looks
>> at the next byte in the stream and that tells it what to do.  So the
>> storage is a recursive definition of a graph, much like a recursive grammar
>> for a programming language.
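
A minimal Python sketch of this kind of interpretive pickler, for illustration only. The two opcodes, the byte layout and the Node class are assumptions made up for this example, not BOSS's actual encoding, and the class reference that a real definition would carry is left out.

```python
import io
import struct

OP_DEFINE = 0  # "what follows is an object definition"
OP_REF = 1     # "what follows is the id of an already defined object"


class Node:
    """Hypothetical stand-in for an object with a name and sub-parts."""

    def __init__(self, name):
        self.name = name
        self.parts = []


def dump(obj, out, ids):
    """Write obj recursively; ids maps id(obj) -> number assigned in the file."""
    if id(obj) in ids:
        out.write(struct.pack(">BI", OP_REF, ids[id(obj)]))
        return
    ids[id(obj)] = len(ids)
    name = obj.name.encode("utf-8")
    # opcode, object id, size info (name length, number of sub-parts)
    out.write(struct.pack(">BIII", OP_DEFINE, ids[id(obj)], len(name), len(obj.parts)))
    out.write(name)
    for part in obj.parts:
        dump(part, out, ids)


def load(inp, table):
    """Look at the next byte and let it decide what to do."""
    (op,) = struct.unpack(">B", inp.read(1))
    if op == OP_REF:
        (oid,) = struct.unpack(">I", inp.read(4))
        return table[oid]
    oid, name_len, count = struct.unpack(">III", inp.read(12))
    node = Node(inp.read(name_len).decode("utf-8"))
    table[oid] = node  # visible to others while its parts are still missing
    for _ in range(count):
        node.parts.append(load(inp, table))
    return node


if __name__ == "__main__":
    a, b = Node("a"), Node("b")
    a.parts, b.parts = [b, a], [a]  # a small cyclic graph
    buf = io.BytesIO()
    dump(a, buf, {})
    buf.seek(0)
    copy = load(buf, {})
    print(copy.name, [p.name for p in copy.parts])
```

Note that load has to register each object in the table before its sub-parts have been read, so other objects can end up holding a reference to something that is only half built.
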
>> This approach is slow (it's a bytecode interpreter) and fragile
>> (structures in the process of being built aren't valid yet, imagine trying
>> to take the hash of a Set that is only half-way through being
>> materialized).  But this architecture was very common at the time (I wrote
>> something very similar).  The advantage BOSS had was a clumsy hack for
>> versioning.  One could specify blocks that were supplied with the version
>> and state of older objects, and these blocks could effect shape change etc
>> to bring loaded instances up-to-date.
>> David Leibs had an epiphany as, in the early 90's, ParcPlace was trying to
>> decompose the VW image (chainsaw was the code name of the VW 2.5 release).
>> If one groups instances by class, one can instantiate in bulk, creating all
>> the instances of a particular class in one go, followed by all the
>> instances of a different class, etc.  Then the arc information (the
>> pointers to objects to be stored in the loaded objects' inst vars) can
>> follow the instance information.  So now the file looks like header, names
>> of classes that are referenced (not defined), definitions of classes,
>> definitions of instances (essentially class id, count pairs), arc
>> information.  And materializing means finding the classes in the image,
>> creating the classes in the file, creating the instances, stitching the
>> graph together, and then performing any post-load actions (rehashing
>> instances, etc).
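
The layout described above (instances grouped by class, arcs kept separate from nodes) could be sketched in Python roughly as follows. This is a toy under stated assumptions: the Point and Line classes, the CLASSES registry (standing in for "finding the classes in the image") and the dictionary used as the file are all invented for the example. It is not the Parcels or Fuel format, and it omits class definitions and post-load actions.

```python
from collections import defaultdict


class Point:
    def __init__(self, x=None, y=None):
        self.x, self.y = x, y


class Line:
    def __init__(self, start=None, end=None):
        self.start, self.end = start, end


# Hypothetical registry of classes that are referenced, not defined, by the file.
CLASSES = {"Point": Point, "Line": Line}


def trace(roots):
    """Collect every instance of a registered class reachable from roots."""
    seen, found, stack = set(), [], list(roots)
    while stack:
        obj = stack.pop()
        if id(obj) in seen or type(obj).__name__ not in CLASSES:
            continue
        seen.add(id(obj))
        found.append(obj)
        stack.extend(vars(obj).values())
    return found


def serialize(roots):
    by_class = defaultdict(list)
    for obj in trace(roots):
        by_class[type(obj).__name__].append(obj)
    names = sorted(by_class)
    # Number the objects so each class occupies one contiguous index range.
    ordered = [obj for name in names for obj in by_class[name]]
    index = {id(obj): i for i, obj in enumerate(ordered)}
    nodes = [(name, len(by_class[name])) for name in names]  # class id, count pairs
    arcs = []
    for obj in ordered:
        for slot, value in vars(obj).items():
            if id(value) in index:
                arcs.append((index[id(obj)], slot, ("ref", index[id(value)])))
            else:
                arcs.append((index[id(obj)], slot, ("lit", value)))
    return {"nodes": nodes, "arcs": arcs, "roots": [index[id(r)] for r in roots]}


def materialize(parcel):
    objs = []
    for name, count in parcel["nodes"]:
        cls = CLASSES[name]
        # Bulk instantiation: all instances of one class in one go, no __init__.
        objs.extend(cls.__new__(cls) for _ in range(count))
    for owner, slot, (kind, value) in parcel["arcs"]:
        # Stitch the graph together once every node exists.
        setattr(objs[owner], slot, objs[value] if kind == "ref" else value)
    return [objs[i] for i in parcel["roots"]]
```

Because every instance exists before any arc is filled in, the loader never has to hand out a reference to an object whose allocation is still pending.
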
>> Within months we merged with Digitalk (to form DarcPlace-Dodgytalk) and
>> were introduced to TeamV's loading model which was very much like
>> ImageSegments, being based on the VM's object format.  Because an
>> ImageSegment also has imports (references to classes and globals taken from
>> the host system, not defined in the file) performance doesn't just depend
>> on loading the segment into memory.  It also depends on how long it takes
>> to search the system to find imports, etc.  In practice we found that a)
>> Parcels were 4 times faster than BOSS, and b) they were no slower than
>> Digitalk's image segments.  But being independent of the VM's heap format
>> Parcels had BOSS's flexibility and could support shape change on load,
>> something ImageSegments *cannot do*.  I went on to extend parcels with
>> support for shape change, plus support for partial loading of code, but I
>> won't describe that here.  Too detailed, even though it's very important.
>> Mariano spent time talking with me and Fuel's basic architecture is that
>> of parcels, but reimplemented to be nicer, more flexible etc.  But
>> essentially Parcels and Fuel are at their core David Leibs' invention.  He
>> came up with the ideas of a) grouping objects by class and b) separating
>> the arcs from the nodes.
>> Now, where ImageSegments are faster than Parcels is *not* loading.  Our
>> experience with VW vs TeamV showed us that.  But they are faster in
>> collecting the graph of objects to be included.  ImageSegments are dead
>> simple.  So IMO the right architecture is to use Parcels' segregation, and
>> Parcels' "abstract" format (independent of the heap object format) with
>> ImageSegment's computation of the object graph.  Igor Stasenko has
>> suggested providing the tracing part of ImageSegments (Dan Ingalls' cool
>> invention: mark the segment root objects, then mark the heap, leaving the
>> objects to be stored unmarked in the shadow of the marked segment roots) as
>> a separate primitive.  Then this can be quickly partitioned by class and
>> then written by Smalltalk code.
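
The marking trick can be modelled on a toy heap as below. This is only a sketch of the idea as described here, not the actual primitive: the heap is assumed to be a plain dict from object name to the names it references, and segment_members is a hypothetical helper.

```python
def segment_members(heap, system_roots, segment_roots):
    """Return the objects that would go into the segment.

    heap maps each object name to the list of names it points to.
    """
    # Step 1: pre-mark the segment roots so the mark phase stops at them.
    marked = set(segment_roots)

    # Step 2: ordinary mark phase from the system roots.
    stack = [root for root in system_roots if root not in marked]
    while stack:
        obj = stack.pop()
        if obj in marked:
            continue
        marked.add(obj)
        stack.extend(heap[obj])

    # Step 3: anything reachable from the segment roots that is still
    # unmarked lies "in the shadow" of the roots and belongs to the segment.
    members = set(segment_roots)
    stack = [child for root in segment_roots for child in heap[root]]
    while stack:
        obj = stack.pop()
        if obj in marked or obj in members:
            continue  # marked objects are reachable from outside: not stored
        members.add(obj)
        stack.extend(heap[obj])
    return members


if __name__ == "__main__":
    heap = {
        "sys": ["project", "shared"],
        "project": ["a", "b"],
        "a": ["c", "shared"],
        "b": ["c"],
        "c": [],
        "shared": [],
    }
    print(sorted(segment_members(heap, ["sys"], ["project"])))
    heap["shared"] = ["c"]  # one outside pointer into the segment
    print(sorted(segment_members(heap, ["sys"], ["project"])))
```

In the second call a single pointer from outside is enough to keep "c" out of the segment, which is the kind of sensitivity to stray pointers raised elsewhere in this thread.
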
>> The loader can then materialize objects using Smalltalk code, can deal
>> with shape change, and not be significantly slower than image segments.
>> Crucially this means that one has a portable, long-lived object storage
>> format; freeing the VM to evolve its object format without breaking image
>> segments with every change to the object format.
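
One way a loader written in ordinary code can cope with shape change is to store slot names with the data and map them by name onto the current class. The Python sketch below assumes exactly that; the Account class, its defaults and renamed tables, and the materialize helper are hypothetical and do not describe how Parcels or Fuel actually handle it.

```python
class Account:
    # Current shape: 'balance' was renamed to 'cents', 'currency' was added,
    # and the old 'note' slot no longer exists.
    slots = ("owner", "cents", "currency")
    defaults = {"currency": "EUR"}
    renamed = {"balance": "cents"}  # old slot name -> new slot name


def materialize(cls, stored_slots, rows):
    """Rebuild instances saved with stored_slots against the current cls shape."""
    instances = []
    for row in rows:
        obj = cls.__new__(cls)
        for name in cls.slots:
            # Slots missing from the file get a default (None if unspecified).
            setattr(obj, name, cls.defaults.get(name))
        for old_name, value in zip(stored_slots, row):
            name = cls.renamed.get(old_name, old_name)
            if name in cls.slots:  # slots removed from the class are dropped
                setattr(obj, name, value)
        instances.append(obj)
    return instances


if __name__ == "__main__":
    old_shape = ("owner", "balance", "note")
    loaded = materialize(Account, old_shape, [("ann", 500, "x"), ("bob", 7, "y")])
    print([(a.owner, a.cents, a.currency) for a in loaded])
```

A format that copies raw heap words has no place to do this kind of mapping, which is why independence from the object format matters for long-lived storage.
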
>> I'd be happy to help people working on Fuel by providing that primitive
>> for anyone who wants to try and reimplement the ImageSegment functionality
>> (project saving, class faulting, etc) above Fuel.
>> On Wed, Oct 22, 2014 at 11:56 AM, Stéphane Ducasse <
>> stephane.ducasse at inria.fr> wrote:
>>> What I can tell you is that the instability caused by having just one single
>>> pointer, not in the root objects,
>>> pointing to an element in the segment, and the implication of this
>>> pointer for reloaded segments (yes, I do not want to have two objects in
>>> memory after loading), makes sure that we will not use the IS primitive in Pharo
>>> in any future. For us this is a non-feature.
>>> IS was a nice trick, but having a pointer to an object is so cheap
>>> and so basic to our computational model
>>> that this leads to unpredictable side effects. We saw that when Mariano
>>> worked during the first year of his PhD (which is a kind of LOOM revisit).
>>> Stef
>> --
>> best,
>> Eliot
