[Vm-dev] Re: [Pharo-project] Plan/discussion/communication around new object format

Wed Jun 13 23:08:38 UTC 2012

On Wed, Jun 13, 2012 at 12:46 PM, Igor Stasenko <siguctua at gmail.com> wrote:

>
> On 13 June 2012 20:50, Eliot Miranda <eliot.miranda at gmail.com> wrote:
> >
> >
> >
> > On Mon, Jun 11, 2012 at 1:36 AM, Igor Stasenko <siguctua at gmail.com> wrote:
> >>
> >>
> >> Some extra ideas.
> >>
> >> 1. Avoiding extra header for big sized objects.
> >> I am not sure about this, but still ...
> >>
> >> according to Eliot's design:
> >> 8: slot size (255 => extra header word with large size)
> >>
> >> What if we extend size to 16 bits (so in total it will be 65536 slots)
> >
> >
> > This simply doesn't make sense within the overall context of the header
> > (i.e. relatively large identityHash the same size as the class index).  A
> > large size field increases the size of the header for all objects.  An
> > extra size word for large objects only increases the size (probably by 8
> > bytes) for large objects.  But that's a very small percentage overhead of
> > at most 8 / (256 * 4), or 0.8%.  Few objects are large.  The bulk of
> > objects are smaller than 256 slots.  It's a no-brainer; have a small size
> > field and overflow only for large objects.
> >
> I was not measuring in terms of space, but in terms of not having an
> additional word to deal with.
> Can you please fill the gaps in your design and explain how you
> perform heap walking?
>

Hmm, that's a detail :)  Could you list the gaps you see in the design?

> The current design reserves the two least significant bits to indicate
> whether the object header is 1, 2 or 3 words.
> But in your proposed format the least significant bits are reserved
> for the slot size field, which can hold an arbitrary value. So how do you
> implement heap walking and determine whether the first word of the next
> object is its header or its size field?
>

OK, here's one way to implement heap walking:

In heap walking the memory manager needs to be able to detect the start of
the next object.  This is complicated by the short and long header formats,
short being for objects with 254 slots or fewer, long being for objects with
255 slots or more.  The class index field can be used to mark special
objects.  In particular the tagged class indices 1 through 7, which
correspond to objects with tag bits 1 through 7 (SmallInteger = 1, 3, 5, 7,
Character = e.g. 2, and SmallFloat = e.g. 4), never occur in the class index
fields of normal objects.  So if the size doubleword uses all bits other
than the class field (44 bits is adequate, allowing a maximum size of 2^46
bytes, ~ 10^14 bytes), then size doublewords can be marked by using one of
the tag class indices in their class field.  To identify the next object the
VM fetches the doubleword immediately following the current object (object
bodies being rounded up to 8 bytes in the 32-bit VM).  If that doubleword's
class index field is the size doubleword class index pun, e.g. 1, then it is
a size field, the object header is the doubleword following it, and the
object's slots start after that.  If not, that doubleword is the object
header and the object's slots follow it.
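
The walk described above can be modelled in a few lines.  This is only a
sketch in Python, not VM code, and the bit positions are assumptions made for
illustration: a 20-bit class index in the low bits (64 minus the 44 size
bits), an 8-bit slot count in the top byte of a normal header, and 1 as the
size doubleword pun.  The 16-byte minimum object size is taken from the quoted
remark further down the thread.  make_object and walk are hypothetical names.

```python
CLASS_BITS = 20                      # assumed width of the class index field
CLASS_MASK = (1 << CLASS_BITS) - 1
SIZE_PUN = 1                         # tag class index reused to mark a size doubleword
SLOT_SHIFT = 56                      # assumed position of the 8-bit slot count
OVERFLOW = 255                       # slot count meaning "see the size doubleword"


def body_dwords(num_slots):
    """4-byte slots rounded up to 8 bytes; at least one doubleword of body."""
    return max(1, (num_slots + 1) // 2)


def make_object(class_index, num_slots):
    """Return the doublewords (header(s) plus zeroed body) for one object."""
    assert class_index > 7, "indices 1-7 are reserved for tagged classes"
    body = [0] * body_dwords(num_slots)
    if num_slots < OVERFLOW:         # short format: 254 slots or fewer
        return [(num_slots << SLOT_SHIFT) | class_index] + body
    size_word = (num_slots << CLASS_BITS) | SIZE_PUN
    header = (OVERFLOW << SLOT_SHIFT) | class_index
    return [size_word, header] + body


def walk(heap):
    """Yield (header_index, class_index, num_slots) for each object in heap."""
    i = 0
    while i < len(heap):
        word = heap[i]
        if word & CLASS_MASK == SIZE_PUN:
            # Size doubleword: the real header is the next doubleword.
            num_slots = word >> CLASS_BITS
            i += 1
            header = heap[i]
        else:
            header = word
            num_slots = header >> SLOT_SHIFT
        yield i, header & CLASS_MASK, num_slots
        i += 1 + body_dwords(num_slots)   # skip the header and the body


heap = make_object(8, 3) + make_object(9, 300) + make_object(10, 0)
for entry in walk(heap):
    print(entry)
```

The walker never inspects body doublewords, so their contents cannot be
mistaken for a size doubleword; only the first doubleword after each object
is tested against the pun.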

>
> >>
> >> and we have a single flag, indicating how to calculate the object size:
> >>
> >> flag(0)   object size = (size field) * 8
> >> flag(1)  object size = 2^ (slot field)
> >>
> >> which means that past 2^16 (or however many bits we dedicate to the size
> >> field in the header) all object sizes will be powers of two.
> >> Since most objects will fit under 2^16, we don't lose much.
> >> For big arrays, we could have a special collection/array, which will
> >> store the exact size in its inst var (and we don't even need to care in
> >> the case of Sets/Dicts/OrderedCollections).
> >> Also we can actually make it transparent:
> >>
> >> Array class>>new: size
> >>  size > (max exact size ) ifTrue: [ ^ ArrayWithBigSizeWhatever new: size ]
> >>
> >> Of course, care must be taken with those variable classes which
> >> can potentially hold large numbers of bytes (like Bitmap).
> >> But I think code can be quickly adapted to this feature of the VM, which
> >> will simply fail a #new: primitive
> >> if the size is not a power of two for sizes greater than the max "exact
> >> size" which can fit into the size field of the header.
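
For reference, the flag scheme proposed above might be modelled roughly as
follows.  This is a Python sketch under assumptions, not anything from the VM:
it assumes a 16-bit size field, sizes given in bytes, and that both flag
settings read the same field.  encode_size and decode_size are hypothetical
names, and the ValueError stands in for the failing #new: primitive.

```python
SIZE_BITS = 16                       # assumed width of the size field
MAX_EXACT = (1 << SIZE_BITS) - 1     # largest value the field can hold


def encode_size(num_bytes):
    """Return (flag, field) for a requested size, or raise if unrepresentable."""
    units = (num_bytes + 7) // 8     # exact sizes are stored in 8-byte units
    if units <= MAX_EXACT:
        return 0, units              # flag(0): size = field * 8
    if num_bytes & (num_bytes - 1) == 0:
        return 1, num_bytes.bit_length() - 1   # flag(1): size = 2^field
    raise ValueError("large sizes must be a power of two")


def decode_size(flag, field):
    """Recover the object size in bytes from the flag and the size field."""
    return field * 8 if flag == 0 else 1 << field


print(encode_size(24))               # small object, stored exactly
print(encode_size(1 << 20))          # large object, stored as an exponent
```

Under these assumptions any size up to MAX_EXACT * 8 bytes is stored exactly
(rounded up to 8 bytes), and beyond that only powers of two are accepted.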
> >> ----
> >>
> >> 2. Slot for arbitrary properties.
> >> If you read carefully, Eliot said that to make lazy become work it is
> >> necessary to always have some extra space per object, even if the object
> >> doesn't have any fields:
> >>
> >> <<We shall probably keep the minimum object size at 16 bytes so that
> >> there is always room for a forwarding pointer. >>
> >>
> >> So, this fits quite well with the idea of having a slot for dynamic
> >> properties per object. What if, instead of "extending the object" when it
> >> requires an extra properties slot, we just reserve the slot for
> >> properties at the very beginning:
> >>
> >> [ header ]
> >> [ properties slot]
> >> ... rest of data ..
> >>
> >> So any object will have that slot. And in the case of lazy become, we can
> >> use that slot for holding the forwarding pointer. Voila.
> >>
> >> 3. From 2 we go straight back to hash. The VM doesn't need to know about
> >> such a thing as an object's hash; it has no semantic load inside the VM,
> >> which just answers those bits via a single primitive.
> >>
> >> So why is it a kind of enforced, inherent property of all objects in the
> >> system? And why does nobody ask: if we have that one, why could we not
> >> have more than one, or as many as we want? This is my central question
> >> around the idea of having per-object properties.
> >> Once the VM guarantees that any object can have at least one slot for
> >> storing an object reference (a property slot),
> >> the VM no longer needs to care about identity hash.
> >>
> >> Because it can be implemented completely on the language side. But most
> >> of all, we are NO longer limited
> >> in how big or small hash values are, which converts directly into bonuses:
> >> fewer hash collisions > more performance. Want a 64-bit hash? 128-bit?
> >> Whatever you desire:
> >>
> >> Object>>identityHash
> >>   ^ self propertiesAt: #hash ifAbsentPut: [ HashGenerator newHashValue ]
> >>
> >> And once we have per-object properties and lazy become, things
> >> like Magma will get HUGE benefits straight out of the box.
> >> Because look: lazy become and immutability address many
> >> problems related to OODB implementation
> >> (I can barely see other use cases where immutability would be as useful
> >> as in an OODB),
> >> so for me it is logical to take this last step: by adding arbitrary
> >> properties, an OODB can now store the ID there.
> >>
> >> --
> >> Best regards,
> >> Igor Stasenko.
> >
> >
> >
> >
> > --
> > best,
> > Eliot
> >
> >
>
>
>
> --
> Best regards,
> Igor Stasenko.
>

-- 
best,
Eliot