[Vm-dev] Re: [Pharo-project] Plan/discussion/communication around new object format
siguctua at gmail.com
Wed Jun 13 19:46:34 UTC 2012
On 13 June 2012 20:50, Eliot Miranda <eliot.miranda at gmail.com> wrote:
> On Mon, Jun 11, 2012 at 1:36 AM, Igor Stasenko <siguctua at gmail.com> wrote:
>> Some extra ideas.
>> 1. Avoiding an extra header for big objects.
>> I'm not sure about this, but still...
>> According to Eliot's design:
>> 8 bits: slot size (255 => extra header word with large size)
>> What if we extend the size field to 16 bits (i.e. 2^16 = 65536 possible slot counts)
> This simply doesn't make sense within the overall context of the header (i.e. a relatively large identityHash, the same size as the class index). A large size field increases the size of the header for all objects. An extra size word increases the size (probably by 8 bytes) of large objects only, and that is a very small percentage overhead of at most 8 / (256 * 4), or about 0.8%. Few objects are large; the bulk of objects are smaller than 256 slots. It's a no-brainer: have a small size field and overflow only for large objects.
I wasn't measuring in terms of space, but in terms of not having an
additional word and having to deal with it.
Can you please fill in the gaps in your design and explain how you
perform heap walking?
The current design reserves the two least significant bits to indicate
whether the object header is 1, 2 or 3 words.
But in your proposed format, the least significant bits are reserved
for the slot size field, which can hold an arbitrary value. So how do you
implement heap walking and determine whether the first word of the next
object is its header or its size field?
>> and we have a single flag indicating how to calculate the object size:
>> flag = 0: object size = (size field) * 8
>> flag = 1: object size = 2^(size field)
>> which means that beyond 2^16 (or however many bits we dedicate to the size
>> field in the header), all object sizes
>> will be powers of two.
>> Since most objects will fit under 2^16, we don't lose much.
>> For big arrays, we could have a special collection/array which stores
>> the exact size in its inst var (and we don't even need to care in the
>> cases of Sets/Dicts/OrderedCollections).
>> Also, we can actually make it transparent:
>>   Array class>>new: size
>>       size > (max exact size) ifTrue: [^ArrayWithBigSizeWhatever new: size].
>>       ^super new: size
>> Of course, care must be taken with those variable classes which can
>> potentially hold large amounts of bytes (like Bitmap).
>> But I think code can quickly be adapted to this feature of the VM, which
>> will simply fail the #new: primitive
>> if the size is not a power of two for sizes greater than the maximum
>> "exact size" that fits into the size field of the header.
>> 2. Slot for arbitrary properties.
>> If you read carefully, Eliot said that to make lazy become work it is
>> necessary to always have some extra space per object, even if the object
>> doesn't have any fields:
>> <<We shall probably keep the minimum object size at 16 bytes so that
>> there is always room for a forwarding pointer. >>
>> So this fits quite well with the idea of having a slot for dynamic
>> per-object properties. What if, instead of "extending" an object when it
>> requires an extra properties slot, we just reserve the properties slot
>> at the very beginning:
>> [ header ]
>> [ properties slot]
>> ... rest of data ...
>> So every object will have that slot, and in the case of lazy become we can
>> use that slot to hold the forwarding pointer. Voilà.
>> 3. From 2. we go straight back to the hash. The VM doesn't need to know
>> about such a thing as an object's hash; it carries no semantic load inside
>> the VM, which just answers those bits via a single primitive.
>> So why is it an enforced, inherent property of all objects in the
>> system? And why does nobody ask: if we have that one, why can't we
>> have more than one, or as many as we want? This is my central question
>> around the idea of having per-object properties.
>> Once the VM guarantees that any object has at least one slot for
>> storing an object reference (the property slot),
>> the VM no longer needs to care about the identity hash,
>> because it can be implemented completely on the language side. But most of
>> all, we are NO longer limited in how big or small hash values can be,
>> which directly translates into bonuses: fewer hash collisions, hence more
>> performance. Want a 64-bit hash? 128-bit?
>> Whatever you desire:
>> ^ self propertiesAt: #hash ifAbsentPut: [ HashGenerator newHashValue ]
>> And once we have per-object properties and lazy become, things
>> like Magma will get HUGE benefits straight out of the box.
>> Look: lazy become and immutability, those two address many
>> problems related to OODB implementation
>> (I barely see other use cases where immutability would be as useful
>> as for OODBs).
>> So for me it is logical to take this last step: with arbitrary
>> properties, an OODB can store its object IDs there.
>> Best regards,
>> Igor Stasenko.