[Vm-dev] Re: [Pharo-project] Plan/discussion/communication around new object format

Igor Stasenko siguctua at gmail.com
Thu Jun 14 06:33:08 UTC 2012


On 14 June 2012 01:08, Eliot Miranda <eliot.miranda at gmail.com> wrote:
>
>
>
> On Wed, Jun 13, 2012 at 12:46 PM, Igor Stasenko <siguctua at gmail.com> wrote:
>>
>>
>> On 13 June 2012 20:50, Eliot Miranda <eliot.miranda at gmail.com> wrote:
>> >
>> >
>> >
>> > On Mon, Jun 11, 2012 at 1:36 AM, Igor Stasenko <siguctua at gmail.com> wrote:
>> >>
>> >>
>> >> Some extra ideas.
>> >>
>> >> 1. Avoiding extra header for big sized objects.
>> >> I am not sure about this, but still...
>> >>
>> >> according to Eliot's design:
>> >> 8: slot size (255 => extra header word with large size)
>> >>
>> >> What if we extend size to 16 bits (so in total it will be 65536 slots)
>> >
>> >
>> > This simply doesn't make sense within the overall context of the header (i.e. relatively large identityHash the same size as the class index).  A large size field increases the size of the header for all objects.  An extra size word for large objects only increases the size (probably by 8 bytes) for large objects.  But that's a very small percentage overhead of at most 8 / (256 * 4), or 0.8%.  Few objects are large.  The bulk of objects are smaller than 256 slots.  It's a no-brainer; have a small size field and overflow only for large objects.
>> >
>> I was not measuring in terms of space, but in terms of not having an
>> additional word and having to deal with it.
>> Can you please fill the gaps in your design and explain how you
>> perform heap walking?
>
>
> Hmm, that's a detail :)  Could you list the gaps you see in the design?
>

Hehe..
If you like. It is not clear how many places in the VM need to be
aware of forwarded objects, and where you can just pass them around.
Obviously, the places where you need to access an object's data will
require that check.
But I fear this could be too costly. If this check is placed in the
slot read operation, then every slot read will mean two memory reads
(read the reference value, then read through that pointer to check
whether it refers to a forwarded oop).
If we cannot avoid that, then "lazy" become will mean a "crawling" runtime  :)

Actually, on the other hand, if something reads an oop, then in most
cases it will need to access its contents at some point (the only
exceptions being copy/assignment operations). So doing this check
will just pull the object's contents into the CPU cache, but I think
the slowdown will still be too significant to ignore.

So I'd like to know your thoughts on minimizing this bad impact.
I am interested in how many places we can have with strong
guarantees that no forwarded oops will ever appear there. Or can we
not have such places, and are we doomed to keep checking oops at
every read?
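
To make the cost concrete, here is a rough sketch in C of the kind of checked slot read described above. It is not taken from any VM source. The names (read_slot_checked, FORWARDED_CLASS_INDEX), the 20-bit class index in the low header bits, the use of a reserved class index to mark a forwarder, and the convention that slot 0 of a forwarder holds its target are all assumptions made for illustration. Tagged immediates and nil slots are ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: class index in the low 20 bits of the header. */
#define CLASS_INDEX_MASK ((UINT64_C(1) << 20) - 1)
/* Hypothetical reserved class index that marks a forwarder. */
#define FORWARDED_CLASS_INDEX 8

typedef struct Object Object;
struct Object {
    uint64_t header;
    Object  *slots[2];   /* fixed at two slots to keep the sketch short */
};

static int is_forwarded(const Object *o) {
    return (o->header & CLASS_INDEX_MASK) == FORWARDED_CLASS_INDEX;
}

/* Chase forwarders until a real object is reached
   (assumption: slot 0 of a forwarder holds its target). */
static Object *follow(Object *o) {
    assert(o != NULL);
    while (is_forwarded(o))
        o = o->slots[0];
    return o;
}

/* Slot read with the check: two memory reads on every access. */
static Object *read_slot_checked(Object *holder, int i) {
    Object *ref = holder->slots[i];   /* read 1: the reference itself */
    if (is_forwarded(ref)) {          /* read 2: the referent's header */
        ref = follow(ref);
        holder->slots[i] = ref;       /* fix up so later reads go straight there */
    }
    return ref;
}
```

The second read is the cost in question. Writing the followed reference back into the holder means a given slot pays for the chase only once, but the header check itself still happens on every read unless the caller can guarantee that no forwarders are present.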


>> The current design reserves the two least significant bits to indicate
>> whether the object header is 1, 2 or 3 words.
>> But in your proposed format, the least significant bits are reserved
>> for the slot size field, which can be an arbitrary value. So how do you
>> implement heap walking and determine whether the first word of the next
>> object is its header or its size field?
>
>
> OK, here's one way to implement heap walking:
>
> In heap walking the memory manager needs to be able to detect the start of the next object.  This is complicated by the short and long header formats, short being for objects with 254 slots or fewer, long being for objects with 255 slots or more.  The class index field can be used to mark special objects.  In particular the tagged class indices 1 through 7, which correspond to objects with tag bits 1 through 7 (SmallInteger = 1, 3, 5, 7, Character = e.g. 2, and SmallFloat = e.g. 4) never occur in the class index fields of normal objects.  So if the size doubleword uses all bits other than the class field (44 bits gives an adequate maximum size of 2^46 bytes, ~ 10^14 bytes) then size doublewords can be marked by using one of the tag class indices in their class field.  To identify the next object the VM fetches the doubleword immediately following the current object (object bodies being rounded up to 8 bytes in the 32-bit VM).  If the doubleword's class index field is the size doubleword class index pun, e.g. 1, then it is a size field and the object header is the doubleword following that, and the object's slots start after that.  If not, the object header is that doubleword and the object's slots follow that.
>

Okay. Nice trick. :)
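
For reference, the heap-walking scheme quoted above could be sketched in C roughly as follows. The bit positions are assumptions, not part of the proposal as quoted: the class index is placed in the low 20 bits (inferred from the 44-bit size remark), the 8-bit slot count in the top byte of the header, slots are 4 bytes as in the 32-bit VM, and the size doubleword pun is class index 1. All function and type names are made up for the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum {
    CLASS_INDEX_BITS     = 20,  /* assumed: 64 - 44 size bits */
    SIZE_PUN_CLASS_INDEX = 1,   /* the "e.g. 1" pun from the description */
    SLOT_COUNT_SHIFT     = 56,  /* assumed: 8-bit slot count in the top byte */
    BYTES_PER_SLOT       = 4    /* 32-bit VM */
};
#define CLASS_INDEX_MASK ((UINT64_C(1) << CLASS_INDEX_BITS) - 1)

/* Positions are indices into an array of 64-bit doublewords. */
typedef struct {
    size_t   header;      /* index of the header doubleword */
    size_t   first_slot;  /* index of the first body doubleword */
    uint64_t slot_count;
} ObjectInfo;

/* Decode the object whose first doubleword (size field or header) is at pos. */
static ObjectInfo object_at(const uint64_t *heap, size_t pos) {
    ObjectInfo o;
    if ((heap[pos] & CLASS_INDEX_MASK) == SIZE_PUN_CLASS_INDEX) {
        /* Long format: this is a size doubleword, the header follows it. */
        o.slot_count = heap[pos] >> CLASS_INDEX_BITS;
        o.header = pos + 1;
    } else {
        /* Short format: this doubleword is the header itself. */
        o.slot_count = heap[pos] >> SLOT_COUNT_SHIFT;
        o.header = pos;
    }
    o.first_slot = o.header + 1;
    return o;
}

/* Index of the doubleword immediately following the object's body,
   with the body rounded up to a multiple of 8 bytes. */
static size_t next_object(const ObjectInfo *o) {
    uint64_t body_bytes = o->slot_count * BYTES_PER_SLOT;
    return o->first_slot + (size_t)((body_bytes + 7) / 8);
}

/* Walk the whole heap and count the objects in it. */
static size_t count_objects(const uint64_t *heap, size_t len) {
    size_t pos = 0, n = 0;
    while (pos < len) {
        ObjectInfo o = object_at(heap, pos);
        pos = next_object(&o);
        n++;
    }
    assert(pos == len);  /* a well-formed heap ends exactly on an object boundary */
    return n;
}
```

The walker never has to look inside an object body: it only inspects the class index field of the doubleword where the next object is expected to start.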

So, can we have a slot for arbitrary properties? Then you can free
the bits in the header reserved for the hash,
and expand both the size and class index fields.


-- 
Best regards,
Igor Stasenko.

