[Vm-dev] Re: [Pharo-project] Plan/discussion/communication around new object format

Igor Stasenko siguctua at gmail.com
Wed May 30 21:43:05 UTC 2012


Here are my couple (2) of highly valuable cents :)

2^20 for classes?
That might be fine (or even overkill) for Smalltalk systems, but it could be
quite limiting for someone who would like to experiment with and implement
prototype-based frameworks,
where every object is a "class" by itself.

---
8: slot size (255 => extra header word with large size)
3: odd bytes/fixed fields (odd bytes for non-pointer, fixed fields for
pointer, 7 => # fixed fields is in the class)
4: format (pointers, indexable, bytes/shorts/longs/doubles
indexable, compiled method, ephemerons, weak, etc.)
1: immutability
3: GC (2 mark bits, 1 forwarded bit)
20: identity hash
20: class index
---
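To make the field widths above concrete, here is a minimal C sketch that packs and unpacks such a header. The proposal gives only the widths; the bit positions (slot size in the lowest bits, class index in the highest) and the helper names `header_pack`/`header_get` are assumptions. The widths add up to 8 + 3 + 4 + 1 + 3 + 20 + 20 = 59 bits, so in this sketch 5 bits of the 64-bit header stay unused.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit positions: the proposal fixes only the widths. */
#define SLOTS_SHIFT   0   /* 8 bits: slot count, 255 => extra size word */
#define ODD_SHIFT     8   /* 3 bits: odd bytes / fixed fields */
#define FORMAT_SHIFT  11  /* 4 bits: object format */
#define IMMUT_SHIFT   15  /* 1 bit: immutability */
#define GC_SHIFT      16  /* 3 bits: 2 mark bits + 1 forwarded bit */
#define HASH_SHIFT    19  /* 20 bits: identity hash */
#define CLASS_SHIFT   39  /* 20 bits: class index; bits 59..63 unused */

#define FIELD_MASK(width) ((UINT64_C(1) << (width)) - 1)

/* Combine the individual fields into one 64-bit header word. */
static uint64_t header_pack(uint64_t slots, uint64_t odd, uint64_t format,
                            uint64_t immutable, uint64_t gc,
                            uint64_t hash, uint64_t class_index)
{
    assert(slots <= FIELD_MASK(8) && odd <= FIELD_MASK(3));
    assert(format <= FIELD_MASK(4) && immutable <= 1 && gc <= FIELD_MASK(3));
    assert(hash <= FIELD_MASK(20) && class_index <= FIELD_MASK(20));
    return (slots << SLOTS_SHIFT) | (odd << ODD_SHIFT)
         | (format << FORMAT_SHIFT) | (immutable << IMMUT_SHIFT)
         | (gc << GC_SHIFT) | (hash << HASH_SHIFT)
         | (class_index << CLASS_SHIFT);
}

/* Extract one field, given its shift and width. */
static uint64_t header_get(uint64_t header, int shift, int width)
{
    return (header >> shift) & FIELD_MASK(width);
}
```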
What takes most of the space in the object header? Right, the hash!
Now, since we will have lazy become, I am back to my idea of having
extra & arbitrary properties
per object.

In a nutshell, the idea is not to store the hash in the object header, but
instead to use just a single 'hash present' bit.

When the identity hash of an object is requested (via the corresponding primitive),
the implementation checks whether 'hash present' is set.
If it is not, we do a 'lazy become' of the existing object to the same object
copied into another place, but with the hash bit set and with an extra 64-bit field
where the hash value can be stored.

So, when you request an identity hash for an object which doesn't have one,
the object of the form:
[header][...data.. ]
is copied to a new memory region with the new layout:
[header][hash bits][...data..]

and the old object is, of course, 'corpsed' into a forwarding pointer to the new location.
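A toy C model of this lazy-hash scheme, for illustration only. Objects live in `malloc`ed memory, and the `HASH_PRESENT`/`FORWARDED` bit positions, the separate `forward` field and the counter-based hash generator are all hypothetical; a real VM would store the forwarding address inside the corpse's own body and let the GC reclaim the corpse.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define HASH_PRESENT (UINT64_C(1) << 0)  /* hypothetical bit position */
#define FORWARDED    (UINT64_C(1) << 1)  /* hypothetical bit position */

/* Toy object: fields[] is [hash bits][...data...] when HASH_PRESENT is
 * set, otherwise just [...data...]. */
typedef struct Obj {
    uint64_t header;      /* only the flag bits matter in this model */
    struct Obj *forward;  /* valid once FORWARDED is set (toy shortcut) */
    size_t nfields;       /* number of words in fields[] */
    uint64_t fields[];
} Obj;

static Obj *obj_new(uint64_t header, size_t nfields)
{
    Obj *o = calloc(1, sizeof(Obj) + nfields * sizeof(uint64_t));
    if (o == NULL)
        abort();
    o->header = header;
    o->nfields = nfields;
    return o;
}

/* Follow forwarding pointers left behind by lazy become. */
static Obj *obj_resolve(Obj *o)
{
    while (o->header & FORWARDED)
        o = o->forward;
    return o;
}

static uint64_t next_hash = 1;  /* hypothetical hash source */

static uint64_t identity_hash(Obj *o)
{
    o = obj_resolve(o);
    if (!(o->header & HASH_PRESENT)) {
        /* Copy into a new region with room for the hash word. */
        Obj *copy = obj_new(o->header | HASH_PRESENT, o->nfields + 1);
        copy->fields[0] = next_hash++;
        memcpy(&copy->fields[1], o->fields, o->nfields * sizeof(uint64_t));
        /* Turn the old object into a corpse forwarding to the copy;
         * in a real VM the GC would later reclaim it. */
        o->header |= FORWARDED;
        o->forward = copy;
        o = copy;
    }
    return o->fields[0];
}
```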

The next step is going from holding just the hash to having an arbitrary &
dynamic number of extra fields per object.
In the same way, we use 1 extra bit indicating that the object has extra properties.
And when an object doesn't have them, we lazy-become it from being
[header][...data.. ]
or
[header][hash bits][...data..]
to:
[header][hash bits][oop][...data..]

where 'oop' can be anything, e.g. an instance of Array/Dictionary (depending on
how the language side decides to store the extra properties of an object).
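Under these layouts, the two flag bits alone tell the VM where the data starts. A small C sketch, assuming hypothetical bit positions for 'hash present' and 'extra properties' and counting words after the header; following the layouts above, an object with extra properties always carries the hash word too.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HASH_PRESENT  (UINT64_C(1) << 0)  /* hypothetical bit positions */
#define PROPS_PRESENT (UINT64_C(1) << 1)

/* Word index (after the header) of the first data field:
 *   [header][...data..]                  -> 0
 *   [header][hash bits][...data..]       -> 1
 *   [header][hash bits][oop][...data..]  -> 2 */
static size_t data_offset(uint64_t header)
{
    if (header & PROPS_PRESENT) {
        assert(header & HASH_PRESENT);  /* props imply the hash word */
        return 2;
    }
    return (header & HASH_PRESENT) ? 1 : 0;
}

/* Word index of the properties oop, or -1 if the object has none. */
static ptrdiff_t props_offset(uint64_t header)
{
    return (header & PROPS_PRESENT) ? 1 : -1;
}
```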

This, for instance, would allow us to store extra properties for
special object formats like variable bytes or compiled methods, which
don't have instance variables.

Needless to say how useful it is to be able to attach extra properties
per object without changing the object's class.
And, of course, the freed 18 bits (20 hash bits minus the 2 new flag bits) in the header
can be allocated for other purposes.
(Stef, how many bits do you need for experiments? ;)

------------

About the immediates zoo.

Keep in mind that the more immediates we have, the more complex the implementation
tends to be.

I would keep just 2 data types:
 - integers
 - floats

and a third, special 'arbitrary' immediate, which is seen by the VM as a 60-bit value.
The interpretation of this value depends on a lookup in a range table,
where the developer specifies the correspondence between a value
interval and a class:
[min .. max] -> class

Intervals, of course, cannot overlap.
Determining the class of such an immediate might be slower: O(log2(n)) at
best with a binary search (where n is the size of the range table), but on the other hand,
how many different kinds of immediates can you fit into a 60-bit value?
Right, up to 2^60. Much more than the proposed 8, isn't it? :)
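Because the intervals do not overlap, the table can be kept sorted by `min` and searched with a binary search, which is where the O(log2(n)) comes from. A C sketch, assuming a hypothetical `RangeEntry` type that maps an inclusive interval to a class index:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One entry of the hypothetical range table: [min .. max] -> class. */
typedef struct {
    uint64_t min, max;      /* inclusive bounds of the immediate's value */
    uint32_t class_index;
} RangeEntry;

/* table must be sorted by min with non-overlapping intervals.
 * Returns the class index, or -1 if no interval contains value. */
static int64_t range_lookup(const RangeEntry *table, size_t n, uint64_t value)
{
    size_t lo = 0, hi = n;  /* search window [lo, hi) */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (value < table[mid].min)
            hi = mid;
        else if (value > table[mid].max)
            lo = mid + 1;
        else
            return table[mid].class_index;
    }
    return -1;
}
```

Each iteration halves the window, so a table with n entries needs at most about log2(n) + 1 probes.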

And this extra cost can be mitigated completely by an inline cache:
- in the case of a regular reference, you must fetch the object's class and
then compare it with the one stored in the cache;
- in the case of an immediate reference, you compare the immediate value with the min
and max stored in the cache fields.
If the value is in range, you have a cache hit and are free to proceed.
So, it's just 1 extra comparison compared to a 'classic' inline cache.

And, after thinking about how the inline cache is organized, you can now scratch
my first paragraph about immediates!
We really don't need to discriminate between small integers/floats/the rest!!
They could also be nothing more than one of the ranges defined in
our zoo of 'special' immediates!

So, in the end we will have just two kinds of references:
 - zero bit == 0 -- an object pointer
 - zero bit == 1 -- an immediate
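With a single tag bit, 63 bits remain for the immediate's payload. A minimal C sketch of the two kinds of references, with hypothetical helper names; it assumes objects are at least 2-byte aligned, so genuine pointers always have bit 0 clear.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t Oop;

/* Bit 0 == 0: object pointer; bit 0 == 1: immediate. */
static bool is_immediate(Oop oop) { return (oop & 1) != 0; }

/* The remaining 63 bits are the payload; its meaning (SmallInteger,
 * Float, Character, ...) would come from the range table. */
static Oop immediate_from_payload(uint64_t payload)
{
    assert(payload < (UINT64_C(1) << 63));
    return (payload << 1) | 1;
}

static uint64_t immediate_payload(Oop oop)
{
    assert(is_immediate(oop));
    return oop >> 1;
}
```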

Voilà!

We can have a real zoo of immediates, and a simple implementation to support them.
Not to mention that the range table is provided by the language side, so we're
free to rearrange it at any moment.

And of course, this doesn't mean that the VM couldn't reserve some of the
ranges for its own 'contracted'
immediates, like Characters, or even class references, for example.
Think about it :)
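As an illustration of such a 'contracted' range, the VM could reserve an interval of the payload space for Characters. The C sketch below assumes an arbitrary base `CHAR_BASE` and one payload per Unicode code point; neither the constant nor the helper names come from this post, and the language-side range table would have to keep this interval mapped to Character.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical VM-reserved interval for Characters:
 * [CHAR_BASE .. CHAR_BASE + MAX_CODE_POINT]. */
#define CHAR_BASE      (UINT64_C(1) << 40)  /* assumed, arbitrary base */
#define MAX_CODE_POINT UINT64_C(0x10FFFF)   /* largest Unicode code point */

static bool payload_is_character(uint64_t payload)
{
    return payload >= CHAR_BASE && payload <= CHAR_BASE + MAX_CODE_POINT;
}

static uint64_t character_payload(uint32_t code_point)
{
    assert(code_point <= MAX_CODE_POINT);
    return CHAR_BASE + code_point;
}

static uint32_t character_code_point(uint64_t payload)
{
    assert(payload_is_character(payload));
    return (uint32_t)(payload - CHAR_BASE);
}
```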

