[Vm-dev] Object format(s) as a contract with VM?

Igor Stasenko siguctua at gmail.com
Fri Jul 29 00:47:57 UTC 2011


Hello,

I have been thinking about object formats and how to make them more
flexible, and came to the following idea.

Today's Squeak and Cog VMs use 4 bits in the object header to
denote the object format.

So there are 16 possible object formats supported by the VM:

formatOf: oop
"       0      no fields
        1      fixed fields only (all containing pointers)
        2      indexable fields only (all containing pointers)
        3      both fixed and indexable fields (all containing pointers)
        4      both fixed and indexable weak fields (all containing pointers).

        5      unused
        6      indexable word fields only (no pointers)
        7      indexable long (64-bit) fields (only in 64-bit images)

    8-11      indexable byte fields only (no pointers)
              (low 2 bits are low 2 bits of size)
   12-15      compiled methods:
                   # of literal oops specified in method header,
                   followed by indexable bytes
                   (same interpretation of low 2 bits as above)
"
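
Read as code, the table above amounts to a small classification function. The following is only a minimal sketch in C: the enum names and helper names are made up for illustration, and the position of the format field (bits 8..11 of the base header) is an assumption about the current header layout, not something stated in this message.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: the 4 format bits sit at bits 8..11 of the base header. */
enum { FormatShift = 8, FormatMask = 0xF };

/* Hypothetical names for the groups listed in the formatOf: comment. */
typedef enum {
    KIND_EMPTY,     /* 0      no fields */
    KIND_POINTERS,  /* 1-4    fixed and/or indexable pointer fields */
    KIND_UNUSED,    /* 5      unused */
    KIND_WORDS,     /* 6      indexable word fields */
    KIND_LONGS,     /* 7      indexable 64-bit fields */
    KIND_BYTES,     /* 8-11   indexable byte fields */
    KIND_METHOD     /* 12-15  compiled methods */
} FormatKind;

static unsigned formatOfHeader(uint32_t header) {
    return (header >> FormatShift) & FormatMask;
}

static FormatKind kindOfFormat(unsigned fmt) {
    assert(fmt < 16);
    if (fmt == 0)  return KIND_EMPTY;
    if (fmt <= 4)  return KIND_POINTERS;
    if (fmt == 5)  return KIND_UNUSED;
    if (fmt == 6)  return KIND_WORDS;
    if (fmt == 7)  return KIND_LONGS;
    if (fmt <= 11) return KIND_BYTES;
    return KIND_METHOD;
}

/* For formats 8-15 the low 2 bits of the format carry size information. */
static unsigned lowSizeBitsOfFormat(unsigned fmt) {
    assert(fmt >= 8 && fmt < 16);
    return fmt & 3;
}
```
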

The virtual machine (mostly the ObjectMemory class) knows the difference
between those numbers and how to deal with objects depending on the
format they have.
The bad side is that most of the semantics around formats are hardcoded
and spread across various methods, like:

lastPointerOf: oop
	"Return the byte offset of the last pointer field of the given object.
	Works with CompiledMethods, as well as ordinary objects.
	Can be used even when the type bits are not correct."
	| fmt sz methodHeader header contextSize |
	<inline: true>
	<asmLabel: false>
	header := self baseHeader: oop.
	fmt := self formatOfHeader: header.
	fmt <= 4 ifTrue: [(fmt = 3 and: [self isContextHeader: header])
					ifTrue: ["contexts end at the stack pointer"
						contextSize := self fetchStackPointerOf: oop.
						^ CtxtTempFrameStart + contextSize * BytesPerWord].
				sz := self sizeBitsOfSafe: oop.
				^ sz - BaseHeaderSize  "all pointers"].
	fmt < 12 ifTrue: [^ 0]. "no pointers"

	"CompiledMethod: contains both pointers and bytes:"
	methodHeader := self longAt: oop + BaseHeaderSize.
	^ (methodHeader >> 10 bitAnd: 255) * BytesPerWord + BaseHeaderSize

See how smelly this code is?

What I was thinking is: what if the number which denotes an
object format, instead of being just a number, pointed to a table of
functions which implement the behavior specific to that format?

Then the above method could be turned into something as short and nice
as the following:

lastPointerOf: oop
	<inline: true>
	<asmLabel: false>
	| table |
	table := self formatTableOf: oop.
	^ table performSomeOperation: oop param: param

which, when translated to C, will look like:

sqInt formatNumber;
formatNumber = fetchFormatOf(oop);

return ObjectFormats[formatNumber].performSomeOperation(oop, param);

So the table is a simple list of function pointers (which closely
resembles a method dictionary in Smalltalk) that provides an
implementation of certain behavior for a given object format:

struct ObjectFormatFunctions {

  int   (*firstFixedSlot)(sqInt oop);
  int   (*numFixedSlots)(sqInt oop);
  void* (*lastPointerOf)(sqInt oop);
  /* .. firstPointerOf: */
  /* .. numIndexableSlots: */
  /* .. indexableSlotSize: */
  /* ... etc */
};


So then, instead of hardcoding and spreading the smelly code around
the VM, we can write something like:

(self formatTableOf: oop) performSomeOperation: oop params: params

and the code no longer needs to know whether the format is 0 or 4 or
whatever; it just dispatches to a function using the format table.
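
To make the table dispatch concrete, here is a minimal, self-contained sketch in C. It is not VM code: the toy heap, the ToyObject struct, the 4-byte header and word sizes, and all function names are assumptions made for illustration. Only lastPointerOf is modelled, following the three cases of the Smalltalk method quoted earlier (contexts are left out).

```c
#include <assert.h>
#include <stdint.h>

typedef intptr_t sqInt;

enum { BaseHeaderSize = 4, BytesPerWord = 4, NumFormats = 16 };

/* Toy stand-in for object memory: an oop is just an index into this array. */
typedef struct {
    unsigned format;       /* format number, 0..15 */
    sqInt    sizeBytes;    /* object size in bytes, including the header */
    sqInt    numLiterals;  /* only meaningful for compiled methods */
} ToyObject;

static ToyObject toyHeap[] = {
    { 1, 20, 0 },   /* oop 0: fixed pointer fields */
    { 8, 12, 0 },   /* oop 1: byte fields */
    { 12, 40, 2 },  /* oop 2: compiled method with 2 literals */
};

/* One entry per format; a real table would have more operations. */
typedef struct {
    sqInt (*lastPointerOf)(sqInt oop);
} ObjectFormatFunctions;

static sqInt lastPointerAllPointers(sqInt oop) {
    return toyHeap[oop].sizeBytes - BaseHeaderSize;
}

static sqInt lastPointerNoPointers(sqInt oop) {
    (void)oop;
    return 0;
}

static sqInt lastPointerOfMethod(sqInt oop) {
    return toyHeap[oop].numLiterals * BytesPerWord + BaseHeaderSize;
}

static ObjectFormatFunctions ObjectFormats[NumFormats];

static void initObjectFormats(void) {
    for (unsigned fmt = 0; fmt < NumFormats; fmt++) {
        if (fmt <= 4)       ObjectFormats[fmt].lastPointerOf = lastPointerAllPointers;
        else if (fmt < 12)  ObjectFormats[fmt].lastPointerOf = lastPointerNoPointers;
        else                ObjectFormats[fmt].lastPointerOf = lastPointerOfMethod;
    }
}

/* The caller no longer tests the format number; it only indexes the table. */
static sqInt lastPointerOf(sqInt oop) {
    unsigned fmt = toyHeap[oop].format;
    assert(fmt < NumFormats && ObjectFormats[fmt].lastPointerOf != 0);
    return ObjectFormats[fmt].lastPointerOf(oop);
}
```

After calling initObjectFormats() once, lastPointerOf(oop) never branches on the format number itself; all format knowledge lives in the three small functions and in the table setup.
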

Then all methods which deal with object formats could be placed in
a nicely organized class structure and translated to C in the form of
tables.

The downside, of course, is that the additional level of indirection
will cause a serious slowdown in the interpreter,
because the C compiler doesn't directly see the implementation of an
operation for a certain format and must call a function through a
pointer held in the corresponding format table.
Moreover, you cannot inline those functions, since you don't know
which one will be used at a concrete place because of dynamic dispatch.

How can we deal with it?
Well, we can still hardcode the stuff (given that we are not
modifying the object format table(s) at runtime), so we can write:

fmt = 5 ifTrue: [ ^ self inlineFunctionOfFormat: 5 oop: oop params: params ]

This requires some trickery in the C code generator, but it is still doable.
So overall I think the drawbacks could be cleverly mitigated, if
not completely avoided.

So, let's talk about what opportunities it could give us:
  The VM could support dynamically changeable object format(s), if we
extend the number of possible formats to more than 16.
So, for the first 16 numbers things will be hardcoded as today, but for
higher numbers, the VM will always dispatch using the format table.
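
A rough sketch of that split in C might look as follows: formats below 16 take a hardcoded path built from inline functions that the compiler can see, and higher numbers go through a table that can be filled at run time. Everything here (ToyObject, registerFormat, the limit of 512, the sample "half pointers" format) is hypothetical and only illustrates the shape of the dispatch; compiled methods are not modelled.

```c
#include <assert.h>
#include <stdint.h>

typedef intptr_t sqInt;

enum { BaseHeaderSize = 4, NumBuiltinFormats = 16, MaxFormats = 512 };

/* Toy object: just enough state to compute a last-pointer offset. */
typedef struct {
    unsigned format;
    sqInt    sizeBytes;  /* including the header */
} ToyObject;

typedef sqInt (*LastPointerFn)(const ToyObject *obj);

/* Entries 0..15 stay empty: built-in formats never go through the table. */
static LastPointerFn dynamicFormats[MaxFormats];

static inline sqInt lastPointerAllPointers(const ToyObject *obj) {
    return obj->sizeBytes - BaseHeaderSize;
}

/* Register a run-time defined format; returns 1 on success, 0 if refused. */
static int registerFormat(unsigned fmt, LastPointerFn fn) {
    if (fmt < NumBuiltinFormats || fmt >= MaxFormats) return 0;
    dynamicFormats[fmt] = fn;
    return 1;
}

static sqInt lastPointerOf(const ToyObject *obj) {
    unsigned fmt = obj->format;
    if (fmt < NumBuiltinFormats) {
        /* Hardcoded fast path, inlinable by the C compiler. */
        if (fmt <= 4) return lastPointerAllPointers(obj);
        assert(fmt < 12);  /* compiled methods are not modelled in this sketch */
        return 0;
    }
    /* Slow path: indirect call through the format table. */
    assert(fmt < MaxFormats && dynamicFormats[fmt] != 0);
    return dynamicFormats[fmt](obj);
}

/* Example of a run-time defined format: only the first half holds pointers. */
static sqInt lastPointerHalf(const ToyObject *obj) {
    return (obj->sizeBytes - BaseHeaderSize) / 2;
}
```

The built-in formats pay no indirect call, while anything registered later costs one table lookup plus one call through a function pointer.
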

The cost of indirection can be amortized quite well, especially in the
presence of a JIT (it can simply inline the code for the corresponding
semantics to access various bits in an object
when compiling the methods of a class with such a format).

Then we could dynamically change formats (or create new ones) at run time.

But currently we have only 4 bits for the object format. Where can we get some more?
A solution is extremely simple! We have 5 bits in the object header for
the compact class index.
If we merge them with the object format bits, 4+5 = 9 -- 512 possible object formats!

The trick is that we can still pretty easily identify special objects
by reserving a concrete format number for them (hey, we have plenty of
numbers).
So format numbers in the range 0..15 will identify the hardcoded
formats, known by the VM from the beginning.
Formats 16..47 are for special objects,
and the remaining 512-48 = 464 are dynamically defined.
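
The arithmetic and the three ranges can be sketched in C as below. The bit position of the merged field (bits 8..16, i.e. the format bits followed directly by the compact class index bits) is an assumption about the header layout, and the sketch treats 48 as the first dynamically defined number; the names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum {
    FormatBits         = 4,
    CompactClassBits   = 5,
    ExtendedFormatBits = FormatBits + CompactClassBits,  /* 9 */
    NumExtendedFormats = 1 << ExtendedFormatBits,        /* 512 */
    FormatShift        = 8  /* assumption: merged field starts at bit 8 */
};

typedef enum {
    FORMAT_HARDCODED,  /* 0..15   known by the VM from the beginning */
    FORMAT_SPECIAL,    /* 16..47  reserved for special objects */
    FORMAT_DYNAMIC     /* 48..511 defined at run time */
} FormatRange;

/* Extract the merged 9-bit format number from a 32-bit header word. */
static unsigned extendedFormatOfHeader(uint32_t header) {
    return (header >> FormatShift) & (NumExtendedFormats - 1);
}

static FormatRange rangeOfFormat(unsigned fmt) {
    assert(fmt < NumExtendedFormats);
    if (fmt < 16) return FORMAT_HARDCODED;
    if (fmt < 48) return FORMAT_SPECIAL;
    return FORMAT_DYNAMIC;
}
```
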

Of course, we could change the object header to look different for
a new VM; then there may be more (or fewer) than 9 bits for the object
format.
It doesn't really matter, because the concept remains the same:
the VM knows that the only field it can safely access by itself
is the object header. For accessing other fields (if any) it should use
the functions provided in the object format table.

And the last thing: why did I title the topic "format as a contract"?

Because the next logical step is to write the formats down in a kind of
manifest and store it in the image file.
Then the VM, when booting an image, will read the manifest and translate it
into machine code before the first attempt to access any object(s) in
object memory.
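
What such a manifest would contain is left open here. As a purely hypothetical illustration in C, each entry could describe one format declaratively, and the VM could install the entries into a table at boot. Note that this sketch only stores and looks up descriptors; it does not generate machine code, and every field and function name in it is an assumption.

```c
#include <assert.h>
#include <stddef.h>

enum { MaxFormats = 512 };

/* Hypothetical manifest entry: a declarative description of one format. */
typedef struct {
    unsigned formatNumber;
    unsigned numFixedSlots;
    unsigned indexableSlotSize;  /* bytes per indexable slot, 0 if none */
    int      slotsArePointers;   /* nonzero if the GC must trace the slots */
} FormatManifestEntry;

static FormatManifestEntry formatTable[MaxFormats];
static int formatDefined[MaxFormats];

/* Install entries read from the image; stops at the first out-of-range or
   duplicate format number and returns how many entries were installed. */
static size_t loadManifest(const FormatManifestEntry *entries, size_t count) {
    size_t installed = 0;
    for (size_t i = 0; i < count; i++) {
        unsigned fmt = entries[i].formatNumber;
        if (fmt >= MaxFormats || formatDefined[fmt]) break;
        formatTable[fmt] = entries[i];
        formatDefined[fmt] = 1;
        installed++;
    }
    return installed;
}

/* Look up a format; returns NULL if the manifest did not define it. */
static const FormatManifestEntry *formatEntry(unsigned fmt) {
    assert(fmt < MaxFormats);
    return formatDefined[fmt] ? &formatTable[fmt] : NULL;
}
```
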


So, what do you think about it?

It could be overkill, but I like the idea that with such an approach
we could change the object format(s) dynamically at run time, without
the need to change the VM.
It also structures the code in the VM quite nicely, which will serve clarity.
It also opens a wide field for experiments with different object formats.


-- 
Best regards,
Igor Stasenko AKA sig.

