[Vm-dev] Object format(s) as a contract with VM?

Mariano Martinez Peck marianopeck at gmail.com
Fri Jul 29 20:59:14 UTC 2011


On Fri, Jul 29, 2011 at 2:47 AM, Igor Stasenko <siguctua at gmail.com> wrote:

>
> Hello,
>
> I have been thinking about object formats and how to make them more
> flexible, and came to the following idea.
>
> Today's Squeak and Cog VMs use 4 bits in the object header to denote
> an object format.
>
> So there are 16 possible object formats supported by the VM:
>
> formatOf: oop
> "       0      no fields
>        1      fixed fields only (all containing pointers)
>        2      indexable fields only (all containing pointers)
>        3      both fixed and indexable fields (all containing pointers)
>        4      both fixed and indexable weak fields (all containing pointers).
>
>        5      unused
>        6      indexable word fields only (no pointers)
>        7      indexable long (64-bit) fields (only in 64-bit images)
>
>    8-11      indexable byte fields only (no pointers)
>              (low 2 bits are low 2 bits of size)
>   12-15      compiled methods:
>              # of literal oops specified in method header,
>              followed by indexable bytes
>              (same interpretation of low 2 bits as above)
> "
>
> The virtual machine (mostly the ObjectMemory class) knows the
> difference between those numbers and how to deal with objects
> depending on which format they have.
> The bad side is that most of the semantics around formats is hardcoded
> and spread across various methods, like:
>
> lastPointerOf: oop
>        "Return the byte offset of the last pointer field of the given object.
>        Works with CompiledMethods, as well as ordinary objects.
>        Can be used even when the type bits are not correct."
>        | fmt sz methodHeader header contextSize |
>        <inline: true>
>        <asmLabel: false>
>        header := self baseHeader: oop.
>        fmt := self formatOfHeader: header.
>        fmt <= 4 ifTrue: [
>                (fmt = 3 and: [self isContextHeader: header]) ifTrue: [
>                        "contexts end at the stack pointer"
>                        contextSize := self fetchStackPointerOf: oop.
>                        ^ CtxtTempFrameStart + contextSize * BytesPerWord].
>                sz := self sizeBitsOfSafe: oop.
>                ^ sz - BaseHeaderSize  "all pointers"].
>        fmt < 12 ifTrue: [^ 0]. "no pointers"
>
>        "CompiledMethod: contains both pointers and bytes:"
>        methodHeader := self longAt: oop + BaseHeaderSize.
>        ^ (methodHeader >> 10 bitAnd: 255) * BytesPerWord + BaseHeaderSize
>
> See how smelly this code is?
>
> What I was thinking is: what if the number which denotes an object
> format, instead of just being a number, pointed to a table of
> functions which implement the behavior specific to that format?
>
> Then the above method could be turned into something as short and nice
> as the following:
>
> lastPointerOf: oop
>        <inline: true>
>        <asmLabel: false>
>        | table |
>        table := self formatTableOf: oop.
>        ^ table performSomeOperation: oop param: param
>
> which, when translated to C, will look like:
>
> sqInt formatNumber;
> formatNumber = fetchFormatOf(oop);
>
> return ObjectFormats[formatNumber].performSomeOperation(oop, param);
>
> So the table is a simple list of function pointers (closely
> resembling a method dictionary in Smalltalk) that provides an
> implementation of certain behavior for a given object format:
>
> struct ObjectFormatFunctions {
>
>  int   (*firstFixedSlot)(sqInt oop);
>  int   (*numFixedSlots)(sqInt oop);
>  void* (*lastPointerOf)(sqInt oop);
>  /* .. firstPointerOf:
>     .. numIndexableSlots:
>     .. indexableSlotSize:
>     ... etc */
> };
>
>
> So then, instead of hardcoding and spreading the smelly code around
> the VM, we can write something like:
>
> (self formatTableOf: oop) performSomeOperation: oop params: params
>
> and the code no longer needs to know if the format is 0 or 4 or
> whatever; it just dispatches to a function using the format table.
>

That's polymorphism in C :)
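
To make that concrete, here is a minimal C sketch of the kind of dispatch
Igor describes. It is only an illustration under assumptions: ToyObject, its
fields and the lastPointer_* helpers are hypothetical names, not VM code, and
compiled methods (formats 12-15) are left out because they need the method
header.

```c
#include <assert.h>
#include <stddef.h>

typedef long sqInt;            /* stand-in for the VM's word type (assumption) */

enum { NUM_FORMATS = 16 };

/* Hypothetical toy object: a format number plus a count of word-sized slots. */
typedef struct {
    unsigned format;
    size_t   numSlots;
} ToyObject;

/* One behavior table per format; only one operation is modelled here. */
typedef struct {
    /* Number of bytes occupied by pointer fields (0 if there are none). */
    size_t (*lastPointerOf)(const ToyObject *obj);
} ObjectFormatFunctions;

static size_t lastPointer_allPointers(const ToyObject *obj) {
    return obj->numSlots * sizeof(sqInt);
}

static size_t lastPointer_noPointers(const ToyObject *obj) {
    (void)obj;
    return 0;
}

/* Formats 5 (unused) and 12-15 (compiled methods) stay NULL in this sketch. */
static const ObjectFormatFunctions ObjectFormats[NUM_FORMATS] = {
    [0]  = { lastPointer_noPointers },
    [1]  = { lastPointer_allPointers },
    [2]  = { lastPointer_allPointers },
    [3]  = { lastPointer_allPointers },
    [4]  = { lastPointer_allPointers },
    [6]  = { lastPointer_noPointers },
    [7]  = { lastPointer_noPointers },
    [8]  = { lastPointer_noPointers },
    [9]  = { lastPointer_noPointers },
    [10] = { lastPointer_noPointers },
    [11] = { lastPointer_noPointers },
};

/* The caller never tests the format number; it only indexes the table. */
size_t lastPointerOf(const ToyObject *obj) {
    assert(obj->format < NUM_FORMATS);
    assert(ObjectFormats[obj->format].lastPointerOf != NULL);
    return ObjectFormats[obj->format].lastPointerOf(obj);
}
```

With this shape, adding an operation means adding one member to the struct
and one function per format, rather than another chain of format tests.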

Igor: that is exactly what OpenDBX does. It works like this: OpenDBX
has a "core". The core is abstract and general, and does not depend at all
on the backend. Then there is an openDBX-mySQL.c, an openDBX-postgresql.c,
etc. Each of those files is the glue/mapping between the OpenDBX API and a
database client library. So each of those files implements a number of
functions (there are around 20). Imagine:

In openDBX-mysql.c:
opendbx_init -> opendbx_init_mysql
opendbx_connect -> opendbx_connect_mysql
....

In openDBX-postgreSQL.c
opendbx_init -> opendbx_init_postgresql
opendbx_connect -> opendbx_connect_postgresql
....

The OpenDBX core needs to invoke some of those functions (such as
opendbx_init, opendbx_connect, etc.) defined in the API between OpenDBX and
the database client library. But the core always works the same way: there
is a table in memory that maps function names to real functions. So OpenDBX
just searches for the key in the table and invokes the function that is in
there, no matter what its name is :)
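
A rough sketch of that lookup, as I describe it above, could look like the
following. This is not OpenDBX's actual code or API: BackendEntry,
backend_call and the toy_* functions are names made up for the example, and
the return values are arbitrary.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Every backend operation has the same signature in this toy example. */
typedef int (*BackendFn)(const char *arg);

typedef struct {
    const char *name;   /* key the core searches for */
    BackendFn   fn;     /* backend-specific implementation */
} BackendEntry;

/* Hypothetical MySQL-side implementations; the results are arbitrary. */
int toy_init_mysql(const char *arg)    { (void)arg; return 1; }
int toy_connect_mysql(const char *arg) { (void)arg; return 2; }

static const BackendEntry mysqlBackend[] = {
    { "init",    toy_init_mysql },
    { "connect", toy_connect_mysql },
};

#define MYSQL_BACKEND_COUNT (sizeof mysqlBackend / sizeof mysqlBackend[0])

/* The core only knows the generic name. It returns the backend's result,
   or -1 when the name is not in the table. */
int backend_call(const BackendEntry *table, size_t count,
                 const char *name, const char *arg) {
    assert(table != NULL && name != NULL);
    for (size_t i = 0; i < count; i++) {
        if (strcmp(table[i].name, name) == 0)
            return table[i].fn(arg);
    }
    return -1;
}
```

A PostgreSQL backend would supply its own table with the same keys, and the
core code calling backend_call would not change.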





>
> Then all methods which deal with object formats could be placed in a
> nicely organized class structure and translated to C in the form of
> tables.
>
> The downside, of course, is that the additional level of indirection
> will cause a serious slowdown in the interpreter, because the C
> compiler does not directly see the implementation of an operation
> for a certain format and must call a function through a pointer held
> in the corresponding format table.
> Moreover, you cannot inline those functions, since you don't know
> which one will be used at a given place because of dynamic dispatch.
>
> How can we deal with it?
> Well, we can still hardcode the stuff (provided that we are not
> modifying the object format table(s) at runtime), so we can write:
>
> fmt = 5 ifTrue: [ ^ self inlineFunctionOfFormat: 5 oop: oop params: params ]
>
> This requires some trickery in the C code generator, but it is still
> doable. So, overall, I think the drawbacks could be cleverly mitigated,
> if not completely avoided.
>
> So, let's talk about what opportunities it could give us:
>  The VM could support dynamically changeable object format(s), if we
> extend the number of possible formats to more than 16.
> So, for the first 16 numbers things will be hardcoded as today, but for
> higher numbers, the VM will always dispatch using the format table.
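
The split between hardcoded and table-driven formats could be sketched in C
roughly as below. All names (pointerBytes, registerFormat, halfPointers) and
the limit of 512 formats are assumptions for illustration. Compiled methods
(12-15) are not modelled, and the built-in cases are simplified from the
lastPointerOf: code quoted earlier.

```c
#include <assert.h>
#include <stddef.h>

enum { FIRST_DYNAMIC_FORMAT = 16, MAX_FORMATS = 512 };

/* Given the object's body size in bytes, answer how many bytes hold pointers. */
typedef size_t (*PointerBytesFn)(size_t sizeInBytes);

/* Slots below FIRST_DYNAMIC_FORMAT are never used; built-ins are hardcoded. */
static PointerBytesFn dynamicFormats[MAX_FORMATS];

/* Install the behavior for a dynamically defined format at run time. */
void registerFormat(unsigned fmt, PointerBytesFn fn) {
    assert(fmt >= FIRST_DYNAMIC_FORMAT && fmt < MAX_FORMATS);
    dynamicFormats[fmt] = fn;
}

/* Hypothetical dynamic format: the first half of the body holds pointers. */
size_t halfPointers(size_t sizeInBytes) {
    return sizeInBytes / 2;
}

size_t pointerBytes(unsigned fmt, size_t sizeInBytes) {
    assert(fmt < MAX_FORMATS);
    if (fmt < FIRST_DYNAMIC_FORMAT) {
        /* Built-in formats: plain comparisons the C compiler can inline. */
        assert(fmt < 12);                  /* compiled methods not modelled */
        return fmt <= 4 ? sizeInBytes : 0; /* all pointers, or no pointers */
    }
    /* Dynamic formats: always dispatch through the table. */
    assert(dynamicFormats[fmt] != NULL);
    return dynamicFormats[fmt](sizeInBytes);
}
```

Only objects with a format number of 16 or higher pay for the indirect call,
so existing objects would keep today's cost in the interpreter.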
>
> The cost of indirection can be amortized quite well, especially in
> the presence of a JIT (it can simply inline the code for the
> corresponding semantics to access various bits in the object
> when compiling the methods of a class with such a format).
>
> Then we could dynamically change formats (or create new ones) at run
> time.
>
> But currently we have only 4 bits for the object format. Where can we
> get some more?
> The solution is extremely simple! We have 5 bits in the object header
> for the compact class index.
> If we merge them with the object format bits, 4+5 = 9 -- 512 possible
> object formats!
>
> The trick is that we can still pretty easily identify special objects
> by reserving a concrete format number for them (hey, we have plenty of
> numbers).
> So format numbers in the range 0..15 will identify hardcoded
> formats, known by the VM from the beginning.
> Formats 16..48 are for special objects,
> and the rest (512-48) are dynamically defined.
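
The bit arithmetic here can be sketched in C as follows. The shift position,
the assumption that the two fields sit next to each other in a 32-bit header
with the format bits lowest, and all identifiers are mine for the sake of the
example. The ranges are the ones given above.

```c
#include <assert.h>
#include <stdint.h>

enum {
    FORMAT_SHIFT       = 8,   /* assumed position of the format bits */
    FORMAT_BITS        = 4,
    COMPACT_CLASS_BITS = 5,   /* assumed to sit directly above the format bits */
    WIDE_FORMAT_BITS   = FORMAT_BITS + COMPACT_CLASS_BITS,   /* 9 */
    WIDE_FORMAT_MASK   = (1 << WIDE_FORMAT_BITS) - 1         /* 511 */
};

typedef enum { FORMAT_BUILTIN, FORMAT_SPECIAL, FORMAT_DYNAMIC } FormatKind;

/* Read the merged 9-bit format number out of a header word. */
unsigned wideFormatOf(uint32_t header) {
    return (header >> FORMAT_SHIFT) & WIDE_FORMAT_MASK;
}

/* Answer a copy of the header with the 9-bit format replaced by fmt. */
uint32_t withWideFormat(uint32_t header, unsigned fmt) {
    assert(fmt <= WIDE_FORMAT_MASK);
    uint32_t cleared = header & ~((uint32_t)WIDE_FORMAT_MASK << FORMAT_SHIFT);
    return cleared | ((uint32_t)fmt << FORMAT_SHIFT);
}

/* Classify a format number using the ranges proposed in the message. */
FormatKind formatKind(unsigned fmt) {
    assert(fmt <= WIDE_FORMAT_MASK);
    if (fmt <= 15) return FORMAT_BUILTIN;   /* hardcoded, known by the VM */
    if (fmt <= 48) return FORMAT_SPECIAL;   /* reserved for special objects */
    return FORMAT_DYNAMIC;                  /* defined at run time */
}
```
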
>
> Of course, we could change the object header to look different for a
> new VM; then there may be more (or fewer) than 9 bits for the object
> format.
> It does not really matter, because the concept remains the same:
> the VM knows that the only field which it can safely access by itself
> is the object header. For accessing other fields (if any) it should use
> the functions provided in the object format table.
>
> And the last thing: why did I title the topic "format as a contract"?
>
> Because the next logical step is to write the formats down in a kind
> of manifest and store it in the image file.
> Then the VM, when booting an image, will read the manifest and
> translate it into machine code before the first attempt to access any
> object(s) in object memory.
>
>
> So, what do you think about it?
>
> It could be overkill, but I like the idea that with such an approach
> we could change the object format(s) dynamically at run time, without
> the need to change the VM.
> It also structures the code in the VM quite nicely, which serves
> clarity.
> It also opens a wide field for experiments with different object
> formats.
>
>
> --
> Best regards,
> Igor Stasenko AKA sig.
>



-- 
Mariano
http://marianopeck.wordpress.com