[Vm-dev] A stab at explaining Spur

Stefan Marr smalltalk at stefan-marr.de
Fri Aug 14 12:59:24 UTC 2015

Hi Clément:

> On 14 Aug 2015, at 10:53, Clément Bera <bera.clement at gmail.com> wrote:
> In some implementations (e.g. Dart [22] and PyPy [3]), object header and attribute storage can be separated, so the attribute storage can be relocated in order to grow. 
> [3]C. F. Bolz. Efficiently implementing objects with maps, 2010. 
> [22]F. Schneider. Compiling dart to efficient machine code, 2012.

I briefly discussed this with Carl Friedrich: the blog post is slightly outdated.
Each object is allocated with 5 fields. I think the last one can be used, if needed, to refer to an extra extension/storage array.
So, objects are not split in the sense you have in mind. But they can have two parts.
This is pretty common: the Truffle Object Model uses the same strategy, as do I in my SOM implementations.
[For Smalltalks, the strategy is also nice, because it gives you fast, object-table like become, without any need for barriers.]
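
A minimal sketch of this layout, to make the two parts concrete. It is not PyPy's actual code: the names Map, Obj, INLINE_SLOTS and the overflowed flag are made up for illustration, and the rule that the last inline slot turns into the extension array once a sixth attribute is added is an assumption based on the description above.

```python
INLINE_SLOTS = 5  # the n == 5 case; all other names here are hypothetical


class Map:
    """Shared map (hidden class): attribute name -> storage index."""

    def __init__(self, names=()):
        self.index = {name: i for i, name in enumerate(names)}
        self.transitions = {}  # attribute name -> successor Map

    def with_attribute(self, name):
        # Reuse the successor map so objects built the same way share it.
        if name not in self.transitions:
            self.transitions[name] = Map(list(self.index) + [name])
        return self.transitions[name]


EMPTY_MAP = Map()


class Obj:
    """Header (map) plus a fixed number of slots allocated with the object."""

    def __init__(self):
        self.map = EMPTY_MAP
        self.slots = [None] * INLINE_SLOTS
        self.overflowed = False  # True once the last slot holds the extension array

    def read(self, name):
        i = self.map.index[name]
        if not self.overflowed or i < INLINE_SLOTS - 1:
            return self.slots[i]  # direct access, no extra indirection
        # Only overflow fields pay for the hop through the extension array.
        return self.slots[-1][i - (INLINE_SLOTS - 1)]

    def write(self, name, value):
        if name not in self.map.index:
            self.map = self.map.with_attribute(name)
        i = self.map.index[name]
        if i >= INLINE_SLOTS and not self.overflowed:
            # First attribute that does not fit: move the value held in the
            # last slot into a fresh extension array stored in that slot.
            self.slots[-1] = [self.slots[-1]]
            self.overflowed = True
        if self.overflowed and i >= INLINE_SLOTS - 1:
            ext = self.slots[-1]
            j = i - (INLINE_SLOTS - 1)
            if j == len(ext):
                ext.append(value)
            else:
                ext[j] = value
        else:
            self.slots[i] = value
```

With up to five attributes every read is a direct slot access. Only after a sixth attribute is added do the fifth and later attributes go through the extension array, which is the "two parts" case.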

The main drawback of the approach is that objects are rather large. But so far that has not been a real issue for either PyPy or Truffle.

For details, see https://bitbucket.org/pypy/pypy/src/7089cedc919340a0c07f78cc952d5760146f7d86/pypy/objspace/std/mapdict.py?at=default#mapdict.py-537 (_make_subclass_size_n (only ever used for n==5))

A little more tangible:

> When I read F. Boltz. post, it looks like to me that in Pypy each instance of a class has a pointer to its map and its storage. The storage seems to be at a different location than the object's header and holds the instance variable values. To me it sounds very much like the object is 'split' to be able to grow the storage if a new instance variable is added for a specific instance. Accessing an object instance variable requires an extra indirection through the storage pointer. Is there something I miss there ? It looks like the paper we wrote with Eliot could definitely apply there in order to speed up instance variable access by removing the indirection to the storage.

So, the extra indirection is only needed for fields that don’t fit into the slots directly following the ‘header’.

> I am waiting for other people comments but to me it looks like the memory representation where the object’s header is separated from the object's fields is used in Javascript V8 and Pypy, as explained in the 2 references, and that they could benefit from our implementation.

I don’t know about V8 for sure, but for PyPy (and Truffle, and SOM), I would not characterize it this way. The extension array is strictly optional and only used for larger objects. From a conceptual point of view, it is not designed as being split from the header. One could reduce the number of inlined fields to zero and keep only the pointer to the storage array, but this has a significant cost in my experience. (No, I don’t have numbers handy, but someone with an interest could easily measure it by adjusting the min-max field number in PyPy.)

Best regards

Stefan Marr
Johannes Kepler Universität Linz
