[Vm-dev] Direct object pointers vs indirect ones pros and cons

Jecel Assumpcao Jr. jecel at merlintec.com
Fri Nov 12 17:46:39 UTC 2010


Igor,

those of us who design our own hardware have options that are not
available when using conventional processors. In the case of object
tables, we can use virtually addressed object caches (invented in the
Mushroom project - http://www.wolczko.com/mushroom/index.html) to
eliminate most of the cost.

In a conventional processor, think about what happens when we execute an
instruction like

load R3, R7, R1

where R1 has the number of the instance variable we want to read
(multiplied by the word size, depending on the processor), R7 is the oop
for the object and R3 will store the value of the instance variable. The
first step is that R7 and R1 are added and the result is the virtual
address of the instance variable. Then the top (20 or so) bits will be
searched in the TLB (translation look-aside buffer) of the MMU (memory
management unit) and, if found there, they will be replaced with the
associated bits, forming the physical address of the instance variable.
The last step is that the top bits of the physical address (28 bits, for a
32-bit physical address and a cache with lines of 16 bytes) are used to find
the right line in the data cache, and the bottom bits select the bytes from
that line to be loaded into R3.

Of course, sometimes the "page" isn't in the TLB or the data cache
doesn't have the needed line, but let's not worry about that for now.
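
To make the fast path concrete, here is a rough C sketch of those three
steps (add, TLB lookup, cache line and byte select). The 4 KB page size,
the identity-mapping stand-in for the TLB and all the names are only
assumptions for illustration, not anything a real MMU mandates:

  #include <stdint.h>
  #include <stdio.h>

  #define PAGE_BITS 12   /* assume 4 KB pages   */
  #define LINE_BITS  4   /* 16-byte cache lines */

  /* Stand-in for the real TLB; it just maps each virtual page to the
     same frame number so the sketch runs (hit path only).           */
  static uint32_t tlb_translate(uint32_t vpage) { return vpage; }

  int main(void)
  {
      uint32_t r7 = 0x00123450;            /* oop (a virtual address) */
      uint32_t r1 = 2 * 4;                 /* ivar index * word size  */

      uint32_t vaddr  = r7 + r1;                            /* step 1 */
      uint32_t pframe = tlb_translate(vaddr >> PAGE_BITS);  /* step 2 */
      uint32_t paddr  = (pframe << PAGE_BITS)
                      | (vaddr & ((1u << PAGE_BITS) - 1));
      uint32_t line   = paddr >> LINE_BITS;                 /* step 3 */
      uint32_t byte   = paddr & ((1u << LINE_BITS) - 1);

      printf("line index = %u (28 bits), byte offset = %u\n",
             (unsigned)line, (unsigned)byte);
      return 0;
  }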

Imagine that we redesign our processor so that the same instruction will
work like this: we concatenate R7 and R1 into a 64-bit virtual instance
variable address and use the top 60 bits to find the right line in the
data cache, and the bottom 4 bits to select the bytes from that line to
be loaded into R3. We have saved one addition and one MMU lookup at the
cost of a larger tag for the cache. An additional cost is that two
objects can't share the same cache line like they can in the
conventional processor, but that doesn't hurt much.
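
In the same style, here is a rough sketch of the redesigned load, showing
where the addition and the MMU lookup drop out. Again, the names and sizes
are only illustrative:

  #include <stdint.h>
  #include <stdio.h>

  #define LINE_BITS 4   /* 16-byte cache lines, as before */

  int main(void)
  {
      uint32_t r7 = 0x00123450;   /* oop, used directly to tag the cache */
      uint32_t r1 = 2 * 4;        /* ivar index * word size              */

      /* Concatenate instead of add: the oop and the offset together
         form a 64-bit virtual instance variable address.              */
      uint64_t vaddr = ((uint64_t)r7 << 32) | r1;

      uint64_t line = vaddr >> LINE_BITS;                /* top 60 bits */
      uint32_t byte = vaddr & ((1u << LINE_BITS) - 1);   /* bottom 4    */

      printf("line tag = 0x%llx, byte offset = %u\n",
             (unsigned long long)line, (unsigned)byte);
      return 0;
  }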

When we can't find the cache line we need, we have to bring in data from
the main memory. That can be done by adding R7 and R1, masking off the
bottom 4 bits, doing the MMU lookup and fetching the 16 bytes from the
resulting physical address. This is compatible with the direct-pointer Squeak. But we
could instead use R7 as an index into an object table, fetch the base
address, add R1 to that, mask off the bottom 4 bits, do an MMU lookup (or not
- the object table itself could double as a virtual memory system) and
fetch the 16 bytes into the new cache line. Since cache misses are rare,
the extra memory access here does not impact performance very much.
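
A sketch of the two refill options on a miss might look like this;
object_table, mmu_translate and fetch_line are placeholders I made up for
the sketch, not anything in the actual Squeak VM:

  #include <stdint.h>

  #define LINE_BITS 4
  #define LINE_MASK ((1u << LINE_BITS) - 1)

  /* Trivial stand-ins so the sketch compiles; a real machine would have
     an MMU and a memory controller behind these. Table size arbitrary. */
  static uint32_t object_table[1024];                     /* oop -> base */
  static uint32_t mmu_translate(uint32_t v) { return v; }
  static void fetch_line(uint32_t paddr, uint8_t *line)
  {
      (void)paddr; (void)line;  /* a real machine reads 16 bytes here */
  }

  /* Option 1: direct pointers, compatible with today's Squeak.        */
  void refill_direct(uint32_t r7_oop, uint32_t r1_off, uint8_t *line)
  {
      uint32_t vaddr = (r7_oop + r1_off) & ~LINE_MASK;   /* add, mask   */
      fetch_line(mmu_translate(vaddr), line);            /* MMU, fetch  */
  }

  /* Option 2: one level of indirection through an object table; the
     extra memory access only happens on a miss, off the fast path.    */
  void refill_indirect(uint32_t r7_oop, uint32_t r1_off, uint8_t *line)
  {
      uint32_t base  = object_table[r7_oop];             /* table fetch */
      uint32_t vaddr = (base + r1_off) & ~LINE_MASK;
      fetch_line(mmu_translate(vaddr), line);  /* or skip the MMU and let
                                  the table double as the virtual memory */
  }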

Note that virtual caches are considered a bad thing in the C world
because of aliasing problems: two virtual addresses might map to the
same physical address and then you could have two copies of the same
data in the cache and no way to keep them consistent. With object
addressing, this is much easier to avoid.

-- Jecel


