[Vm-dev] 64 bits, boxing, and immediate objects (was: Float hierarchy for 64-bit Spur)

Fri Nov 21 12:55:04 UTC 2014

On 21.11.2014, at 08:23, Ben Coman <btc at openInWorld.com> wrote:

> a random thought on implementation of LargeDouble...
> is it worth considering a performance versus space tradeoff...
> rather than boxing LargeDouble so that access through a pointer reduces performance(?), could the value be stored in a second 64bits adjacent to the first? Or does being 128 bits in total complicate things too much?

That would require a 128-bit image. We are talking about a 64-bit image here. Please read the FAQ for how a 32/64 bit image relates to a 32/64 bit VM:

	http://squeakvm.org/squeak64/faq.html

Reading it is essential to not be confused about running on a 32-bit vs 64-bit computer, which is actually independent of the question at hand. But to understand what we are talking about, we need to go back to 32 bits first.

The Squeak VM (and Cog and Spur) traditionally use 32 bits to identify an object. When you store a reference to an object into some other object, the VM actually stores a 32 bit word to some place in main memory.

When you use a Float in your code, the VM actually allocates 96 bits somewhere in memory (a 32-bit header for house keeping and 64 bits for the IEEE double) and gives you a 32-bit word back, which is a pointer to that object (we also call that an "oop"). This is called "boxing", it wraps the double inside an object. When you add two floats (say 3.0 + 4.0), the VM actually creates two objects and hands you back their oops (e.g. the two hexadecimal numbers @12345600 and @1ABCDE00). Then to add them, the VM reads 64 bits from the memory addresses 12345604 and 1ABCDE04 (skipping the object header), adds these two doubles, allocates another 96 bits in memory (say @56780000), and writes 64 bits of the result to the address 56780004. 

If this sounds expensive to you, that's because it is. It is even more expensive than that because we have just created 3*96 = 288 bits of garbage that needs to be cleaned up later, otherwise we would soon run out of memory if we keep allocating. Since everything in Smalltalk is an object, that is what the VM has to do.

But there is a trick. The VM uses it to avoid all this allocating and memory fetching for the most common operations, namely working with smallish integers, which are used everywhere.

That trick is to hide some data in the oop itself. In the 32 bits of object pointers, the lowest two bits are actually always 0, because objects are always allocated at addresses that are a multiple of 4 (32 bits = 4 bytes). If these are always 0, we don't actually need to store them. But since there is no good way to store just 30 bits, we can also use those two bits for something else.

And we do. The VM currently just uses one bit, the least significant bit (LSB). If the LSB is 0, this is a regular pointer to an object in main memory. If the LSB is 1, then the VM uses the other 31 bits to store an integer. Inside the oop itself, not at some place in memory! It does not need to be allocated, or garbage-collected. It's just there, hidden inside the 32-bit oop.

This makes operations on these "small integers" extremely efficient. To add e.g. 3 and 4, the VM gets the oops @00000007 and @00000009, shifts them 1 bit to get the actual integers (7 >> 1 = 3 and 9 >> 1 = 4), adds them, and shifts it back, sets the LSB, and answers @0000000F. All this happens in CPU registers, no memory access needed, which is why this is so fast. Access to main memory is orders of magnitude slower than register access.

We call that an "immediate object". The Squeak VM currently uses only one kind of immediate objects, although there could be more, since we still have an unused bit. It would be great to speed up floating point operations, too. But there is no way to hide a 64-bit double in a 32 bit oop.

Which brings us to the proposed 64-bit object format. Objects are allocated in chunks of 64 bits = 8 bytes, meaning addresses are multiples of 8, leaving the the 3 lowest bits for identifying immediate objects.

But there still is no way to hide a 64-bit double inside a 64-bit oop, because the VM needs at least 1 bit to distinguish between regular object pointers and immediate objects.

So Eliot is proposing a 61-bit immediate Float which (just like SmallIntegers) the VM can process using register operations only. This will be a major boost for most floating point operations (as long as your values are not larger than 10^38).

Coming back to your question: Yes, going to 128 bits would complicate things considerably ;)

- Bert -

-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/pkcs7-signature
Size: 4142 bytes
Desc: not available
Url : http://lists.squeakfoundation.org/pipermail/vm-dev/attachments/20141121/0553754c/smime.bin