[Vm-dev] Re: [squeak-dev] Spur with Immediate Floating Point Support implies a break

Thu Dec 4 23:00:27 UTC 2014

On Thu, Dec 4, 2014 at 2:46 PM, Ben Coman <btc at openinworld.com> wrote:

> Bert Freudenberg wrote:
>
>> On 04.12.2014, at 04:18, Levente Uzonyi <leves at elte.hu> wrote:
>>>
>>> Hi Eliot,
>>>
>>> On Wed, 3 Dec 2014, Eliot Miranda wrote:
>>>
>>>> SmallFloat64 is an immediate tagged representation, like SmallInteger,
>>>> so they fit within an object pointer and have no header.  In 64-bit
>>>> Spur there is a 3-bit tag, leaving 61 bits.  SmallFloat64 steals 3 bits
>>>> from the 11-bit exponent to donate to the tags, representing a full
>>>> double precision floating-point value that is restricted to the
>>>> ~ +/-10^+/-38 range.  There's really no practical way to shoe-horn a
>>>> usable range of 64-bit float into a 30-bit value.  It's possible but so
>>>> few values would fit that the effort would be counter-productive.  Does
>>>> this make sense now?
>>>>
>>> I didn't mean to use 30-bit values. I meant to use the same 61-bit
>>> representation as with the 64-bit Spur.
>>> The object header is 64 bits long in both 32-bit and 64-bit Spur, right?
>>> If yes, then why is it not possible to detect the tag of SmallFloat64 in
>>> a 32-bit VM, and treat the object as immediate?
>>>
>>
>> Because that is not what "immediate" means. There is no header, and not
>> even an object. The value is encoded in the oop itself. You can't fit 61
>> bits in a 32 bit oop.
>>
>> I explained this previously, but I'll paste again:
>>
>>> The Squeak VM (and Cog and Spur) traditionally use 32 bits to identify
>>> an object. When you store a reference to an object into some other object,
>>> the VM actually stores a 32 bit word to some place in main memory.
>>>
>>> When you use a Float in your code, the VM actually allocates 96 bits
>>> somewhere in memory (a 32-bit header for house keeping and 64 bits for the
>>> IEEE double) and gives you a 32-bit word back, which is a pointer to that
>>> object (we also call that an "oop"). This is called "boxing", it wraps the
>>> double inside an object. When you add two floats (say 3.0 + 4.0), the VM
>>> actually creates two objects and hands you back their oops (e.g. the two
>>> hexadecimal numbers @12345600 and @1ABCDE00). Then to add them, the VM
>>> reads 64 bits from the memory addresses 12345604 and 1ABCDE04 (skipping the
>>> object header), adds these two doubles, allocates another 96 bits in memory
>>> (say @56780000), and writes 64 bits of the result to the address 56780004.
>>> If this sounds expensive to you, that's because it is. It is even more
>>> expensive than that because we have just created 3*96 = 288 bits of garbage
>>> that needs to be cleaned up later, otherwise we would soon run out of
>>> memory if we keep allocating. Since everything in Smalltalk is an object,
>>> that is what the VM has to do.
>>>
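
The boxing cost described above can be sketched with a toy heap. This is
only an illustration, not VM code: the ToyHeap class, its method names, and
the use of byte offsets as oops are all assumptions made for the example.

```python
import struct

HEADER_BYTES = 4   # 32-bit header, as in the description above
DOUBLE_BYTES = 8   # 64-bit IEEE double payload


class ToyHeap:
    """Hypothetical bump allocator; oops are byte offsets into `memory`."""

    def __init__(self):
        self.memory = bytearray()

    def allocate_float(self, value):
        # Box the double: zeroed header followed by the 8 payload bytes.
        oop = len(self.memory)
        self.memory += bytes(HEADER_BYTES) + struct.pack("<d", value)
        return oop

    def float_at(self, oop):
        # Skip the header to read the double stored in the box.
        return struct.unpack_from("<d", self.memory, oop + HEADER_BYTES)[0]

    def add_floats(self, a_oop, b_oop):
        # Two memory reads, one add, and one more allocation for the result.
        return self.allocate_float(self.float_at(a_oop) + self.float_at(b_oop))


heap = ToyHeap()
a = heap.allocate_float(3.0)
b = heap.allocate_float(4.0)
c = heap.add_floats(a, b)
bits_allocated = len(heap.memory) * 8   # three boxes of 96 bits each
```

After the addition the toy heap holds three 96-bit boxes, which matches the
3*96 = 288 bits counted above.
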
>>> But there is a trick. The VM uses it to avoid all this allocating and
>>> memory fetching for the most common operations, namely working with
>>> smallish integers, which are used everywhere.
>>>
>>> That trick is to hide some data in the oop itself. In the 32 bits of
>>> object pointers, the lowest two bits are actually always 0, because objects
>>> are always allocated at addresses that are a multiple of 4 (32 bits = 4
>>> bytes). If these are always 0, we don't actually need to store them. But
>>> since there is no good way to store just 30 bits, we can also use those two
>>> bits for something else.
>>>
>>> And we do. The VM currently just uses one bit, the least significant bit
>>> (LSB). If the LSB is 0, this is a regular pointer to an object in main
>>> memory. If the LSB is 1, then the VM uses the other 31 bits to store an
>>> integer. Inside the oop itself, not at some place in memory! It does not
>>> need to be allocated, or garbage-collected. It's just there, hidden inside
>>> the 32-bit oop.
>>>
>>> This makes operations on these "small integers" extremely efficient. To
>>> add e.g. 3 and 4, the VM gets the oops @00000007 and @00000009, shifts them
>>> 1 bit to get the actual integers (7 >> 1 = 3 and 9 >> 1 = 4), adds them,
>>> shifts the sum back, sets the LSB, and answers @0000000F. All this happens
>>> in CPU registers, no memory access needed, which is why this is so fast.
>>> Access to main memory is orders of magnitude slower than register access.
>>>
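
The tagged addition above can be written out as a small sketch. The function
names are made up for illustration, and overflow handling (where a real VM
would fall back to a large integer object) is left out.

```python
TAG_BITS = 1        # one tag bit, the LSB
SMALL_INT_TAG = 1   # LSB = 1 marks an immediate integer
OOP_MASK = 0xFFFFFFFF


def is_small_int(oop):
    return (oop & SMALL_INT_TAG) == 1


def tag_small_int(value):
    # Shift the value up one bit and set the LSB; keep it to 32 bits.
    return ((value << TAG_BITS) | SMALL_INT_TAG) & OOP_MASK


def untag_small_int(oop):
    # Reinterpret the 32-bit word as signed, then shift arithmetically.
    if oop & 0x80000000:
        oop -= 1 << 32
    return oop >> TAG_BITS


def add_small_ints(a_oop, b_oop):
    # No overflow check in this sketch.
    return tag_small_int(untag_small_int(a_oop) + untag_small_int(b_oop))


three = tag_small_int(3)            # 0x00000007
four = tag_small_int(4)             # 0x00000009
total = add_small_ints(three, four) # 0x0000000F
```

Nothing here touches a heap: every step is plain integer arithmetic on the
oop values themselves.
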
>>> We call that an "immediate object". The Squeak VM currently uses only
>>> one kind of immediate object, although there could be more, since we still
>>> have an unused bit. It would be great to speed up floating point
>>> operations, too. But there is no way to hide a 64-bit double in a 32 bit
>>> oop.
>>>
>>> Which brings us to the proposed 64-bit object format. Objects are
>>> allocated in chunks of 64 bits = 8 bytes, meaning addresses are multiples
>>> of 8, leaving the 3 lowest bits for identifying immediate objects.
>>>
>>> But there still is no way to hide a 64-bit double inside a 64-bit oop,
>>> because the VM needs at least 1 bit to distinguish between regular object
>>> pointers and immediate objects.
>>>
>>> So Eliot is proposing a 61-bit immediate Float which (just like
>>> SmallIntegers) the VM can process using register operations only. This will
>>> be a major boost for most floating point operations (as long as your values
>>> are not larger than 10^38).
>>>
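
One way to picture "stealing 3 bits from the exponent" is the sketch below.
It is a simplified illustration under stated assumptions, not necessarily
the exact Spur encoding: the tag value 0b100, the exponent offset of 896
(1023 - 127), the rotate-sign-into-LSB layout, and all function names are
assumptions chosen for the example.

```python
import struct

TAG_BITS = 3
SMALL_FLOAT_TAG = 0b100        # assumed tag pattern
EXPONENT_OFFSET = 1023 - 127   # assumed re-bias so the exponent fits 8 bits
MANTISSA_BITS = 52
MASK64 = (1 << 64) - 1


def double_to_bits(x):
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def bits_to_double(bits):
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


def fits_small_float(x):
    bits = double_to_bits(x)
    if ((bits << 1) & MASK64) == 0:
        return True                      # +0.0 and -0.0 are special-cased
    exponent = (bits >> MANTISSA_BITS) & 0x7FF
    return EXPONENT_OFFSET < exponent <= EXPONENT_OFFSET + 255


def encode_small_float(x):
    if not fits_small_float(x):
        raise ValueError("out of immediate range; would need a boxed Float")
    bits = double_to_bits(x)
    rot = ((bits << 1) & MASK64) | (bits >> 63)   # rotate sign into the LSB
    if rot > 1:                                   # leave +/-0.0 untouched
        rot -= EXPONENT_OFFSET << (MANTISSA_BITS + 1)
    return (rot << TAG_BITS) | SMALL_FLOAT_TAG    # top 3 bits are now free


def decode_small_float(oop):
    rot = oop >> TAG_BITS
    if rot > 1:
        rot += EXPONENT_OFFSET << (MANTISSA_BITS + 1)
    bits = (rot >> 1) | ((rot & 1) << 63)         # rotate sign back to the top
    return bits_to_double(bits)
```

With these assumptions, values such as 1.0 or 1e38 round-trip through a
64-bit oop with all 52 mantissa bits intact, while 1e39, 1e-39, infinities
and NaNs fall outside the range and would still have to be boxed.
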
>>
> btw, I forgot to say so when you last wrote that, it was enlightening -
> thanks for taking the time to write it.
>

If the information was in a class comment somewhere would you have found it
and read it?

> cheers -ben
>
>

-- 
best,
Eliot