Tagging

Andrew Gaylard ag at computer.org
Sat Aug 4 09:52:46 UTC 2007


On 8/4/07, Jason Johnson <jason.johnson.081 at gmail.com> wrote:
>
>
> So what do you all think?  Any big problems I didn't mention?  Or
> solutions to the problems I mentioned?  Or papers about this kind of thing
> (since my google-fu has not found too much on the subject so far)?


There's a trade-off here:  either small-integer arithmetic
is made a little slower, or access to all objects is made
a little slower.  Nothing's free (unless an entirely different
scheme is proposed).

The reason for the current system is that it was built to
run on the Motorola M68000 series of CPUs which were
used in Apple's machines of the time.  These CPUs, like
most today, required aligned access, meaning that to
read or write a 4-byte integer value in memory, it has to
be stored at an address that is a multiple of 4.

Actually, I seem to remember that the M68k required an
alignment of only 2 bytes, but I could be wrong.  Anyway,
suffice it to say that to access an n-byte integer or
floating-point value on modern hardware, it needs to be
stored at an address that is a multiple of n.  The point is
that modern chips generally require alignment, with the
exception of the x86 series, which copes with misaligned
data but is considerably slower than if it were aligned
(something like 10 times, ISTR).  Check out
http://en.wikipedia.org/wiki/Data_structure_alignment

So for a 32-bit machine, we want to store pointer values
(OOPs) or numeric values in registers and memory.
Now, the hardware makes it a requirement for the lowest
2 bits to be zero for pointers to aligned (32-bit) data.
So already, pointers are "tagged" by the hardware: any
value with the bottom bit set is by definition not a valid
pointer.  The authors of Squeak spotted this property
and came up with the current scheme.

There's another point to keep in mind with your scheme:
what about address space?  On my SPARC box,
pointers can be smallish values:

apg@breakfast: ~ cat t.c
#include <stdio.h>
#include <stdlib.h>
int main( int argc, char * argv[] )
{
        char * p = (char*) malloc( 23 );
        /* %p expects a void pointer */
        printf( "Address is %p\n", (void*) p );
        free( p );
        return 0;
}

apg@breakfast: ~ gcc t.c
apg@breakfast: ~ ./a.out
Address is 20d60

That said, I'm pretty sure that this number could just as
easily be enormous, and your scheme would limit it to
2GB of address space.  It would also require setting the
top-most bit on OOP creation, testing it on access,
and converting it to a memory address.  A test and shift
for integers sounds simpler and cheaper (to me anyway).

The current scheme can provide 31-bit integers as well
as a 4GB address space for other objects, and at a
pretty low cost.  And 4GB is already a bit of a limitation --
I was involved in a project which needed Squeak to
work with 6+GB of objects.  Anyone know how well
the 64-bit Squeak works these days?

Andrew