On 8/4/07, <b class="gmail_sendername">Jason Johnson</b> <<a href="mailto:jason.johnson.081@gmail.com">jason.johnson.081@gmail.com</a>> wrote:<div><span class="gmail_quote"></span><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
<br>So what do you all think? Any big problems I didn't mention? Or solutions to the problems I mentioned? Or papers about this kind of thing (since my google-fu has not found too much on the subject so far)?
</blockquote><div><br>There's a trade-off here: either small-integer arithmetic<br>is made a little slower, or access to all objects is made<br>a little slower. Nothing's free (unless an entirely different<br>scheme is proposed).
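<br><br>To make the integer side of this trade-off concrete, here is a minimal C sketch of a SmallInteger addition fast path. It assumes the low-bit tagging scheme described further down (SmallIntegers carry a 1 in the bottom bit, aligned object pointers are stored untagged), a 32-bit object word, and two's-complement arithmetic; `oop_t` and `small_int_add` are hypothetical names, not code from the Squeak VM.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 32-bit object word: SmallIntegers have the bottom bit set,
   aligned object pointers (OOPs) have their low bits clear. */
typedef uint32_t oop_t;

/* Fast path for SmallInteger + SmallInteger.
   Returns 1 and stores the tagged sum in *out on success. Returns 0 if
   either operand is not a SmallInteger or the sum does not fit in 31 bits;
   a VM would then fall back to a slower general message send. */
static int small_int_add(oop_t a, oop_t b, oop_t *out)
{
    assert(out);

    /* The extra cost paid by integer arithmetic: one AND and a branch. */
    if ((a & b & 1u) == 0)
        return 0;

    /* (2x + 1) + (2y + 1) - 1 == 2(x + y) + 1: the result is already
       tagged, so neither operand has to be shifted. Compute in 64 bits
       to detect overflow of the 32-bit word (assumes two's complement
       for the uint32_t -> int32_t conversion). */
    int64_t sum = (int64_t)(int32_t)a + (int64_t)(int32_t)b - 1;
    if (sum < INT32_MIN || sum > INT32_MAX)
        return 0;

    *out = (oop_t)(uint32_t)(int32_t)sum;
    return 1;
}
```

For example, 3 and 4 are stored as 7 and 9, and `small_int_add(7, 9, &r)` leaves 15 (the tagged form of 7) in `r`. The integer path pays for one tag test and one overflow check; object pointers need no untagging, so ordinary object access pays nothing.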
<br><br>The reason for the current system is that it was built to<br>run on the Motorola M68000 series of CPUs, which were<br>used in Apple's machines of the time. These CPUs, like<br>most today, required aligned access: to
<br>read or write a 4-byte integer value in memory, the value had to<br>be stored at an address that is a multiple of 4.<br><br>Actually, I seem to remember that the M68k required an<br>alignment of only 2 bytes, but I could be wrong. Anyway.
<br>Suffice it to say that to access an n-byte integer or<br>floating-point value on modern hardware, it needs to be stored<br>at an address that is a multiple of n. The point is that most modern chips<br>require alignment, the notable exception being the x86 series,
<br>which copes with misaligned data but is considerably<br>slower than when the data is aligned (something like 10 times,<br>ISTR). Check out<br><a href="http://en.wikipedia.org/wiki/Data_structure_alignment">http://en.wikipedia.org/wiki/Data_structure_alignment
</a><br></div><br>So on a 32-bit machine, we want to store pointer values<br>(OOPs) and numeric values in the same registers and memory words.<br>Now, alignment already requires the lowest<br>2 bits of any pointer to aligned (32-bit) data to be zero.
<br>So pointers are already "tagged" by the hardware: any<br>value with the bottom bit set is by definition not a valid<br>pointer to aligned data. The authors of Squeak spotted this property<br>and came up with the current scheme: a word with the bottom bit set holds a SmallInteger in its remaining 31 bits.
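<br><br>A minimal C sketch of that encoding, assuming a 32-bit object word: a SmallInteger n is stored as 2n + 1, which leaves a 31-bit signed range, and any word whose two low bits are clear is treated as an aligned object pointer. The type and function names are hypothetical illustrations, not the actual Squeak VM code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 32-bit object word (an OOP or a SmallInteger). */
typedef uint32_t oop_t;

/* 31 bits of signed payload: -2^30 .. 2^30 - 1. */
#define SMALLINT_MIN (-1073741824)
#define SMALLINT_MAX 1073741823

/* Bottom bit set: the word is a SmallInteger, never a pointer. */
static int is_small_int(oop_t w)
{
    return (w & 1u) != 0;
}

/* Two low bits clear: the word can be an aligned object pointer.
   (Words ending in binary 10 are left unused in this sketch.) */
static int is_object_pointer(oop_t w)
{
    return (w & 3u) == 0;
}

/* Encode n as 2n + 1: shift left one bit and set the tag bit. */
static oop_t tag_small_int(int32_t n)
{
    assert(n >= SMALLINT_MIN && n <= SMALLINT_MAX);
    return ((uint32_t)n << 1) | 1u;
}

/* Decode with "a test and shift". Converting back to int32_t and shifting
   right arithmetically assume two's complement, as gcc and clang provide. */
static int32_t untag_small_int(oop_t w)
{
    assert(is_small_int(w));
    return (int32_t)w >> 1;
}
```

For instance, the address 20d60 printed by the SPARC test program below ends in binary 00, so `is_object_pointer(0x20d60)` is true and `is_small_int(0x20d60)` is false, while `tag_small_int(3)` yields 7.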
<br><br>There's another point to keep in mind with your scheme:<br>what about address space? On my SPARC box,<br>pointers can be smallish values:<br><br>apg@breakfast: ~ cat t.c <br>
#include <stdio.h><br>
#include <stdlib.h><br>
int main( int argc, char * argv[] )<br>
{<br>
char * p = malloc( 23 );<br>
printf( "Address is %p\n", (void *) p );<br>
free( p );<br>
return 0;<br>
}<br>
<br>
apg@breakfast: ~ gcc t.c<br>apg@breakfast: ~ ./a.out <br>Address is 20d60<br><br>However, I'm pretty sure that on another system this number could just as<br>easily be enormous, whereas your scheme would limit pointers to<br>2GB of address space. It would also require setting the
<br>top-most bit on OOP creation, testing it on every access,<br>and converting the OOP back to a memory address. A test and shift<br>for integers sounds simpler and cheaper (to me, anyway).<br><br>The current scheme can provide 31-bit integers as well
<br>as a 4GB address space for other objects, and at a<br>pretty low cost. Even 4GB is already a limitation:<br>I was involved in a project which needed Squeak to<br>work with 6+GB of objects. Does anyone know how well
<br>the 64-bit Squeak works these days?<br><br>Andrew<br><br><br></div>