On 8/4/07, <b class="gmail_sendername">Jason Johnson</b> &lt;<a href="mailto:jason.johnson.081@gmail.com">jason.johnson.081@gmail.com</a>&gt; wrote:<div><span class="gmail_quote"></span><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

So what do you all think?&nbsp; Any big problems I didn&#39;t mention?&nbsp; Or solutions to the problems I mentioned?&nbsp; Or papers about this kind of thing (since my google-fu has not found too much on the subject so far)?

</blockquote><div> There&#39;s a trade-off here:&nbsp; either small-integer arithmetic is made a little slower, or access to all objects is made a little slower.&nbsp; Nothing&#39;s free (unless an entirely different scheme is proposed).

<br><br>The reason for the current system is that it was built to<br>run on the Motorola 68000 series of CPUs, which were<br>used in Apple&#39;s machines of the time.&nbsp; These CPUs, like<br>most today, required aligned access, meaning that to

read or write a 4-byte integer value in memory, it has to be stored at an address that is a multiple of 4. Actually, I seem to remember that the M68k required an alignment of only 2 bytes, but I could be wrong.&nbsp; Anyway.

<br>Suffice it to say that to access an n-byte integer or<br>floating-point value on modern hardware, it needs to be stored<br>at an address that is a multiple of n.&nbsp; The point is that all modern chips<br>require alignment, with the exception of the x86 series,

<br>which copes with misaligned data but is considerably<br>slower than it is with aligned data (something like 10 times,<br>ISTR).&nbsp; Check out<br><a href="http://en.wikipedia.org/wiki/Data_structure_alignment">http://en.wikipedia.org/wiki/Data_structure_alignment

</a> </div> So for a 32-bit machine, we want to store pointer values (OOPs) or numeric values in registers and memory. Now, the hardware makes it a requirement for the lowest 2 bits to be zero for pointers to aligned (32-bit) data.
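<br><br>To make the "lowest 2 bits are zero" point concrete, here is a minimal sketch in C. The helper names low_two_bits and is_word_aligned are my own, made up for illustration, and the check assumes the usual flat address model where a pointer converts to an integer address.<br>

```c
#include <stdint.h>

/* Hypothetical helper: the lowest 2 bits of an address. */
static unsigned low_two_bits(const void *p)
{
    return (unsigned)((uintptr_t)p & 0x3u);
}

/* Hypothetical helper: 1 if the address is a multiple of 4,
   which is the same as saying its lowest 2 bits are both zero. */
static int is_word_aligned(const void *p)
{
    return low_two_bits(p) == 0;
}
```

On a typical machine the address of any 32-bit integer should pass is_word_aligned, while the same address plus one byte should not.<br>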

So already, pointers are &quot;tagged&quot; by the hardware: any value with the bottom bit set is by definition not a valid pointer.&nbsp; The authors of Squeak spotted this property and came up with the current scheme.
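<br><br>Roughly, and only as a sketch of how I understand the current scheme (low bit set means a 31-bit integer, low bit clear means a pointer), the tagging could look like this in C. The type and function names (oop_t, small_int_encode and so on) are invented for this example and are not taken from the VM source.<br>

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t oop_t;              /* hypothetical name for a 32-bit tagged word */

#define SMALL_INT_MIN (-1073741824)  /* -2^30, smallest 31-bit signed value */
#define SMALL_INT_MAX (1073741823)   /*  2^30 - 1, largest 31-bit signed value */

/* The test: a word with the bottom bit set is an integer, not a pointer. */
static int is_small_int(oop_t w)
{
    return (w & 1u) != 0;
}

/* Shift the value up one bit and set the tag bit. */
static oop_t small_int_encode(int32_t v)
{
    assert(v >= SMALL_INT_MIN && v <= SMALL_INT_MAX);
    return ((uint32_t)v << 1) | 1u;
}

/* The shift: drop the tag bit, then sign-extend the 31-bit payload by hand
   so the result does not depend on how the compiler shifts negative values. */
static int32_t small_int_decode(oop_t w)
{
    uint32_t payload;
    assert(is_small_int(w));
    payload = w >> 1;
    if (payload & 0x40000000u)       /* sign bit of the 31-bit field */
        return (int32_t)(payload & 0x3FFFFFFFu) - 1073741824;
    return (int32_t)payload;
}

/* Arithmetic pays for the tag: untag both operands, add, retag.
   A real VM would fall back to large integers on overflow; this sketch
   just lets the assert in small_int_encode fire. */
static oop_t small_int_add(oop_t a, oop_t b)
{
    return small_int_encode(small_int_decode(a) + small_int_decode(b));
}
```

For instance, small_int_encode(5) gives the word 11 (binary 1011), and an aligned address such as 0x20d60 fails is_small_int because its bottom bit is clear.<br>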

<br><br>There&#39;s another point to keep in mind with your scheme:<br>what about address space?&nbsp; On my SPARC box,<br>pointers can be smallish values:<br><br>apg@breakfast: ~ cat t.c&nbsp;&nbsp;&nbsp;&nbsp; <br>

#include &lt;stdio.h&gt;<br>

#include &lt;stdlib.h&gt;<br>

int main( int argc, char * argv[] )<br>

{<br>

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; char * p = (char*) malloc( 23 );<br>

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; printf( &quot;Address is %p\n&quot;, p );<br>

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; return 0;<br>

}<br>

<br>

apg@breakfast: ~ gcc t.c<br>apg@breakfast: ~ ./a.out <br>Address is 20d60<br><br>However, I&#39;m pretty sure that this number could just as<br>easily be enormous, but your scheme would limit it to<br>2GB of address space.&nbsp; It would also require setting the

<br>top-most bit on OOP-creation, testing it on access,<br>and converting to a memory address.&nbsp; A test and shift<br>for integers sounds simpler and cheaper (to me anyway).<br><br>The current scheme can provide 31-bit integers as well

<br>as a 4GB address space for other objects, and at a<br>pretty low cost.&nbsp; And 4GB is already a bit of a limitation:<br>I was involved in a project which needed Squeak to<br>work with 6+GB of objects.&nbsp; Anyone know how well

<br>the 64-bit Squeak works these days?<br><br>Andrew<br><br><br></div>