cache in a VM

Jan Bottorff janb at pmatrix.com
Sun Nov 25 07:07:18 UTC 2001


> >I'm really puzzled why you are working with such a VM; restricting to 16
> >bit OOPS is really very ancient technology. Is this some old version of
> >LittleSmalltalk or something? Why not use a more modern system; ie
> >Squeak ?

A while back I was toying with a VM that used 16-bit object pointers with a 
LOOM-style translation to a much larger object space. The thought was that 
the speed increase from keeping the active object working set entirely in 
the L1/L2 processor caches might offset the overhead of periodically 
shuffling objects between the small, fast object space and the large, 
slower one.
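
Roughly the kind of thing I had in mind, sketched in C (the names, table 
sizes, and fault path below are made up for illustration, not how LOOM 
actually laid things out):

#include <stdint.h>
#include <stdio.h>

typedef uint16_t Oop;                 /* small, cache-friendly object pointer */

typedef struct {
    uint32_t wideOop;                 /* the object's identity in the big space */
    uint32_t localOffset;             /* where it lives in the small fast space */
    uint8_t  resident;                /* 1 if currently faulted in */
} OtEntry;

static OtEntry oopTable[1u << 16];    /* 64K entries: small enough to sit in L2 */
static uint8_t localSpace[1u << 20];  /* the "small fast" object space, 1 MB here */

/* Resolve a 16-bit OOP to a real address, faulting the object in from
   the large slow space if it isn't resident (fault path omitted). */
static void *resolveOop(Oop oop)
{
    OtEntry *e = &oopTable[oop];
    if (!e->resident) {
        /* ...copy the object from the large space into localSpace... */
        e->resident = 1;
    }
    return &localSpace[e->localOffset];
}

int main(void)
{
    oopTable[42] = (OtEntry){ .wideOop = 0x00123456u, .localOffset = 128, .resident = 1 };
    printf("object 42 lives at %p\n", resolveOop(42));
    return 0;
}

The point being that the whole translation table stays small enough to 
live in L2, so dereferencing an OOP for a resident object is usually just 
one extra indexed load.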

This thinking came right after a high speed networking project where 
performance REALLY was hurt by TLB misses, and the reality sank in that 
common systems can only do 5-10 million random memory accesses per second 
across a large memory space. This assumes every access is a TLB miss: about 
4 memory bus clocks of access latency plus 3 more bus clocks to finish the 
cache line burst just to load the page table entry (which itself misses the 
L2 cache), and then the same 7 clocks to get the actual memory location, 
for a total of 14 bus clocks for the processor to read 4 bytes. At a 
typical memory bus clock speed of 100 MHz, this works out to a usable 
memory bandwidth of only about 28 MBytes/sec. There is a critical ratio 
between how many TLB entries you have, the total memory space, and the 
randomness of access. As I remember, something like 256 KB of L2 cache with 
256 MBytes of total memory was beginning to show this problem, basically 
the point at which the page table entries no longer all fit in L2 cache. 
Now that memory costs $75/gigabyte, L2 caches are the same size as before, 
and processors are running at 2 GHz, this might be worth revisiting.
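
To make that arithmetic concrete (the 4+3 clock costs and the 100 MHz bus 
are my assumptions about that hardware, not measured numbers):

#include <stdio.h>

int main(void)
{
    const double busClockHz     = 100e6;  /* 100 MHz memory bus */
    const int    clocksPerFetch = 4 + 3;  /* access latency + rest of the cache line burst */
    const int    fetchesPerRead = 2;      /* one for the page table entry, one for the data */
    const int    bytesPerRead   = 4;      /* a 32-bit word */

    double clocksPerRead = clocksPerFetch * fetchesPerRead;  /* 14 bus clocks */
    double readsPerSec   = busClockHz / clocksPerRead;       /* ~7.1 million/sec */
    double bytesPerSec   = readsPerSec * bytesPerRead;       /* ~28.6 MBytes/sec */

    printf("random reads/sec: %.1f million\n", readsPerSec / 1e6);
    printf("usable bandwidth: %.1f MBytes/sec\n", bytesPerSec / 1e6);
    return 0;
}

Which lands right in the 5-10 million random accesses per second range I 
mentioned above.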

This VM never got past the curious-idea stage, as I never got around to 
profiling a Smalltalk system to measure the randomness of memory accesses 
and the object working set size. I did have a name for the idea, which was 
a reduced address space architecture (RASA). It was just another variation 
of "make the common operations (access to objects in the object working 
set) go fast, in exchange for slowing down less common operations". It's 
possible I could now profile things using the Squeak VM simulator!

For a VM with limited local resources (like a cheap microcontroller) that 
swaps objects across a network, I could imagine that 16-bit object pointers 
with a LOOM-style virtual memory would be very attractive. Even a VM that has 
limited RAM but lots of slower flash (for most of the object space) might 
find this an attractive architecture.

In a perfect universe, all object pointers would be unique across the 
universe (128 bits is enough), and all that stuff we call files (local or 
remote) would just be referenced by some object pointer. Isn't this what 
Ted Nelson always wanted?

- Jan




