cache in a VM
Jan Bottorff
janb at pmatrix.com
Sun Nov 25 07:07:18 UTC 2001
> >I'm really puzzled why you are working with such a VM; restricting to 16
> >bit OOPS is really very ancient technology. Is this some old version of
> >LittleSmalltalk or something? Why not use a more modern system; ie
> >Squeak ?
A while back I was toying with a VM that used 16-bit object pointers with a
LOOM style translation to a much larger object space. The thought was
perhaps the speed increase from having the active object working set always
in L1/L2 processor cache would offset the overhead of periodically needing
to shuffle objects between the small fast and large slower object spaces.
This thinking was right after doing a high speed networking project where
performance REALLY was hurt by TLB misses, and the reality sank in that
common systems can only do 5-10 million random memory accesses per
second across a large memory space. This assumes a TLB miss, which requires
4 memory bus clocks of access latency, and 3 more memory bus clocks to
finish a cache line burst, and then after loading the TLB entry (which was
a L2 cache miss), the same 7 clocks to get the actual memory location, or a
total of 14 bus clocks for the processor to read 4 bytes. At a typical
memory bus clock speed of 100 MHz, this ends up being a usable memory
bandwidth of only about 28 Mbytes/sec. There is a critical ratio between
how many TLB entries you have, total memory space, and the randomness of
access. As I remember, having like 256k of L2 cache with 256 MBytes of
total memory was beginning to show this problem, basically once ALL the
page-table entries no longer fit in L2 cache. Now that memory costs
$75/Gigabyte, L2 caches are the same size as before, and processors are
running at 2 GHz, this might be worth revisiting.
This VM never got past the curious-idea stage, as I never got around to
profiling a Smalltalk system to measure the randomness of memory accesses
and the object working set size. I did have a name for the idea, which was
a reduced address space architecture (RASA). It was just another variation
of "make the common operations (access to objects in the object working
set) go fast, in exchange for slowing down less common operations". It's
possible I could now profile things using the Squeak VM simulator!
For a VM that has limited local resources (like a cheap microcontroller),
and swaps objects across a network, I could imagine 16-bit object pointers
with a LOOM virtual memory might be very attractive. Even a VM that has
limited RAM but lots of slower flash (for most of the object space) might
find this an attractive architecture.
In a perfect universe, all object pointers would be unique across the
universe (128 bits is enough), and all that stuff we call files (local or
remote) would just be referenced by some object pointer. Isn't this what
Ted Nelson always wanted?
- Jan