Cache and Squeak Performance

Tim Olson tim at jumpnet.com
Fri Feb 13 23:45:50 UTC 1998


>To go along with STP's explanation of "backside" cache, maybe Tim or
>someone could post or drop me a note explaining what N-way set
>associative is, benefit-wise.

Sure; description (with probably more detail than you want to know ;-) 
follows:


Cache associativity defines how many unique places in the cache a 
particular address can be stored.  Consider a 32KB cache that caches 
32-byte blocks (contiguous chunks of memory).  The cache can hold up to 
32K/32 = 1K cache blocks.

In a fully-associative cache, any memory block can be placed into any of 
the 1024 cache blocks.  A memory address is partitioned like:

          27 bits     5 bits
     +---------------+------+
     |     tag       | byte |
     +---------------+------+

where the byte specifies the byte-offset within a 32-byte cache block, 
and the tag specifies the bits which are simultaneously compared with the 
corresponding tag fields in each of the 1024 cache blocks to see if the 
memory address is cached.  New memory blocks are placed into empty cache 
blocks until the cache is filled; after that, a cache block must be chosen 
for replacement, usually by some type of "not-recently-used" 
algorithm.  Fully-associative caches have the best general cache hit 
ratio for a given cache size, but they are usually slower than the other 
types described below, and also require specialized Content-Addressable 
Memory (CAM) for the tags.  Therefore they are rarely found in processors.
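In Python-flavored terms (a minimal sketch; the constants match the 32KB, 32-byte-block example above, and all names are illustrative), the address split and the parallel tag compare look like:

```python
BLOCK_BITS = 5                 # 32-byte blocks -> 5-bit byte offset

def split_fully_associative(addr):
    """Split an address for the fully-associative cache described above."""
    byte = addr & ((1 << BLOCK_BITS) - 1)   # low 5 bits: byte within block
    tag  = addr >> BLOCK_BITS               # remaining 27 bits: tag
    return tag, byte

def hit(cached_tags, addr):
    """Model the CAM: every stored tag is compared simultaneously."""
    tag, _ = split_fully_associative(addr)
    return tag in cached_tags

# Example: address 0x1234 splits into tag 0x91, byte offset 0x14.
```

The `in` test stands in for the hardware's simultaneous compare against all 1024 tag fields.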


The other extreme is a direct-mapped cache, where any particular memory 
block has only one corresponding cache block into which it can be placed. 
 The cache block is selected using a hash consisting of a subset of the 
memory address bits, known as the index:

          17 bits     10 bits 5 bits
     +---------------+------+------+
     |     tag       |index | byte |
     +---------------+------+------+

The value of the index field uniquely specifies one of the 1024 cache 
blocks.  All memory addresses with the same index value will compete for 
the same cache block, so two or more heavily-used memory blocks with the 
same index will conflict and "thrash" the cache.  External (L2) caches 
are usually direct-mapped because they are the cheapest and easiest to 
build, requiring only a single tag RAM and tag comparator.
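As a sketch (Python again, illustrative names, same 32KB cache), the direct-mapped split and a conflicting address pair look like:

```python
BLOCK_BITS, INDEX_BITS = 5, 10        # 32-byte blocks, 1024 cache blocks

def split_direct_mapped(addr):
    """Split an address into tag / index / byte for the cache above."""
    byte  = addr & ((1 << BLOCK_BITS) - 1)
    index = (addr >> BLOCK_BITS) & ((1 << INDEX_BITS) - 1)
    tag   = addr >> (BLOCK_BITS + INDEX_BITS)
    return tag, index, byte

# Any two addresses exactly one cache size (32KB = 0x8000) apart share
# an index but differ in tag, so they fight over the same cache block:
a, b = 0x12345, 0x12345 + 0x8000
```

If `a` and `b` are both hot, each access to one evicts the other -- the thrashing described above.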


A set-associative cache is in between a fully-associative cache and a 
direct-mapped cache.  It divides the cache blocks into M sets which are 
indexed via the index bits.  Within a set there are N possible places 
(ways) where any particular memory block can be placed.  For example, if 
our 32KB cache is 8-way set-associative:

          20 bits     7 bits 5 bits
     +---------------+------+------+
     |     tag       |index | byte |
     +---------------+------+------+

we have a 7-bit index that selects one of the 128 sets in the cache, each 
of which has 8 possible cache blocks that can be used.  This reduces the 
thrashing (conflict misses) that can occur in direct-mapped caches.  The 
PowerPC 750 has a 2-way set-associative L2, which is possible because all 
the tag RAM and cache logic is on the processor die, rather than 
external; only the actual cache data is stored in external SRAMs 
(accessed via the dedicated backside bus).
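A toy model of the 8-way case (a Python sketch with an LRU replacement policy chosen for illustration; real caches often use cheaper pseudo-LRU schemes) shows how a set absorbs addresses that would thrash a direct-mapped cache:

```python
from collections import OrderedDict

BLOCK, SETS, WAYS = 32, 128, 8        # the 8-way 32KB example above

class SetAssociativeCache:
    def __init__(self):
        self.sets = [OrderedDict() for _ in range(SETS)]

    def access(self, addr):
        """Return True on a hit; on a miss, fill (evicting LRU if full)."""
        s   = self.sets[(addr // BLOCK) % SETS]   # 7-bit index picks the set
        tag = addr // (BLOCK * SETS)              # remaining bits: tag
        if tag in s:
            s.move_to_end(tag)                    # mark most-recently-used
            return True
        if len(s) == WAYS:
            s.popitem(last=False)                 # evict least-recently-used way
        s[tag] = True
        return False

# Eight hot blocks that all map to set 0 still coexist -- no thrashing:
cache  = SetAssociativeCache()
hot    = [i * BLOCK * SETS for i in range(WAYS)]
misses = [cache.access(a) for a in hot]   # first pass: compulsory misses
hits   = [cache.access(a) for a in hot]   # second pass: all hits
```

In a direct-mapped cache those same eight blocks would share one cache block and evict each other on every access.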


How does all this tie into Squeak?  Well, if you had an application with 
fixed memory requirements, you could run it and determine if 
heavily-accessed cache blocks conflict, then rearrange the memory map to 
minimize conflicts with a direct-mapped cache.  However, since Squeak 
uses a generational, compacting garbage collector, memory addresses are 
frequently shifting around, causing potential conflicts to show up at 
random.  Increasing the cache associativity helps minimize these 
conflicts.
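A toy experiment (Python sketch, illustrative numbers) makes the point: two hot blocks that happen to land exactly one cache size apart after a compaction evict each other in a direct-mapped cache, while even 2-way associativity absorbs the collision:

```python
BLOCK = 32

def misses(addresses, num_sets, ways):
    """Count misses for an LRU cache with num_sets sets of `ways` blocks."""
    sets  = [[] for _ in range(num_sets)]
    count = 0
    for addr in addresses:
        s   = sets[(addr // BLOCK) % num_sets]
        tag = addr // (BLOCK * num_sets)
        if tag in s:
            s.remove(tag)          # refresh LRU position
        else:
            count += 1
            if len(s) == ways:
                s.pop(0)           # evict least-recently-used block
        s.append(tag)
    return count

# Suppose the GC has left two hot blocks 32KB apart, so they share a
# direct-mapped index; alternate accesses between them:
trace = [0x0000, 0x8000] * 4
dm  = misses(trace, 1024, 1)   # direct-mapped: every access misses
sa2 = misses(trace, 512, 2)    # 2-way: only the two compulsory misses
```

Since a compacting collector can produce (or remove) such address coincidences on any collection, performance under a direct-mapped cache can vary from run to run in ways the programmer never sees.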



     -- tim




