Cache and Squeak Performance
Tim Olson
tim at jumpnet.com
Fri Feb 13 23:45:50 UTC 1998
>To go along with STP's explanation of "backside" cache, maybe Tim or
>someone could post or drop me a note explaining what N-way set
>associative is, benefit-wise.
Sure; description (with probably more detail than you want to know ;-)
follows:
Cache associativity defines how many unique places in the cache a
particular address can be stored. Consider a 32KB cache that caches
32-byte blocks (contiguous chunks of memory). The cache can hold up to
32K/32 = 1K cache blocks.
In a fully-associative cache, any memory block can be placed into any of
the 1024 cache blocks. A memory address is partitioned like:
     27 bits     5 bits
+---------------+------+
|      tag      | byte |
+---------------+------+
where the byte specifies the byte-offset within a 32-byte cache block,
and the tag specifies the bits which are simultaneously compared with the
corresponding tag fields in each of the 1024 cache blocks to see if the
memory address is cached. New memory blocks are placed into empty cache
blocks until the cache is filled, in which case a cache block is chosen
for replacement, usually using some type of "not-recently-used"
algorithm. Fully-associative caches have the best general cache hit
ratio for a given cache size, but are usually slower than the other types
below, and also require the use of specialized Content-Addressable-Memory
(CAM) for the tags. Therefore they are rarely found in processors.
The other extreme is a direct-mapped cache, where any particular memory
block has only one corresponding cache block into which it can be placed.
The cache block is selected using a hash consisting of a subset of the
memory address bits, known as the index:
     17 bits    10 bits 5 bits
+---------------+------+------+
|      tag      |index | byte |
+---------------+------+------+
The value of the index field uniquely specifies one of the 1024 cache
blocks. All memory addresses with the same index value will compete for
the same cache block, so two or more heavily-used memory blocks with the
same index will conflict and "thrash" the cache. External (L2) caches
are usually direct-mapped, because they are cheapest and easiest to
build, requiring only one special tag-ram and tag comparator.
A set-associative cache is in between a fully-associative cache and a
direct-mapped cache. It divides the cache blocks into M sets which are
indexed via the index bits. Within a set there are N possible places
(ways) where any particular memory block can be placed. For example, if
our 32KB cache is 8-way set-associative:
     20 bits     7 bits 5 bits
+---------------+------+------+
|      tag      |index | byte |
+---------------+------+------+
we have a 7-bit index that selects one of the 128 sets in the cache, which
in turn has 8 possible cache blocks that can be used, which reduces the
thrashing (conflict misses) that can occur in direct-mapped caches. The
PowerPC 750 has a 2-way set-associative L2, which is possible because all
the tag RAM and cache logic is on the processor die, rather than
external; only the actual cache data is stored in external SRAMs
(accessed via the dedicated backside bus).
How does all this tie into Squeak? Well, if you had an application with
fixed memory requirements, you could run it and determine if
heavily-accessed cache blocks conflict, then rearrange the memory map to
minimize conflicts with a direct-mapped cache. However, since Squeak
uses a generational, compacting garbage collector, memory addresses are
frequently shifting around, causing potential conflicts to show up at
random. Increasing the cache associativity helps minimize these
conflicts.
-- tim
More information about the Squeak-dev mailing list