There's memory bandwidth and there's memory transaction
tim at jumpnet.com
Wed Feb 10 02:02:01 UTC 1999
Jan Bottorff wrote:
[good stuff about memory latency and cache/TLB miss performance]
>I suspect all processors with paged virtual memory have these issues. Some
>processors do have much larger caches (direct connection with processor
>price?). I also suspect the processor designers tend to run processor
>simulations of typical C/C++ programs, and it would be a real eye opener
>for them to see the access patterns of a Smalltalk system. Designers of 12
>pipeline stage processors (like the Pentium II) have obviously not
>optimized for execution environments that get a branch prediction miss
>every bytecode (flushing the execution pipeline every 5-10 instructions).
Back when I was designing PowerPC processors at Apple, we paid great
attention to the "ugly" code that made up much of the typical MacOS stuff
(including 68K emulation). We took multi-megabyte traces of application
and OS code, and analyzed them. This stuff averaged one branch
every 4-5 instructions, and deep pipelines w/ branch prediction didn't
help much. We designed the PowerPC 750 in response to this (short pipe
stages, good branch prediction with aggressive branch folding, etc).
Turns out that that stuff, along with a large, closely-coupled L2 cache,
really helps Squeak out as well.
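
For a rough feel of why pipeline depth matters when a misprediction lands every few instructions, here is a back-of-the-envelope sketch. It is only a toy model: the function name, the base CPI of 1.0, and the flush penalties of 4 and 10 cycles are all assumed for illustration and are not measurements of the Pentium II, the PowerPC 750, or Squeak. The only figure taken from the thread is the "every 5-10 instructions" misprediction interval.

```python
def effective_cpi(base_cpi, instrs_per_mispredict, flush_penalty_cycles):
    """Toy model: average cycles per instruction when one branch
    misprediction occurs every `instrs_per_mispredict` instructions and
    each one costs `flush_penalty_cycles` extra cycles to refill the pipe.

    All three parameters are hypothetical inputs, not measured values.
    """
    if instrs_per_mispredict <= 0:
        raise ValueError("instrs_per_mispredict must be positive")
    # The flush cost is amortized over the instructions between misses.
    return base_cpi + flush_penalty_cycles / instrs_per_mispredict


if __name__ == "__main__":
    # Assumed penalties: a short pipe (4 cycles) vs. a deep pipe (10 cycles).
    for penalty in (4, 10):
        # Misprediction interval from the quoted text: every 5-10 instructions.
        for interval in (5, 10):
            cpi = effective_cpi(1.0, interval, penalty)
            print(f"penalty={penalty:2d} interval={interval:2d} -> CPI {cpi:.2f}")
```

Under these assumed numbers, the deep pipe spends two of every three cycles recovering from flushes at a 5-instruction interval, while the short pipe loses less than half its cycles. The model ignores cache and TLB misses entirely, which the earlier part of the thread suggests matter at least as much.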