There's memory bandwidth and there's memory transaction thruput

Wed Feb 10 02:02:01 UTC 1999

Jan Bottorff wrote:

     [good stuff about memory latency and cache/TLB miss performance problems]

>I suspect all processors with paged virtual memory have these issues. Some
>processors do have much larger caches (direct connection with processor
>price?). I also suspect the processor designers tend to run processor
>simulations of typical C/C++ programs, and it would be a real eye opener
>for them to see the access patterns of a Smalltalk system. Designers of 12
>pipeline stage processors (like the Pentium II) have obviously not
>optimized for execution environments that get a branch prediction miss
>every bytecode (flushing the execution pipeline every 5-10 instructions).

Back when I was designing PowerPC processors at Apple, we paid great 
attention to the "ugly" code that made up much of the typical MacOS stuff 
(including 68K emulation).  We took multi-megabyte traces of application 
and OS code, and analyzed them.  This code averaged one branch every 
4-5 instructions, and deep pipelines with branch prediction didn't 
help much.  We designed the PowerPC 750 in response to this (short pipe 
stages, good branch prediction with aggressive branch folding, etc.).  
It turns out that those features, along with a large, closely-coupled L2 
cache, really help Squeak out as well.

     -- tim