A few low-level Pentium II performance measurements

Markus Kohler markus_kohler at hp.com
Thu Feb 18 09:45:56 UTC 1999


Jan Bottorff wrote:
> 
> I took a quick look at the processor performance counters with Squeak
> 2.3 running on a Pentium II-266. Note that no great care was taken to
> create exactly reproducible results. I just thought people might be
> interested in ballpark numbers. Running the highly official benchmark (not):
> 
> 100 timesRepeat:[0 tinyBenchmarks]
> 
> Average processor clock cycles/instruction retired = 1.10 (0.98)
> Average instructions retired/branch instructions retired = 5.33 (8.59)
> Average branch target buffer misses / branch instructions retired = 0.43 (0.20)
> Average data memory references / instructions retired = 0.57 (0.50)
> Average instructions retired / instructions decoded = 0.61 (0.87)
> 
> A wordy description when running Smalltalk code is: the Pentium II runs at
> a bit under 1 useful instruction per clock cycle, with about every 5
> instructions executed being a branch. Nearly half of these branches do not
> take advantage of the branch target buffer, incurring a performance
> penalty. Only about 2/3 of the instructions started give results that are
> kept.
> 
> I didn't offhand see any way to measure L1 cache hit ratio or TLB miss ratio.
> 
> The numbers in parentheses are the measurements while running some very
> optimized, processor-intensive C code, Motion-JPEG video decompression
> specifically.
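
To make the counter ratios above easier to compare, here is a small Python sketch that inverts and complements them. The figures are copied from the measurements quoted above; the dictionary layout and the derived metric names are illustrative assumptions, not part of the original measurements.

```python
# Ratios as reported above: Squeak first, optimized C (Motion-JPEG) second.
measured = {
    "Squeak": {"cpi": 1.10, "instr_per_branch": 5.33,
               "btb_miss": 0.43, "retired_per_decoded": 0.61},
    "C":      {"cpi": 0.98, "instr_per_branch": 8.59,
               "btb_miss": 0.20, "retired_per_decoded": 0.87},
}

def derive(m):
    """Turn the raw ratios into the quantities used in the prose description."""
    return {
        # Retired instructions per clock cycle (inverse of cycles/instruction).
        "instructions_per_clock": 1.0 / m["cpi"],
        # Fraction of retired instructions that are branches.
        "branch_fraction": 1.0 / m["instr_per_branch"],
        # Branch target buffer misses per retired instruction.
        "btb_misses_per_instruction": m["btb_miss"] / m["instr_per_branch"],
        # Fraction of decoded instructions that were never retired.
        "discarded_fraction": 1.0 - m["retired_per_decoded"],
    }

derived = {name: derive(m) for name, m in measured.items()}
for name, d in derived.items():
    print(name, {k: round(v, 3) for k, v in d.items()})
```

Expressed per retired instruction, the branch target buffer misses for Squeak come out several times higher than for the C code, because Squeak both branches more often and misses more often per branch.
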
> 
> Some interesting observations seem to be:
> 
> 1) The instructions retired per clock isn't that much worse for the Squeak
> interpreter code.
> 
> 2) The branch target buffer logic works much better for C code.
> 
> 3) Both Squeak and C code access memory a lot.
> 
> Because the C code seemed to not have much faster instruction/sec rates,
> yet seemed to throw away a lot fewer instructions (decoded to retired
> ratio), I believe there must have been significant slowdowns of the C code.
> Some measurements suggested the C code was in a processor "stalled on
> resources" 20% of the total clocks, but Squeak was only stalled 5% of the
> clocks. This may be an indirect indication of cache miss activity,
> especially since the C example code was pretty memory intensive. So this
> isn't exactly a great apples to apples comparison to C code. Still, even if
> the C code was stalled 0% of the time, its retire rate would not be that
> much faster.
> 
> I also ran a dozen different programs and could not find a single one that
> achieved as high an instruction retire rate as the Squeak and video
> decompression tests. Other programs included 3-D rendering (FP
> intensive?) and PostScript to PDF conversion (Acrobat Distiller). This was
> very puzzling.
> 
> The performance of the test machine on '0 tinyBenchmarks' was '12195121
> bytecodes/sec; 671316 sends/sec'. As it's doing 1.1 clocks per machine
> instruction, this implies an average of 266,000,000 / 1.1 / 12195121 =
> 19.83 machine instructions per bytecode, which seems a bit high to me.
> Looking at the generated machine code (thanks to VTune disassembly) I count
> around 12 instructions for a really simple bytecode+dispatch, so maybe the
> data is correct. A hand-crafted assembly interpreter could lose about half
> those instructions.
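
The estimate above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: the constants are the figures quoted in the message, while the function and variable names are made up for illustration.

```python
# Figures quoted in the message above.
CLOCK_HZ = 266_000_000          # Pentium II-266 clock rate
CYCLES_PER_INSTRUCTION = 1.10   # measured clocks per retired instruction
BYTECODES_PER_SEC = 12_195_121  # reported by '0 tinyBenchmarks'

def instructions_per_bytecode(clock_hz, cpi, bytecodes_per_sec):
    """Average retired machine instructions spent on each bytecode."""
    instructions_per_sec = clock_hz / cpi
    return instructions_per_sec / bytecodes_per_sec

def clocks_per_bytecode(clock_hz, bytecodes_per_sec):
    """Average clock cycles spent on each bytecode."""
    return clock_hz / bytecodes_per_sec

per_bytecode = instructions_per_bytecode(
    CLOCK_HZ, CYCLES_PER_INSTRUCTION, BYTECODES_PER_SEC)
clocks = clocks_per_bytecode(CLOCK_HZ, BYTECODES_PER_SEC)

print(f"{per_bytecode:.2f} machine instructions per bytecode")
print(f"{clocks:.2f} clocks per bytecode")
```

The result is just under 20 instructions per bytecode, which is an average over all bytecodes, including sends. That would be consistent with a count of around 12 instructions for the simplest bytecodes.
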
> 
> According to the comments on Integer tinyBenchmarks, a 292 MHz G3 Mac gets
> 22727272 bytecodes/sec; 984169 sends/sec. This is quite a lot faster than
> the Pentium II, even adjusting for clock speed.

I agree. My HP PA-RISC C180 gets 11111111 bytecodes/sec at 180 MHz. Not
too bad, but the Mac is still faster per MHz. I guess one problem is the
GNU compiler, which does not optimize very well for PA-RISC.
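
For a rough per-MHz comparison of the three machines mentioned so far, a small Python sketch follows. The bytecode rates and clock speeds are the ones quoted in this thread; the table layout and names are illustrative only, and bytecodes/sec per MHz is a crude metric that ignores compiler and memory system differences.

```python
# (clock in MHz, bytecodes/sec) as quoted in this thread.
machines = {
    "Pentium II-266": (266, 12_195_121),
    "G3 Mac 292 MHz": (292, 22_727_272),
    "HP PA-RISC C180": (180, 11_111_111),
}

def bytecodes_per_mhz(mhz, bytecodes_per_sec):
    """Bytecodes executed per second for each MHz of clock speed."""
    return bytecodes_per_sec / mhz

per_mhz = {name: bytecodes_per_mhz(mhz, rate)
           for name, (mhz, rate) in machines.items()}

# Fastest per MHz first.
ranking = sorted(per_mhz, key=per_mhz.get, reverse=True)
for name in ranking:
    print(f"{name:16s} {per_mhz[name]:9.0f} bytecodes/sec per MHz")
```

On these numbers the G3 leads, the C180 comes second, and the Pentium II is last per MHz.
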

I will soon try the biggest processor on this planet, a C360 with 1.5
Mbyte of on-chip full-speed cache ...

> 
> I'd be interested in seeing the G3 generated assembly fragments for
> bytecodes like "push constant 0" and the bytecode dispatching loop. It
> would be interesting to decide if the G3 compiler/instruction set generates
> better code or if the G3 processor is just much faster at executing similar
> code.
> 
> Hope this has been entertaining.
> 

I really like it :-)

Markus
-- 
Markus Kohler  mailto:markus_kohler at hp.com




