[squeak-dev] Using high res clock in benchmarks

Levente Uzonyi leves at caesar.elte.hu
Thu Apr 25 23:15:12 UTC 2019


70ns difference can easily come from cache misses. I think you'd get 
more consistent results if you compiled a method and ran that instead of 
compiling a new block for each execution.

The first two numbers are larger because the jit kicks in. IIRC no jit for 
the first run, jitting during the second run. All futher executions use 
the jitted code.

Also, the "hickups" probably happen because your OS is running other 
processes as well.

IIRC the highResClock primitive uses TSC on Intel CPUs, so it may not be 
as reliable as you'd expect it to be[1], especially on older CPUs (clock 
rate depends on CPU frequency, cores' clocks are not synced).

Levente

[1] https://en.wikipedia.org/wiki/Time_Stamp_Counter

On Fri, 26 Apr 2019, Nicolas Cellier wrote:

> Huh, in fact it's a bit different...
> 
> Le jeu. 25 avr. 2019 à 23:28, Nicolas Cellier <nicolas.cellier.aka.nice at gmail.com> a écrit :
> 
> 
> Le jeu. 25 avr. 2019 à 23:22, Nicolas Cellier <nicolas.cellier.aka.nice at gmail.com> a écrit :
>       Hi,
> I recently published a Chronology-Core version for using high resolution clock.
> 
> On my 2.7GHz core i5 MBP (2015) I get this:
> 
> Time highResClockTicksPerMillisecond
> 
>  2699824
> => OK, consistent with 2.7GHz
> 
> Time highResClock - Time highResClock * 1000000 // Time highResClockTicksPerMillisecond.
> 
>  -578 -563
> => Huh, invoking a primitive takes that long (500 to 600 nanoseconds)
> 
> 
> 
> (1 to: 200) collect: [:i | Time highResClock - Time highResClock * 1000000 // Time highResClockTicksPerMillisecond].
> 
>  #(-528 -96 -71 -61 -63 -61 -61 -61 -63 -61 -58 -61 -61 -61 -61 -61 -58 -58 -61 -57 -61 -58 -63 -61 -61 -61 -61 -61 -61 -63 -61 -61 -61 -61 -61 -61 -61 -58 -61 -61 -61 -61 -61 -63 -61 -61 -61 -61 -61 -61 -61 -61 -61 -64 -58
> -63 -61 -58 -61 -58 -61 -61 -61 -57 -61 -61 -58 -61 -61 -61 -58 -61 -61 -61 -61 -61 -61 -58 -61 -61 -61 -61 -61 -65 -61 -61 -61 -61 -61 -61 -61 -61 -61 -61 -61 -61 -61 -63 -61 -58 -61 -58 -61 -58 -63 -61 -61 -61 -61 -61 -61
> -63 -61 -58 -61 -57 -61 -58 -61 -58 -61 -61 -61 -61 -61 -61 -61 -61 -61 -61 -61 -61 -61 -57 -61 -61 -61 -61 -61 -61 -61 -61 -61 -61 -61 -61 -61 -58 -61 -61 -61 -61 -61 -61 -61 -61 -61 -61 -63 -61 -63 -61 -58 -61 -57 -61 -61
> -61 -61 -58 -61 -58 -63 -57 -61 -58 -61 -61 -58 -63 -58 -63 -58 -58 -61 -61 -61 -61 -63 -61 -58 -61 -58 -61 -61 -61 -61 -57 -61 -58)
> In fact it's 500ns at first eval (JIT or something), then it's only 60ns.
> 
>
>             But I can correct it. Let's try it:
> 
> [10 factorial] bench.
> 
>  '14,000,000 per second. 71.2 microseconds per run.'
> => this is the reference result
> 
> Huh, I messed up with printing the timing (because I modified benchFor: using µsec clock)
> 
> [10 factorial] bench.
>  '14,500,000 per second. 69 nanoseconds per run.'
> 
> [] bench.
>  '191,000,000 per second. 5.25 nanoseconds per run.'
> 
> So this is including 5ns for evaluating block + incrementing counter + looping...
> 
> tmp := [].
> (1 to: 50) collect: [:i || ticks |
> ticks := Time highResClock.
> tmp value.
> Time highResClock - ticks + (Time highResClock - Time highResClock) * 1000000 // Time highResClockTicksPerMillisecond].
> 
>  #(1020 54 14 11 4 4 4 4 2 5 1 2 4 4 2 4 5 2 4 2 2 5 2 2 2 4 4 4 5 4 4 4 4 2 5 4 4 4 2 0 5 2 5 4 5 5 4 4 4 4)
> => it's about 5ns but here we are in the noise of correction (it varies between 57 and 63ns as we saw above)
> That's the limit os single evaluation with µ-bench
> 
> | tmp |
> tmp := [10 factorial].
> (1 to: 200) collect: [:i || ticks |
> ticks := Time highResClock.
> tmp value.
> Time highResClock - ticks + (Time highResClock - Time highResClock) * 1000000 // Time highResClockTicksPerMillisecond].
> 
> | tmp |
> tmp := [10 factorial].
> (1 to: 100) collect: [:i || ticks |
> ticks := Time highResClock.
> tmp value.
> Time highResClock - ticks + (Time highResClock - Time highResClock) * 1000000 // Time highResClockTicksPerMillisecond].
> 
>  #(3227 267 212 161 130 125 130 127 127 126 127 126 125 133 124 130 125 128 125 130 127 125 126 127 128 127 130 127 130 127 130 124 125 124 125 126 124 126 124 131 127 132 125 130 124 130 124 130 126 124 128 126 130 126 130
> 127 131 126 130 125 124 125 124 127 125 127 125 130 127 133 124 130 125 128 127 125 122 130 124 127 126 124 128 124 130 123 130 124 130 126 130 124 127 128 127 128 124 130 125 130)
>  #(1663 208 97 94 96 118 97 94 94 94 96 94 93 93 94 93 94 93 93 94 94 93 93 96 93 94 93 94 91 94 93 93 94 93 94 94 93 93 96 94 93 93 94 91 93 93 94 93 93 94 94 93 93 94 93 94 91 93 91 77 94 93 94 94 93 93 96 94 93 93 94 91 93
> 93 94 93 93 94 94 93 93 94 93 94 91 93 91 77 77 93 93 77 80 94 80 94 93 93 94 91)
> 
> two regimes again... and not the 70ns of bench this time...
> 
> 
> (1 to: 10) collect: [:i |
>     | ticks |
>     ticks := Time highResClock.
>     10 factorial.
>     Time highResClock - ticks +
> 
> sorry for the extra +, editing code in mail and not doing it is a recipe for not validating!
>           + (Time highResClock - Time highResClock) "correction"
>     * 1000000 // Time highResClockTicksPerMillisecond "get nanoseconds"].
> 
>  #(1309 247 88 84 74 69 71 71 71 69)
> => OK, first runs are a bit long, but we get 70ns per run as the reference
> 
>  #(1977 191 143 148 142 120 122 122 117 120)
> => Oups??? Second run gives different results???
> 
>  #(2239 180 143 143 142 116 117 117 117 114)
> => and third about the same than second...
> Any idea how to explain that?
> 
> 
> If I run 200 times, I sometimes get the two regimes separated by hickups
> 
>  #(1899 107 142 150 123 124 115 122 117 122 118 120 120 120 120 120 121 120 118 120 120 120 120 120 121 120 118 120 120 120 120 120 121 120 118 120 120 120 120 120 121 120 118 120 120 123 117 122 117 122 116 122 117 123
> 117 122 117 122 7950 128 117 122 120 118 9963 87 69 71 69 71 69 69 71 71 68 69 72 71 68 69 72 71 68 69 72 71 68 69 69 71 69 71 69 71 69 71 69 71 69 71 69 71 69 71 69 71 69 71 69 71 69 71 69 71 69 71 69 71 69 71 69 71 69
> 71 69 71 69 71 69 71 69 71 69 71 69 71 69 71 69 71 69 71 69 71 69 71 69 71 69 71 69 71 72 71 68 69 72 71 68 69 69 71 69 71 69 71 68 69 69 71 69 71 69 71 69 71 69 71 69 71 69 71 69 71 69 71 69 71 69 71 69 71 69 71 69 71
> 69 71 69 71)
> 
> 
>


More information about the Squeak-dev mailing list