On 20 February 2013 18:29, Eliot Miranda <eliot.miranda@gmail.com> wrote:
On Tue, Feb 19, 2013 at 11:10 PM, Camillo Bruni <camillobruni@gmail.com> wrote:
On 2013-02-20, at 01:25, Eliot Miranda <eliot.miranda@gmail.com> wrote:
On Tue, Feb 19, 2013 at 2:16 PM, Camillo Bruni <camillobruni@gmail.com> wrote:
The most annoying piece is Time Machine and its disk access; I sometimes forget to suspend it, but it was off during the tinyBenchmarks run.
One simple approach is to run the benchmark three times and to discard the best and the worst results.
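Something like this sketch (in Squeak/Pharo, assuming Time class>>millisecondsToRun: as the harness and 0 tinyBenchmarks as the workload; with three runs, dropping the best and the worst just keeps the median):

    | times |
    times := (1 to: 3) collect: [ :i |
        Time millisecondsToRun: [ 0 tinyBenchmarks ] ].
    "drop the best and the worst of the three runs, i.e. keep the median"
    times asSortedCollection asArray at: 2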
That is about as good as just taking the first one... if you want decent results, measure >30 times and do the only scientifically correct thing: report the average plus the standard deviation.
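Roughly like this (a quick sketch along the same lines, again assuming Time class>>millisecondsToRun: around 0 tinyBenchmarks):

    | runs times mean stdDev |
    runs := 30.
    times := (1 to: runs) collect: [ :i |
        Time millisecondsToRun: [ 0 tinyBenchmarks ] ].
    "sample mean and sample standard deviation of the wall-clock times"
    mean := (times inject: 0 into: [ :sum :t | sum + t ]) / runs.
    stdDev := ((times inject: 0 into: [ :sum :t | sum + (t - mean) squared ])
        / (runs - 1)) sqrt.
    Transcript show: 'mean: ', mean asFloat printString, ' ms, stddev: ',
        stdDev printString, ' ms'; cr.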
If the benchmark takes very little time to run and you're trying to avoid background effects then your approach won't necessarily work either.
True, but the standard deviation will most probably give you exactly that feedback: if you increase the number of runs but the quality of the result doesn't improve, you know you're dealing with some systematic source of error.
This approach is simply more scientific and less home-brewed.
Of course, no argument here. But what's being discussed is using tinyBenchmarks as a quick smoke test. A proper CI system can be set up for reliable results, but IMO, for a quick smoke test, doing three runs manually is fine. IME, what tends to happen is that the first run is slow (caches warming up, etc.) and the other two runs are extremely close.
But not in the case where you have an order (or orders) of magnitude speed degradation. That is too significant to be considered measurement error or deviation; there must be something wrong with the VM (is the cache always failing?).
--
best, Eliot