[Vm-dev] Speedcenter setup

Eliot Miranda eliot.miranda at gmail.com
Mon Oct 10 21:33:17 UTC 2016


Hi Tim,

On Mon, Oct 10, 2016 at 12:53 PM, Tim Felgentreff <timfelgentreff at gmail.com>
wrote:

>
> One issue we might want to discuss is the selection and sizing of the
> benchmarks.
>
> All benchmarks are autosized to run for at least 600ms right now, and
> then those 600ms runs are repeated up to 100 times to obtain
> measurements. But 600ms might be too short for some benchmarks if the GC
> only kicks in rarely for them; I don't know.
>

600ms is simply too short.  Back in the day, when nfib was a popular
activation benchmark used to compare different languages (and it's still
useful for this today), the approach was to increase the argument to nfib
until the computation took 30 seconds or more to run.  To get activations
per second one divided the result of nfib by the time taken.
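
For concreteness, a sketch in Squeak Smalltalk (nfib as a method on
Integer; the starting argument and the 30-second threshold are just
illustrative):

    Integer >> nfib
        "Answer the number of method activations needed to compute
         this fibonacci-like function."
        ^ self < 2
            ifTrue: [1]
            ifFalse: [(self - 1) nfib + (self - 2) nfib + 1]

    | n millisecs result |
    n := 20.
    [millisecs := Time millisecondsToRun: [result := n nfib].
     millisecs < 30000] whileTrue: [n := n + 1].
    Transcript show: 'activations/sec: ',
        (result * 1000 // millisecs) printString; cr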


>
> About the selection of benchmarks, the ToolInteraction benchmark went
> up, but that is a very high-level benchmark, and might also be
> influenced heavily by refactorings in Morphic. So maybe the benchmark
> isn't all that useful, or I should stop tracking trunk with the
> benchmark images, and instead stay on the release.
>

The original Smalltalk-80 benchmark suite included senders and
implementors benchmarks whose performance depended not only on the
implementation but on the size of the image.  I modified the VisualWorks
version to scale by the number of methods used.  That scaling is necessary
if one is to measure tool performance, and measuring tool performance is
useful in producing a responsive system.  But one must be careful to
measure something meaningful.
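
For instance, a hypothetical way to normalise a senders-style benchmark by
image size in Squeak (allCallsOn: is the standard senders query; counting
CompiledMethod allInstances is slow, but illustrates the idea):

    | methodCount millisecs |
    methodCount := CompiledMethod allInstances size.
    millisecs := Time millisecondsToRun:
        [SystemNavigation default allCallsOn: #printString].
    Transcript show: 'ms per 1000 methods: ',
        (millisecs * 1000 / methodCount) asFloat printString; cr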

> What do you think?
>

Categorising the benchmarks and describing them well is important.  Making
sure they vary by quality of implementation and not extraneous causes such
as code base size is important.  Starting off all benchmarks from a
consistent state, and not running a set of benchmarks in the same image
(unless one "resets" by throwing away jitted code and GCing), is important.
Repeating each benchmark some number of times (e.g. three) and taking the
average or the median is important.  Running benchmarks on an otherwise
quiet machine is important.
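
Concretely, a minimal sketch of the repeat-and-take-the-median step (with
"bench" standing for whatever zero-argument block is under test; flushing
jitted code would need a VM-specific hook in addition to the GC):

    | times median |
    times := (1 to: 3) collect:
        [:i |
         Smalltalk garbageCollect. "start each run from a consistent heap"
         Time millisecondsToRun: bench].
    median := times asSortedCollection asArray at: 2. "middle of three runs"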

And when we have Sista, running the benchmarks such that the system can
warm up is important.  We will want to see baseline and warm performance
compared.
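
E.g., a hypothetical harness for that comparison (again with "bench" for
the block under test; five warm-up iterations is an arbitrary choice):

    | cold warm |
    cold := Time millisecondsToRun: bench. "baseline: first, unoptimised run"
    5 timesRepeat: [bench value]. "let the optimiser warm up hot methods"
    warm := Time millisecondsToRun: bench. "steady-state, optimised run"
    Transcript show: 'cold: ', cold printString, 'ms, warm: ',
        warm printString, 'ms'; cr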

> On Mon, 10 Oct 2016 at 21:48 Tim Felgentreff <timfelgentreff at gmail.com>
> wrote:
>
>> Indeed, "up" is "slower", but I will think how to add some indication of
>> that the overview page nonetheless. I guess it isn't in there by default,
>> because the speed center website is agnostic to what kinds of benchmarks
>> you use, and writing it on each tiny chart may look messy. But I'll try it
>> out and see how it looks :)
>>
>> cheers,
>> Tim
>>
>>
>>
>> On Mon, 10 Oct 2016 at 20:26 Bert Freudenberg <bert at freudenbergs.de>
>> wrote:
>>
>> On Mon, Oct 10, 2016 at 7:40 PM, Eliot Miranda <eliot.miranda at gmail.com>
>> wrote:
>>
>> On Tue, Jul 26, 2016 at 11:56 PM, Tim Felgentreff <
>> timfelgentreff at gmail.com> wrote:
>>
>>
>>   http://speed.squeak.org/timeline/#/?exe=1,5&ben=grid&env=2&revs=50&equid=on
>>
>>
>> Would it be possible to add some indication to the page of what is
>> faster?  From the graphs I can't tell if higher is faster or slower.
>> There are no labels :-(
>>
>>
>> If you click on one of the tiny charts you get a full chart with labels.
>> In all of the benchmarks, "up" means "more seconds" means "slower".
>>
>> - Bert -
>>
>>
>


-- 
_,,,^..^,,,_
best, Eliot