Hi,
We have a proposal for a tool that we think might be useful to have in trunk.
We spent some time pulling together benchmarks from various sources (papers, the mailing list, projects on SqueakSource, ...) and combining them with an extended version of Stefan Marr's benchmarking framework, SMark. The tool and framework are modeled after SUnit and include different execution suites, plus code to estimate confidence across multiple runs. It also draws graphs over multiple runs, so you can look at things like warmup and GC behavior and see how much time is spent doing incremental GCs and full GCs versus plain execution. As part of this, I fixed the EPS export so these graphs can be exported in a scalable format.
Here is a picture of the tool: https://dl.dropboxusercontent.com/u/26242153/screenshot.jpg
As I said, it's modeled after TestRunner and SUnit: benchmarks subclass the "Benchmark" class, any method starting with "bench" is a benchmark, and you can have setUp and tearDown methods as usual. By default, benchmarks are run under an Autosize runner that re-executes each benchmark until the combined runtime reaches 600ms (to smooth out any noise). Beyond that, you can specify a number of iterations for which the runner will repeat that process, giving you multiple averaged runs. The graph shows the execution times split between running code (gray), incremental GCs (yellow), and full GCs (red). There are popups, and you can scroll to zoom in and out. There is also a history of benchmark runs stored on the class side of benchmark classes for later reference.
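For anyone unfamiliar with the SUnit convention this follows: the runner discovers benchmark methods by their name prefix, the way TestRunner discovers test* methods, and wraps each run in setUp/tearDown. Here is a rough Python sketch of that discovery scheme, for illustration only — this is not the BenchmarkRunner code, and the FibBenchmark class and all method names other than the "bench" prefix, setUp, and tearDown are made up:

```python
import time

class Benchmark:
    """Base class: any method whose name starts with 'bench' is a benchmark."""

    def setUp(self):      # runs before each benchmark method, as in SUnit
        pass

    def tearDown(self):   # runs after each benchmark method
        pass

    @classmethod
    def discover(cls):
        """Collect all 'bench*' selectors, the way the runner would."""
        return sorted(name for name in dir(cls)
                      if name.startswith('bench') and callable(getattr(cls, name)))

    def run_one(self, selector):
        """Time a single execution of one benchmark method."""
        self.setUp()
        try:
            start = time.perf_counter()
            getattr(self, selector)()
            return time.perf_counter() - start
        finally:
            self.tearDown()

class FibBenchmark(Benchmark):
    def benchFib(self):
        def fib(n):
            return n if n < 2 else fib(n - 1) + fib(n - 2)
        fib(18)

print(FibBenchmark.discover())  # ['benchFib']
```

In the Smalltalk original this naming-convention lookup would of course be done over the class's selectors rather than via reflection on a Python class.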
The code currently lives here: http://www.hpi.uni-potsdam.de/hirschfeld/squeaksource/BenchmarkRunner
Considering we discuss benchmark results here every so often, I think it might be useful to share an execution framework for them.
cheers, Tim
Hi Tim,
On Apr 27, 2016, at 12:37 PM, Tim Felgentreff timfelgentreff@gmail.com wrote:
By default the benchmarks are run under an Autosize runner that re-executes each benchmark until the combined runtime reaches 600ms (to smooth out any noise). [...]
IMO 600ms is about 500 times too short ;-). Is this parameterised?
Considering we are every so often discussing benchmark results here, I think it might be useful to share an execution framework for those.
That would be fabulous. Hence:
- can it be controlled from the command line?
- is it portable to Pharo?
Hi Eliot,
On 28 April 2016 at 00:44, Eliot Miranda eliot.miranda@gmail.com wrote:
IMO 600ms is about 500 times too short ;-). Is this parameterised?
This is parameterised, but let me elaborate on why I think this is ok. There are three levels to each benchmark:
1. Problem size. This depends on the benchmark; e.g., a linear progression in BinaryTrees will give you exponentially growing runtime. This should be chosen by the benchmark implementer to be reasonably small while still providing meaningful results. For example, the Fibonacci benchmark shouldn't set its problem size to only 4 or 5, because on RSqueak, for instance, the JIT could simply unroll this completely and generate machine code that just checks that the relevant method dictionaries haven't changed and returns the constant. So the problem size should be large enough to prevent that.
2. Autosize iterations. These are chosen dynamically per machine: a benchmark with a fixed problem size is executed repeatedly to average out noise from, e.g., OS-level scheduling. I think 500-1000ms is usually fine for this, because OS-level interruptions are then distributed fairly evenly. The autosizer simply finds a small number of iterations to run the inner benchmark and then averages the runs to get rid of the small noise.
3. Benchmark iterations. This is what you choose when you use the tool, to actually do enough runs to warm up the JIT and such. With an autosize time of about 600ms, I usually choose 100 benchmark iterations, so the overall benchmarking time will be about 60 seconds. In the tool, we measure times and GC stats between these iterations to get the bar chart.
All three of these levels are configurable, but the UI just asks you for the third.
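To make the three levels concrete, here is a minimal Python sketch of the scheme as described above. This is not the actual SMark/BenchmarkRunner code — only the 600ms default threshold, the inner averaging, and the outer iteration count are taken from this thread; the function names and the doubling strategy are assumptions for illustration:

```python
import time

def autosize(bench, threshold=0.6):
    """Level 2: find an inner iteration count whose combined runtime
    reaches the threshold (600 ms by default), doubling until it does."""
    inner = 1
    while True:
        start = time.perf_counter()
        for _ in range(inner):
            bench()
        elapsed = time.perf_counter() - start
        if elapsed >= threshold:
            return inner
        inner *= 2

def run_benchmark(bench, outer=100, threshold=0.6):
    """Level 3: repeat the autosized inner loop 'outer' times; each
    sample is already an average over 'inner' executions, so OS-level
    noise is smoothed out."""
    inner = autosize(bench, threshold)
    samples = []
    for _ in range(outer):
        start = time.perf_counter()
        for _ in range(inner):
            bench()
        samples.append((time.perf_counter() - start) / inner)
    return samples

# Level 1, the problem size, is fixed inside the benchmark itself:
def bench_sum():
    sum(range(10_000))

# small threshold and few outer iterations just to keep the demo quick
samples = run_benchmark(bench_sum, outer=5, threshold=0.05)
print(len(samples))  # 5
```

With the defaults (600ms autosize, 100 outer iterations) this reproduces the roughly 60-second total benchmarking time mentioned above.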
- can it be controlled from the command line?
Yes, provided you mean "use a .st file argument". To run a benchmark you can write, e.g.:

BenchmarkAutosizeSuite run: {'BenchmarkSimpleStatisticsReporter'. 'SMarkShootout'. 100}.
"runs all shootout benchmarks for 100 outer iterations, reporting statistics in the autosize suite"

BenchmarkCogSuite run: {'BenchmarkSimpleStatisticsReporter'. 'SMarkShootout.benchBinaryTrees'. 100}.
"runs the binarytrees benchmarks for 100 outer iterations without autosizing, but with one extra iteration for warmup"

...
Output is printed to stdout.
- is it portable to Pharo?
I don't see why it shouldn't work immediately, unless they don't have the ToolBuilder anymore. It might be that they removed the PostScript canvas; in that case you cannot export the graphs as easily.
cheers, Tim
Hi Tim:
On 28 Apr 2016, at 13:01, Tim Felgentreff timfelgentreff@gmail.com wrote:
- can it be controlled from the command line?
Yes, provided you mean "use a .st file argument". [...]
I haven't looked at your changes to the code yet, but if you didn't remove any SMark features, there is also a proper command-line interface.
See: http://forum.world.st/Convention-to-build-cmd-line-interfaces-with-Pharo-td3...
$ squeak-vm.sh Pharo-1.2.image --help
SMark Benchmark Framework, version: SMark-StefanMarr.12
Usage: <vm+image> SMarkHarness [runner] [reporter] <suiteOrBenchmark> [iterations [processes [problemSize]]]
Arguments:
  runner            optional, a SMarkRunner class that executes the benchmarks
  reporter          optional, a SMarkReporter class that processes and displays the results
  suiteOrBenchmark  required, either a SMarkSuite with benchmarks, or a benchmark denoted by Suite.benchName
  iterations        optional, number of times the benchmarks are repeated
  processes         optional, number of processes/threads used by the benchmarks
  problemSize       optional, depending on the benchmark, for instance the number of inner iterations or the size of the data set used
Best regards Stefan
Hi Tim,
the below is lovely and makes it easy to run from the command line. Please can we keep it? The Mac VM's command-line support is broken (but being fixed), so test on Windows using a console VM and/or on Linux.
One more request: it would be great if the package were loadable and runnable in VW in some form, so that one can at least gather a complete baseline set of results from VW.
_,,,^..^,,,_ (phone)
On Apr 28, 2016, at 5:19 AM, Stefan Marr smalltalk@stefan-marr.de wrote:
I look at your changes to the code, but if you didn’t remove any SMark features, there is also a proper command-line interface.
See: http://forum.world.st/Convention-to-build-cmd-line-interfaces-with-Pharo-td3...
$ squeak-vm.sh Pharo-1.2.image --help
SMark Benchmark Framework, version: SMark-StefanMarr.12
Usage: <vm+image> SMarkHarness [runner] [reporter] <suiteOrBenchmark> [iterations [processes [problemSize]]]
Arguments:
  runner            optional, a SMarkRunner class that executes the benchmarks
  reporter          optional, a SMarkReporter class that processes and displays the results
  suiteOrBenchmark  required, either a SMarkSuite with benchmarks, or a benchmark denoted by Suite.benchName
  iterations        optional, number of times the benchmarks are repeated
  processes         optional, number of processes/threads used by the benchmarks
  problemSize       optional, depending on the benchmark, for instance the number of inner iterations or the size of the data set used
Best regards Stefan
-- Stefan Marr Johannes Kepler Universität Linz http://stefan-marr.de/research/
Hi Stefan,
what does your squeak-vm.sh script do? Because on Squeak, I cannot simply type --help and get output. The mailing list thread you linked refers to something Pharo specific that I don't think we have in Squeak.
Hi Tim:
On 29 Apr 2016, at 12:10, timfelgentreff timfelgentreff@gmail.com wrote:
what does your squeak-vm.sh script do? Because on Squeak, I cannot simply type --help and get output. The mailing list thread you linked refers to something Pharo specific that I don’t think we have in Squeak.
At least in Pharo there was/is a way to register a handler for the startup. SMark used to do that; it then processes the command-line arguments.
I don’t remember the details, sorry, and currently don’t have access to the code to check.
Best regards Stefan
Squeak always processes command-line arguments. If the #readDocumentAtStartup Preference in the image is set (the default), it will treat the first image argument as a URL referring to a Smalltalk script to execute, and the subsequent ones as arguments to that script:
squeak -vm [vmArgs] myImage.image [urlToSmalltalkScript] [scriptArg1 scriptArg2 ...]
There's a convenience method that provides easy access to those arguments and basic error handling for headless running:
"This code goes in a text file referred to by the urlToSmalltalkScript"
Smalltalk run: [ :scriptArg1 :scriptArg2 | "... your script ..." ]
If readDocumentAtStartup is not set, then each image argument is simply passed in as an Array of Strings.
squeak -vm [vmArgs] myImage.image [imageArg1 imageArg2 imageArg3 ...]
Hi Chris and Stefan
yes, the Squeak cmdline arg processing through a file is what I'm using, but Pharo supports different (more 'traditional-looking') mechanisms afaict. If there is code specific to the Pharo way of doing it in SMark, I don't know about it, since I haven't used Pharo for a few years (and RSqueak doesn't work with it, because they removed some of the fallback code for primitives that we don't implement).
I can look at the code, but I would be against a command line interface that doesn't work the same across different Smalltalk distributions. But we can certainly think about how to improve it.
cheers, Tim
Hi Tim:
Ok, I checked what I did. See http://smalltalkhub.com/#!/~StefanMarr/SMark/packages/Scripting
This implements support for command-line scripting by registering a startup handler, which then calls SMarkHarness class>>#run:.
Last time I checked, this was compatible with Squeak and Pharo, because I was using it even with Squeak 3.9 images. All this infrastructure comes out of the RoarVM project, so, mind you, the code dates back a while…
The `ScriptStarter` class should be the code to look at. The #initialize/#install methods on the class side do the relevant setup.
Hope that helps Stefan
On 28/04/16 00:44, Eliot Miranda wrote:
That would be fabulous. Hence
- is it portable to Pharo?
Pharo has a class BenchmarkResult in Kernel-Chronology.
Pharo has no DummyStream.
Pharo has no asOop.
TimeStamp now -> DateAndTime now.
No , for numbers; use asString.
Smalltalk getVMParameters -> Smalltalk vm getParameters.
Where is BenchmarkTestRunnerSuite?
Stephan
On Wed, Apr 27, 2016 at 09:37:29PM +0200, Tim Felgentreff wrote:
Hi,
We have a proposal for a tool that we think might be useful to have in trunk.
This looks very nice! May I suggest that you (or we) create a SqueakMap entry for this, so that it can be easily located and loaded in Squeak 5.0 and trunk images? If it also works in Pharo, someone will probably volunteer to make a ConfigurationOfBenchmark also.
There are lots of advantages to maintaining a package like this as an external package, just as long as the package is easy to find and easy to load. It seems to me that this would be especially important for a benchmarking suite, because we would want to encourage people to use the same suite in Cuis, Pharo, and other images in the Squeak family.
Dave
On Wed, 27 Apr 2016, Tim Felgentreff wrote:
The code currently lives here: http://www.hpi.uni-potsdam.de/hirschfeld/squeaksource/BenchmarkRunner
The link seems to be broken.
Levente
On 28.04.2016, at 22:29, Levente Uzonyi leves@caesar.elte.hu wrote:
The link seems to be broken.
In how far? It works from here :)
On Thu, 28 Apr 2016, Tobias Pape wrote:
The link seems to be broken.
In how far? It works from here :)
HTTPS Everywhere turns it into an https URL, but that's a 404. It works via http though.
Levente
On 29.04.2016, at 00:02, Levente Uzonyi leves@caesar.elte.hu wrote:
HTTPS Everywhere turns it into an https URL, but that's a 404. It works via http though.
Yeah sorry for that. But since we're not HTTPS-ready with Monticello, this probably has to wait a tiny bit more...
Best regards -Tobias