Numbers...
Alan Grimes
alangrimes at starpower.net
Fri Apr 9 02:22:23 UTC 2004
BINARY SIZE:
OLD:
text data bss dec hex filename
831303 53088 220452 1104843 10dbcb squeak
NEW:
text data bss dec hex filename
837075 49216 220452 1106743 10e337 squeak
With compiler flag: -march=athlon-xp:
817299 53088 220452 1090839 10a517 squeak
Comment: I think I got it to inline some more (or simply different)
stuff hence the added size...
Comment2: All refferances to "OLD" (expressed or implied) below this
point refer to my workhorse VM which was compiled with the extra
compiler flag mentioned (and possibly a few others...)
Part of the way through this I was saying to myself "my code couldn't be
THAT slow!!" so I went back and made a new build with extra optomization
flags.
OLD
Benchmark #1: 7454ms
New:
Benchmark #1: 10054ms
New with flags:
Benchmark #1: 7680ms ------
------
Old:
Benchmark #2: 91651ms
New:
Benchmark #2: 106750ms
With flags:
Benchmark #2: 94106ms --------
----- Two weeks work and I loose 11%...
Benchmark #3: 20900ms
Benchmark #3: 24360ms
Benchmark #3: 21474ms --------
---- I thought I had really messed it up but it turns out the
compiler settings are much bigger factor...
Benchmark #4: 9710ms
Benchmark #4: 12268ms
Benchmark #4: 10077ms -------
----- =(((((
Benchmark 5 was invalid for all VMs.
----
Benchmark #6: 8371ms
Benchmark #6: 9825ms
Benchmark #6: 8834ms +++++
------ Well, at least I did better on SOMETHING! =P
Benchmark #7: 4140ms
Benchmark #7: 4361ms
Benchmark #7: 4301ms ------
----
Total time = 146366 milliseconds.
Total time = 171979 milliseconds. <<< Got an Athlon? Use the right
compiler flags!
Total time = 150773 milliseconds. -------
----
Some of these tests gave a response with a negative number on one or
both of the tests...
This is from my workhorse VM, my new one had the same values but they
were positive times.
Testing: [add 20000 + 20000, 10 times]
1000 repetition(s) in
-0.0001 seconds
-0.1 microseconds per repetition
VM with compiler flags: 0.0, ( below scale)
--------------------------------------------------
Testing: [divide 3 by 4, 10 times]
1000 repetition(s) in
0.0003 seconds
0.3 microseconds per repetition
New:
0.0015 seconds
1.5 microseconds per repetition
With flags:
0.0008 seconds
0.8 microseconds per repetition
-------------------------------------------------
Testing: [test 3 < 4, 10 times]
10000 repetition(s) in
0.025 seconds
2.5 microseconds per repetition
#
0.105 seconds
10.5 microseconds per repetition
#
0.02 seconds
2.0 microseconds per repetition +++++++
-----------------------------------------------------
We can see that without optimization flags, this thing is SLOW, so I'm
going to skip reporting on the version without compiler flags...
Everything after the # is post-optimization. Most of these numbers are
within the margin of error...
Most of it is just evidence that I can't program, skim down the right
hand column for points of interest...
Testing: [add 3 + 4, 10 times]
10000 repetition(s) in
0.033 seconds
3.3 microseconds per repetition
#
0.032 seconds
3.2 microseconds per repetition
-----------------------------------
Testing: [multiply 3 * 4, 10 times]
10000 repetition(s) in
0.065 seconds
6.5 microseconds per repetition
#
0.071 seconds
7.1 microseconds per repetition
__________________________
Testing: [activate and return, 32K times]
1 repetition(s) in
0.0062 seconds
#
0.0068 seconds
-----------------------------------------------
Testing: [find all calls on #printStringRadix:]
1 repetition(s) in
0.2663 seconds
#
0.276 seconds
_________________________________________
Testing: [find all implementors of #next]
1 repetition(s) in
0.0298 seconds
#
0.0344 seconds
--------------------------------------------------------------
Testing: [send #at: 20 times (to an array)]
1000 repetition(s) in
0.0003 seconds
0.3 microseconds per repetition
#
0.0009 seconds
0.9 microseconds per repetition
---------------------------------------------------------
Testing: [send #at:put: 20 times (to an array)]
1000 repetition(s) in
0.0013 seconds
1.3 microseconds per repetition
#
-0.0001 seconds
-0.1 microseconds per repetition
[ probably invalid...]
--------------------------------------------------
Testing: [convert 1 to floating point, 20 times]
100 repetition(s) in
0.0005 seconds
5.0 microseconds per repetition
#
0.0017 seconds
17.0 microseconds per repetition
---------------------------------------------------
Testing: [send #basicAt: 20 times (to a set)]
1000 repetition(s) in
0.0018 seconds
1.8 microseconds per repetition
#
0.0025 seconds
2.5 microseconds per repetition
---------------------------------------------------
Testing: [send #basicAtPut: 20 times (to a set)]
1000 repetition(s) in
0.0026 seconds
2.6 microseconds per repetition
#
0.0045 seconds
4.5 microseconds per repetition
-----------------------------------
Testing: [call bitBLT 10 times]
10 repetition(s) in
0.0099 seconds
990.0 microseconds per repetition
#
0.0125 seconds
1250.0 microseconds per repetition
---------------------------------------------------------
Testing: [send #class 20 times (to a point)]
1000 repetition(s) in
0.0005 seconds
0.5 microseconds per repetition
#
-0.0002 seconds
-0.2 microseconds per repetition
[invalid, probably]
-----------------------------------------------
Testing: [read and write class organization]
1 repetition(s) in
0.0073 seconds
#
0.0077 seconds
-----------------------------------------------------
Testing: [compile dummy method]
5 repetition(s) in
0.0206 seconds
4120.0 microseconds per repetition
#
0.0222 seconds
4440.0 microseconds per repetition
------------------------
Testing: [create 20 uninitialized points]
1000 repetition(s) in
0.0054 seconds
5.4 microseconds per repetition
#
0.0058 seconds
5.8 microseconds per repetition
-------------------------------------------
Testing: [decompile class InputSensor]
1 repetition(s) in
0.0941 seconds
#
0.0965 seconds
------------------------------------------------
Testing: [send ==, 20 times]
10000 repetition(s) in
0.03 seconds
3.0 microseconds per repetition
#
0.035 seconds
3.5 microseconds per repetition
-----------------------------------------------
Testing: [add 3.1 plus 4.1, 20 times]
100 repetition(s) in
-0.0004 seconds
-4.0 microseconds per repetition
#
0.035 seconds
3.5 microseconds per repetition
---------------------------------------
Testing: [create an inspector view]
10 repetition(s) in
-0.001 seconds
-100.0 microseconds per repetition
#
-0.001 seconds
-100.0 microseconds per repetition
----------------------------------
Testing: [text keyboard response using lookahead buffer]
3 repetition(s) in
0.0139 seconds
4633.33333333333 microseconds per repetition
#
0.0148 seconds
4933.33333333333 microseconds per repetition
---------------------------------------------
Testing: [text keyboard response for single keystroke]
3 repetition(s) in
0.0448 seconds
14933.33333333333 microseconds per repetition
#
0.0459 seconds
15300.0 microseconds per repetition
----------------------------------------------------------------
Testing: [add 80000 + 80000, 10 times]
100 repetition(s) in
0.0001 seconds
1.0 microseconds per repetition
#
0.0001 seconds
1.0 microseconds per repetition
--------------------------------------------------------------
Testing: [load an instance variable, 20 times]
10000 repetition(s) in
0.053 seconds
5.3 microseconds per repetition
#
0.04 seconds
<<<<<<<<<<<< IMPROVMENT!!! =P
4.0 microseconds per repetition
---------------------------------------------------------------
Testing: [load literal indirect (overflow refct), 20 times]
10000 repetition(s) in
0.046 seconds
4.6 microseconds per repetition
#
0.053 seconds
5.3 microseconds per repetition
-----------------------------------------------------------------------------
Testing: [load nonRefcounted literal, 20 times]
10000 repetition(s) in
0.043 seconds
4.3 microseconds per repetition
#
0.052 seconds
5.2 microseconds per repetition
-----------------------------------------------
Testing: [load 1, 40 times; send ==, 20 times]
10000 repetition(s) in
0.088 seconds
8.8 microseconds per repetition
#
0.097 seconds
9.7 microseconds per repetition
-----------------------------------------------------------
Testing: [load 1 as a temp, 20 times]
10000 repetition(s) in
0.055 seconds
5.5 microseconds per repetition
#
0.054 seconds <<< tiny
improvment. =\
5.4 microseconds per repetition
------------------------------------------
Testing: [load 0 at 0, 20 times]
10000 repetition(s) in
0.045 seconds
4.5 microseconds per repetition
#
0.055 seconds
5.5 microseconds per repetition
----------------------------------------------------
Testing: [load thisContext, 20 times]
10000 repetition(s) in
0.048 seconds
4.8 microseconds per repetition
#
0.052 seconds
5.2 microseconds per repetition
--------------------------------------------------------
Testing: [3 perform: #+ with: 4, 20 times]
1000 repetition(s) in
0.0044 seconds
4.4 microseconds per repetition
#
0.0053 seconds
5.3 microseconds per repetition
-------------------------------------------------
Testing: [create 3 at 4, 10 times]
1000 repetition(s) in
0.0017 seconds
1.7 microseconds per repetition
#
0.0007 seconds <<<<<<<< seems faster...
0.7 microseconds per repetition
------------------------------------------------------------
Testing: [execute aPoint x, 10 times]
10000 repetition(s) in
0.036 seconds
3.6 microseconds per repetition
#
0.035 seconds <<<< hmm, I must have done
something right to some point related prim...
3.5 microseconds per repetition
---------------------------------------------
Testing: [store into an instance variable, 20 times]
10000 repetition(s) in
0.021 seconds
2.1 microseconds per repetition
#
0.022 seconds
2.2 microseconds per repetition
------------------------------------------------------------------
Testing: [store into a temp, 20 times]
10000 repetition(s) in
0.02 seconds
2.0 microseconds per repetition
#
0.013 seconds <<<< YEEE
HAW!!!!
1.3 microseconds per repetition
-----------------------------------------------------------
Testing: [print a class definition]
20 repetition(s) in
0.0084 seconds
420.0 microseconds per repetition
#
0.008 seconds <<<<<< it's a
small percentage but I celebrate when I
can...
400.0 microseconds per repetition
------------------------------------------------------------
Testing: [print a class hierarchy]
10 repetition(s) in
0.0053 seconds
530.0 microseconds per repetition
#
0.0047 seconds <<<<< This one's
good too though not very useful..
470.0 microseconds per repetition
--------------------------------------------------------------
Testing: [short branch on false, 10 times]
10000 repetition(s) in
0.024 seconds
2.4 microseconds per repetition
#
0.025 seconds
2.5 microseconds per repetition
--------------------------------------------------
Testing: [send #size 20 times (to a string)]
1000 repetition(s) in
0.002 seconds
2.0 microseconds per repetition
#
0.0029 seconds
2.9 microseconds per repetition
--------------------------------------------------
Testing: [execute ReadStream next, 20 times]
1000 repetition(s) in
0.0026 seconds
2.6 microseconds per repetition
#
0.0025 seconds
2.5 microseconds per repetition
---------------------------------------------------
Testing: [execute ReadWriteStream nextPut:, 20 times]
1000 repetition(s) in
0.0032 seconds
3.2 microseconds per repetition
#
0.0052 seconds
5.2 microseconds per repetition
-----------------------------------------------------------
Testing: [send #at: 20 times (to a string)]
1000 repetition(s) in
0.0007 seconds
0.7 microseconds per repetition
#
0.0008 seconds
0.8 microseconds per repetition
--------------------------------------------------------------
Testing: [send #at:put: 20 times (to a string)]
1000 repetition(s) in
0.0 seconds
0.0 microseconds per repetition <<<< test below
scale!!!!
#
0.0004 seconds
0.4 microseconds per repetition
-------------------------------------------------------------
Testing: [replace characters in a string]
100 repetition(s) in
0.0002 seconds
2.0 microseconds per repetition
#
0.0002 seconds
2.0 microseconds per repetition
--------------------------------------------------------
Testing: [display text]
10 repetition(s) in
0.0491 seconds
4910.0 microseconds per repetition
#
0.0545 seconds
5450.0 microseconds per repetition
--------------------------------------------------------
Testing: [text replacement and redisplay]
20 repetition(s) in
0.0715 seconds
3575.0 microseconds per repetition
#
0.0724 seconds
3620.0 microseconds per repetition
-----------------------------------------------------
Testing: [format a bunch of text]
5 repetition(s) in
0.0105 seconds
2100.0 microseconds per repetition
#
0.0105 seconds
2100.0 microseconds per repetition
--------------------------------------------------------
Testing: [scan characters (primitive text display)]
100 repetition(s) in
0.0001 seconds
1.0 microseconds per repetition
#
0.0001 seconds
1.0 microseconds per repetition
------------------------------------------------
Testing: [evaluate the block: (3+4), 20 times]
1000 repetition(s) in
0.0018 seconds
1.8 microseconds per repetition
#
0.0018 seconds
1.8 microseconds per repetition
--------------------------------------------------
Testing: [simple whileLoop, 10000 times through]
10 repetition(s) in
0.007 seconds
700.0 microseconds per repetition
#
0.0078 seconds
780.0 microseconds per repetition
-------------------------------------------------
Testing: [test 3 < 4, 10 times]
10000 repetition(s) in
0.025 seconds
2.5 microseconds per repetition
###########################################################
1000s time 1000s of
itera- sec- iterations slop-
tions onds per sec stones explanation
3808 0.061 62426 9.45851962245405 add integers
#3808 0.059 64542 9.77914740626605 add integers <<<<
Improvment, in theory at least...
544 0.121 4496 18.732782369146 add floats
#544 0.134 4060 16.91542288557214 add floats
960 0.036 26667 30.23431594860167 access strings
#960 0.036 26667 30.23431594860167 access strings
320 0.076 4211 11.96172248803828 create objects
#320 0.087 3678 10.44932079414838 create objects <<
this is probably my bottleneck..
160 0.063 2540 23.73535083815458 copy objects
#160 0.069 2319 21.6714072870107 copy objects
480 0.101 4752 11.18229470005824 perform selectors
#480 0.1 4800 11.29411764705882 perform selectors
896 0.651 1376 1.901027743123626 evaluate blocks
#480 0.1 4800 11.29411764705882 perform selectors
640 0.0974821024247279 6561 11.82510239594829
geometric mean
#640 0.1021227446268415 6263 11.28774835769294
-------------------------------------------------------------------------------------------
time in smop-
seconds stones explanation
0.343 9.20408163265306 generating fractonaccis
#0.38 8.3078947368421 generating fractonaccis
0.037 30.35135135135135 generating primes*
#0.038 29.55263157894737 generating primes*
0.067 16.28358208955224 generating and parsing streams
#0.069 15.81159420289855 generating and parsing streams
0.17 18.18235294117647 generating strings
#0.171 18.07602339181286 generating strings
0.055 21.21818181818182 forming sets
#0.061 19.1311475409836 forming sets
0.176 29.19886363636364 sorting strings
#0.184 27.92934782608696 sorting strings
0.249 22.49397590361446 sorcerer's apprentice*
#0.259 21.62548262548263 sorcerer's apprentice*
0.1195204002781901 19.7028209029835 geometric mean*
0.1256798599458232 18.73720293728208 geometric mean* << with
flags version
0.1602393952414957 14.6960679512366 geometric mean* << no
flags version.
#################My own code (different image)#######################
- 138 tallies, 1996 msec.
# - 202 tallies, 2741 msec.
- 203 tallies, 2898 msec. << no flags version.
**Leaves**
86.2% {1721ms} Color>>rawLuminance
##
48.0% {1316ms} Color>>rawLuminance <<< it did well on this FP
intensive code.
8.0% {160ms} BitBlt>>pixelAt:
2.9% {58ms} Color class>>r:g:b:range:
## 4.0% {110ms} Color class>>r:g:b:range:
###################### seperate segment of my code:##################
- 369 tallies, 4601 msec.
# - 444 tallies, 5477 msec.
- 441 tallies, 5263 msec. <<< hmm, the noflags version did better
here... go figure.
**Tree**
100.0% {4601ms} ImageProcessor>>iterationsDo:
49.3% {2268ms} ImageProcessor>>centerSurroundScaled
|26.3% {1210ms} primitives
|23.0% {1058ms} ImageProcessor>>magnifyAndPrint:toX:Y:
43.9% {2020ms} ImageProcessor>>centerSurround
6.8% {313ms} ImageProcessor>>collapseArray
**Leaves**
43.9% {2020ms} ImageProcessor>>centerSurround
# 45.5% {2492ms} ImageProcessor>>centerSurround <<< =(((
26.3% {1210ms} ImageProcessor>>centerSurroundScaled
#24.1% {1320ms} ImageProcessor>>centerSurroundScaled
23.0% {1058ms} ImageProcessor>>magnifyAndPrint:toX:Y:
#22.1% {1210ms} ImageProcessor>>magnifyAndPrint:toX:Y:
6.8% {313ms} ImageProcessor>>collapseArray
#8.3% {455ms} ImageProcessor>>collapseArray
**GCs**
incr 2188 totalling 1,206ms (26.0% uptime), avg 1.0ms
# incr 2189 totalling 1,477ms (27.0% uptime), avg 1.0ms
More information about the Squeak-dev
mailing list
|