Numbers...

Alan Grimes alangrimes at starpower.net
Fri Apr 9 02:22:23 UTC 2004


BINARY SIZE:

OLD:
   text    data     bss     dec     hex filename
 831303   53088  220452 1104843  10dbcb squeak

NEW:
   text    data     bss     dec     hex filename
 837075   49216  220452 1106743  10e337 squeak

With compiler flag: -march=athlon-xp:
 817299   53088  220452 1090839  10a517 squeak



Comment: I think I got it to inline some more (or simply different) 
stuff hence the added size...
Comment2: All refferances to "OLD" (expressed or implied) below this 
point refer to my workhorse VM which was compiled with  the extra 
compiler flag mentioned (and possibly a few others...)

Part of the way through this I was saying to myself "my code couldn't be 
THAT slow!!" so I went back and made a new build with extra optomization 
flags.

OLD
Benchmark #1: 7454ms

New:
Benchmark #1: 10054ms

New with flags:
Benchmark #1: 7680ms     ------

------

Old:
Benchmark #2: 91651ms

New:
Benchmark #2: 106750ms
With flags:
Benchmark #2: 94106ms   --------

-----  Two weeks work and I loose 11%...

Benchmark #3: 20900ms
Benchmark #3: 24360ms
Benchmark #3: 21474ms  --------

----     I thought I had really messed it up but it turns out the 
compiler settings are much bigger factor...

Benchmark #4: 9710ms
Benchmark #4: 12268ms
Benchmark #4: 10077ms    -------

-----   =(((((

Benchmark 5 was invalid for all VMs.

----

Benchmark #6: 8371ms
Benchmark #6: 9825ms
Benchmark #6: 8834ms    +++++

------ Well, at least I did better on SOMETHING! =P

Benchmark #7: 4140ms
Benchmark #7: 4361ms
Benchmark #7: 4301ms    ------

----

Total time = 146366 milliseconds.
Total time = 171979 milliseconds.      <<< Got an Athlon? Use the right 
compiler flags!
Total time = 150773 milliseconds.      -------

----

Some of these tests gave a response with a negative number on one or 
both of the tests...
This is from my workhorse VM, my new one had the same values but they 
were positive times.

Testing:  [add 20000 + 20000, 10 times]
    1000 repetition(s) in
    -0.0001 seconds
    -0.1 microseconds per repetition

VM with compiler flags: 0.0, ( below scale) 
--------------------------------------------------
Testing:  [divide 3 by 4, 10 times]
    1000 repetition(s) in
    0.0003 seconds
    0.3 microseconds per repetition

New:
    0.0015 seconds
    1.5 microseconds per repetition
With flags:
    0.0008 seconds
    0.8 microseconds per repetition


-------------------------------------------------
Testing:  [test 3 < 4, 10 times]
    10000 repetition(s) in
    0.025 seconds
    2.5 microseconds per repetition
#
    0.105 seconds
    10.5 microseconds per repetition
#
    0.02 seconds
    2.0 microseconds per repetition   +++++++

-----------------------------------------------------
We can see that without optimization flags, this thing is SLOW, so I'm 
going to skip reporting on the version without compiler flags...  
Everything after the # is post-optimization. Most of these numbers are 
within the margin of error...

Most of it is just evidence that I can't program, skim down the right 
hand column for points of interest...


Testing:  [add 3 + 4, 10 times]
    10000 repetition(s) in
    0.033 seconds
    3.3 microseconds per repetition
#
    0.032 seconds
    3.2 microseconds per repetition
-----------------------------------
Testing:  [multiply 3 * 4, 10 times]
    10000 repetition(s) in
    0.065 seconds
    6.5 microseconds per repetition
#
    0.071 seconds
    7.1 microseconds per repetition
__________________________

Testing:  [activate and return, 32K times]
    1 repetition(s) in
    0.0062 seconds
#
    0.0068 seconds
-----------------------------------------------

Testing:  [find all calls on #printStringRadix:]
    1 repetition(s) in
    0.2663 seconds
#
    0.276 seconds
_________________________________________
Testing:  [find all implementors of #next]
    1 repetition(s) in
    0.0298 seconds
#
    0.0344 seconds
--------------------------------------------------------------

Testing:  [send #at: 20 times (to an array)]
    1000 repetition(s) in
    0.0003 seconds
    0.3 microseconds per repetition
#
    0.0009 seconds
    0.9 microseconds per repetition
---------------------------------------------------------

Testing:  [send #at:put: 20 times (to an array)]
    1000 repetition(s) in
    0.0013 seconds
    1.3 microseconds per repetition
#
    -0.0001 seconds
    -0.1 microseconds per repetition

[ probably invalid...]
--------------------------------------------------

Testing:  [convert 1 to floating point, 20 times]
    100 repetition(s) in
    0.0005 seconds
    5.0 microseconds per repetition
#
    0.0017 seconds
    17.0 microseconds per repetition
---------------------------------------------------
Testing:  [send #basicAt: 20 times (to a set)]
    1000 repetition(s) in
    0.0018 seconds
    1.8 microseconds per repetition
#
    0.0025 seconds
    2.5 microseconds per repetition
---------------------------------------------------
Testing:  [send #basicAtPut: 20 times (to a set)]
    1000 repetition(s) in
    0.0026 seconds
    2.6 microseconds per repetition
# 
    0.0045 seconds
    4.5 microseconds per repetition
-----------------------------------

Testing:  [call bitBLT 10 times]
    10 repetition(s) in
    0.0099 seconds
    990.0 microseconds per repetition
#
    0.0125 seconds
    1250.0 microseconds per repetition
---------------------------------------------------------
Testing:  [send #class 20 times (to a point)]
    1000 repetition(s) in
    0.0005 seconds
    0.5 microseconds per repetition
#
    -0.0002 seconds
    -0.2 microseconds per repetition
[invalid, probably]
-----------------------------------------------

Testing:  [read and write class organization]
    1 repetition(s) in
    0.0073 seconds
#
    0.0077 seconds

-----------------------------------------------------
Testing:  [compile dummy method]
    5 repetition(s) in
    0.0206 seconds
    4120.0 microseconds per repetition
#
    0.0222 seconds
    4440.0 microseconds per repetition

------------------------
Testing:  [create 20 uninitialized points]
    1000 repetition(s) in
    0.0054 seconds
    5.4 microseconds per repetition
#
    0.0058 seconds
    5.8 microseconds per repetition

-------------------------------------------

Testing:  [decompile class InputSensor]
    1 repetition(s) in
    0.0941 seconds
#
    0.0965 seconds
------------------------------------------------
Testing:  [send ==, 20 times]
    10000 repetition(s) in
    0.03 seconds
    3.0 microseconds per repetition
#
    0.035 seconds
    3.5 microseconds per repetition
-----------------------------------------------
Testing:  [add 3.1 plus 4.1, 20 times]
    100 repetition(s) in
    -0.0004 seconds
    -4.0 microseconds per repetition
#
    0.035 seconds
    3.5 microseconds per repetition
---------------------------------------
Testing:  [create an inspector view]
    10 repetition(s) in
    -0.001 seconds
    -100.0 microseconds per repetition
#
    -0.001 seconds
    -100.0 microseconds per repetition
----------------------------------

Testing:  [text keyboard response using lookahead buffer]
    3 repetition(s) in
    0.0139 seconds
    4633.33333333333 microseconds per repetition
#
    0.0148 seconds
    4933.33333333333 microseconds per repetition

---------------------------------------------


Testing:  [text keyboard response for single keystroke]
    3 repetition(s) in
    0.0448 seconds
    14933.33333333333 microseconds per repetition
#
    0.0459 seconds
    15300.0 microseconds per repetition

----------------------------------------------------------------
Testing:  [add 80000 + 80000, 10 times]
    100 repetition(s) in
    0.0001 seconds
    1.0 microseconds per repetition
#
    0.0001 seconds
    1.0 microseconds per repetition

--------------------------------------------------------------
Testing:  [load an instance variable, 20 times]
    10000 repetition(s) in
    0.053 seconds
    5.3 microseconds per repetition
#
    0.04 seconds                                              
<<<<<<<<<<<< IMPROVMENT!!! =P
    4.0 microseconds per repetition                           
---------------------------------------------------------------
Testing:  [load literal indirect (overflow refct), 20 times]
    10000 repetition(s) in
    0.046 seconds
    4.6 microseconds per repetition
#
    0.053 seconds
    5.3 microseconds per repetition
-----------------------------------------------------------------------------

Testing:  [load nonRefcounted literal, 20 times]
    10000 repetition(s) in
    0.043 seconds
    4.3 microseconds per repetition
#
    0.052 seconds
    5.2 microseconds per repetition
-----------------------------------------------
Testing:  [load 1, 40 times; send ==, 20 times]
    10000 repetition(s) in
    0.088 seconds
    8.8 microseconds per repetition
#
    0.097 seconds
    9.7 microseconds per repetition

-----------------------------------------------------------
Testing:  [load 1 as a temp, 20 times]
    10000 repetition(s) in
    0.055 seconds
    5.5 microseconds per repetition
#
    0.054 seconds                                            <<< tiny 
improvment. =\
    5.4 microseconds per repetition        

------------------------------------------

Testing:  [load 0 at 0, 20 times]
    10000 repetition(s) in
    0.045 seconds
    4.5 microseconds per repetition
#

    0.055 seconds
    5.5 microseconds per repetition

----------------------------------------------------

Testing:  [load thisContext, 20 times]
    10000 repetition(s) in
    0.048 seconds
    4.8 microseconds per repetition
#
    0.052 seconds
    5.2 microseconds per repetition
--------------------------------------------------------
Testing:  [3 perform: #+ with: 4, 20 times]
    1000 repetition(s) in
    0.0044 seconds
    4.4 microseconds per repetition
#
    0.0053 seconds
    5.3 microseconds per repetition
-------------------------------------------------
Testing:  [create 3 at 4, 10 times]
    1000 repetition(s) in
    0.0017 seconds
    1.7 microseconds per repetition
#
    0.0007 seconds                                  <<<<<<<< seems faster...
    0.7 microseconds per repetition
------------------------------------------------------------
Testing:  [execute aPoint x, 10 times]
    10000 repetition(s) in
    0.036 seconds
    3.6 microseconds per repetition
#
    0.035 seconds                     <<<< hmm, I must have done 
something right to some point related prim...
    3.5 microseconds per repetition

---------------------------------------------

Testing:  [store into an instance variable, 20 times]
    10000 repetition(s) in
    0.021 seconds
    2.1 microseconds per repetition
#
    0.022 seconds
    2.2 microseconds per repetition
------------------------------------------------------------------

Testing:  [store into a temp, 20 times]
    10000 repetition(s) in
    0.02 seconds
    2.0 microseconds per repetition
# 
    0.013 seconds                                            <<<< YEEE 
HAW!!!!
    1.3 microseconds per repetition
-----------------------------------------------------------
Testing:  [print a class definition]
    20 repetition(s) in
    0.0084 seconds
    420.0 microseconds per repetition
#
    0.008 seconds                                     <<<<<<  it's a 
small percentage but I celebrate when I                                 
                can...
    400.0 microseconds per repetition  
------------------------------------------------------------
Testing:  [print a class hierarchy]
    10 repetition(s) in
    0.0053 seconds
    530.0 microseconds per repetition
#
    0.0047 seconds                                    <<<<< This one's 
good too though not very useful..
    470.0 microseconds per repetition

--------------------------------------------------------------

Testing:  [short branch on false, 10 times]
    10000 repetition(s) in
    0.024 seconds
    2.4 microseconds per repetition
#
    0.025 seconds
    2.5 microseconds per repetition
--------------------------------------------------

Testing:  [send #size 20 times (to a string)]
    1000 repetition(s) in
    0.002 seconds
    2.0 microseconds per repetition
#
    0.0029 seconds
    2.9 microseconds per repetition
--------------------------------------------------
Testing:  [execute ReadStream next, 20 times]
    1000 repetition(s) in
    0.0026 seconds
    2.6 microseconds per repetition
#
    0.0025 seconds
    2.5 microseconds per repetition
---------------------------------------------------
Testing:  [execute ReadWriteStream nextPut:, 20 times]
    1000 repetition(s) in
    0.0032 seconds
    3.2 microseconds per repetition
#
    0.0052 seconds
    5.2 microseconds per repetition
-----------------------------------------------------------
Testing:  [send #at: 20 times (to a string)]
    1000 repetition(s) in
    0.0007 seconds
    0.7 microseconds per repetition
#
    0.0008 seconds
    0.8 microseconds per repetition
--------------------------------------------------------------
Testing:  [send #at:put: 20 times (to a string)]
    1000 repetition(s) in
    0.0 seconds
    0.0 microseconds per repetition                      <<<< test below 
scale!!!!
#
    0.0004 seconds
    0.4 microseconds per repetition
-------------------------------------------------------------
Testing:  [replace characters in a string]
    100 repetition(s) in
    0.0002 seconds
    2.0 microseconds per repetition
#
    0.0002 seconds
    2.0 microseconds per repetition
--------------------------------------------------------
Testing:  [display text]
    10 repetition(s) in
    0.0491 seconds
    4910.0 microseconds per repetition
#
    0.0545 seconds
    5450.0 microseconds per repetition
--------------------------------------------------------
Testing:  [text replacement and redisplay]
    20 repetition(s) in
    0.0715 seconds
    3575.0 microseconds per repetition
#
    0.0724 seconds
    3620.0 microseconds per repetition
-----------------------------------------------------
Testing:  [format a bunch of text]
    5 repetition(s) in
    0.0105 seconds
    2100.0 microseconds per repetition
#
    0.0105 seconds
    2100.0 microseconds per repetition
--------------------------------------------------------
Testing:  [scan characters (primitive text display)]
    100 repetition(s) in
    0.0001 seconds
    1.0 microseconds per repetition
#
    0.0001 seconds
    1.0 microseconds per repetition
------------------------------------------------
Testing:  [evaluate the block: (3+4), 20 times]
    1000 repetition(s) in
    0.0018 seconds
    1.8 microseconds per repetition
#
    0.0018 seconds
    1.8 microseconds per repetition

--------------------------------------------------
Testing:  [simple whileLoop, 10000 times through]
    10 repetition(s) in
    0.007 seconds
    700.0 microseconds per repetition
#
    0.0078 seconds
    780.0 microseconds per repetition
-------------------------------------------------
Testing:  [test 3 < 4, 10 times]
    10000 repetition(s) in
    0.025 seconds
    2.5 microseconds per repetition

###########################################################

1000s    time    1000s of
itera-   sec-    iterations   slop-
tions    onds    per sec      stones   explanation

  3808     0.061     62426     9.45851962245405     add integers
#3808     0.059     64542     9.77914740626605     add integers   <<<< 
Improvment, in theory at least...

  544     0.121     4496     18.732782369146     add floats
#544     0.134     4060     16.91542288557214     add floats
  960     0.036     26667     30.23431594860167     access strings
#960     0.036     26667     30.23431594860167     access strings
  320     0.076     4211     11.96172248803828     create objects
#320     0.087     3678     10.44932079414838     create objects  << 
this is probably my bottleneck..
  160     0.063     2540     23.73535083815458     copy objects
#160     0.069     2319     21.6714072870107     copy objects
  480     0.101     4752     11.18229470005824     perform selectors
#480     0.1         4800     11.29411764705882     perform selectors
  896     0.651     1376     1.901027743123626     evaluate blocks
#480     0.1     4800     11.29411764705882         perform selectors

  640     0.0974821024247279     6561     11.82510239594829     
geometric mean
#640     0.1021227446268415     6263     11.28774835769294    

-------------------------------------------------------------------------------------------
time in    smop-
seconds    stones    explanation
  0.343     9.20408163265306     generating fractonaccis
#0.38     8.3078947368421     generating fractonaccis
  0.037     30.35135135135135     generating primes*
#0.038     29.55263157894737     generating primes*
  0.067     16.28358208955224     generating and parsing streams
#0.069     15.81159420289855     generating and parsing streams
  0.17     18.18235294117647     generating strings
#0.171     18.07602339181286     generating strings
  0.055     21.21818181818182     forming sets
#0.061     19.1311475409836     forming sets
  0.176     29.19886363636364     sorting strings
#0.184     27.92934782608696     sorting strings
  0.249     22.49397590361446     sorcerer's apprentice*
#0.259     21.62548262548263     sorcerer's apprentice*

0.1195204002781901     19.7028209029835     geometric mean*
0.1256798599458232     18.73720293728208     geometric mean*  << with 
flags version
0.1602393952414957     14.6960679512366     geometric mean*    << no 
flags version.

#################My own code (different image)#######################
 - 138 tallies, 1996 msec.
# - 202 tallies, 2741 msec.
 - 203 tallies, 2898 msec.   << no flags version.

**Leaves**
    86.2% {1721ms} Color>>rawLuminance
##
     48.0% {1316ms} Color>>rawLuminance   <<< it did well on this FP 
intensive code.
8.0% {160ms} BitBlt>>pixelAt:
2.9% {58ms} Color class>>r:g:b:range:
## 4.0% {110ms} Color class>>r:g:b:range:  


###################### seperate segment of my code:##################
 - 369 tallies, 4601 msec.
# - 444 tallies, 5477 msec.
   - 441 tallies, 5263 msec.  <<< hmm, the noflags version did better 
here... go figure.

**Tree**
100.0% {4601ms} ImageProcessor>>iterationsDo:
  49.3% {2268ms} ImageProcessor>>centerSurroundScaled
    |26.3% {1210ms} primitives
    |23.0% {1058ms} ImageProcessor>>magnifyAndPrint:toX:Y:
  43.9% {2020ms} ImageProcessor>>centerSurround
  6.8% {313ms} ImageProcessor>>collapseArray

**Leaves**
   43.9% {2020ms} ImageProcessor>>centerSurround
# 45.5% {2492ms} ImageProcessor>>centerSurround    <<< =(((
  26.3% {1210ms} ImageProcessor>>centerSurroundScaled
#24.1% {1320ms} ImageProcessor>>centerSurroundScaled
  23.0% {1058ms} ImageProcessor>>magnifyAndPrint:toX:Y:
#22.1% {1210ms} ImageProcessor>>magnifyAndPrint:toX:Y:
  6.8% {313ms} ImageProcessor>>collapseArray
#8.3% {455ms} ImageProcessor>>collapseArray


**GCs**
    incr        2188 totalling 1,206ms (26.0% uptime), avg 1.0ms
#  incr        2189 totalling 1,477ms (27.0% uptime), avg 1.0ms



More information about the Squeak-dev mailing list