Floating point performance
David Faught
dave.faught at gmail.com
Thu Dec 14 00:33:19 UTC 2006
John M McIntosh wrote:
>Could you share your MessageTally? If you are using FloatArray logic
>then most of the math is done in the plugin. However
>the plugin does not take advantage of any vector processing hardware
>you might have so there is room for improvement.
The MessageTally output is below. Maybe "almost 80% of the time was
spent in basic floating point array operations" is a little
exaggerated, but not a lot. What vector processing hardware? The
only thing I know of would be trying to use the video card GPU, which
could be lots of fun!
>Also, if you have, say, a+b*c-d in Smalltalk where these are float
>array objects, that would be three primitive interactions; converting
>that to Slang would provide some performance improvements.
I'm not sure I understand this statement. Is there enough overhead in
the plugin API to justify eliminating a couple of calls, or is there
some data representation conversion involved that could be avoided?
I haven't read Andrew Greenberg's chapter on "Extending the Squeak
Virtual Machine" in detail yet. I only skimmed the sections
"The Shape of a Smalltalk Object" and "The Anatomy of a Named
Primitive", which I'm sure are where all the good stuff is. Are you
saying that some performance improvement in your sample expression
could be gained by just coding it in Slang, without translating and
compiling it, or have I gone one step too far?
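
As a rough illustration of the kind of saving John may be describing, the sketch below contrasts three separate element-wise calls with a single fused loop. It is written in C, since that is what Slang is translated to. All function names here are hypothetical and are not the FloatArray plugin API. Note that Smalltalk evaluates binary messages left to right, so a+b*c-d means ((a+b)*c)-d, which is what both versions compute.

```c
#include <stddef.h>

/* Hypothetical stand-ins for three separate element-wise primitives.
   Each call makes its own pass and writes a full result array. */
static void add_arrays(const float *x, const float *y, float *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = x[i] + y[i];
}

static void mul_arrays(const float *x, const float *y, float *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = x[i] * y[i];
}

static void sub_arrays(const float *x, const float *y, float *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = x[i] - y[i];
}

/* Three calls, two temporaries: mirrors sending +, * and - one after
   another, as the Smalltalk expression a + b * c - d would. */
void eval_three_calls(const float *a, const float *b, const float *c,
                      const float *d, float *tmp1, float *tmp2,
                      float *out, size_t n)
{
    add_arrays(a, b, tmp1, n);      /* tmp1 = a + b       */
    mul_arrays(tmp1, c, tmp2, n);   /* tmp2 = (a + b) * c */
    sub_arrays(tmp2, d, out, n);    /* out  = tmp2 - d    */
}

/* One call, no temporaries: the whole expression in a single loop,
   which is roughly what a hand-written primitive could do. */
void eval_fused(const float *a, const float *b, const float *c,
                const float *d, float *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = (a[i] + b[i]) * c[i] - d[i];
}
```

For very short arrays such as a B3DVector3, the per-call overhead and the allocation of each intermediate result are likely to matter more than the arithmetic itself, though how much would need measuring.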
- 2441 tallies, 39083 msec.
**Tree**
100.0% {39083ms} TClothOxe>>pulse
77.8% {30407ms} TClothOxe>>constrain
|77.8% {30407ms} TClothOxe>>constrain:
| 14.2% {5550ms} B3DVector3(FloatArray)>>*
| 13.9% {5433ms} B3DVector3(FloatArray)>>-
| 12.2% {4768ms} B3DVector3Array>>at:
| 9.7% {3791ms} TClothOxe>>collide
| |9.7% {3791ms} TClothOxe>>collideSphere:
| | 3.6% {1407ms} B3DVector3(FloatArray)>>length
| | 3.0% {1172ms} B3DVector3(FloatArray)>>-
| | 2.9% {1133ms} B3DVector3Array(SequenceableCollection)>>doWithIndex:
| | 2.9% {1133ms} B3DVector3Array(SequenceableCollection)>>withIndexDo:
| 8.8% {3439ms} B3DVector3(FloatArray)>>+
| 6.3% {2462ms} B3DVector3Array>>at:put:
| 5.8% {2267ms} TClothOxe>>constrainGround
| |3.2% {1251ms} B3DVector3Array(B3DInplaceArray)>>do:
| |2.6% {1016ms} B3DVector3>>y
| 3.8% {1485ms} OrderedCollection>>do:
| 2.8% {1094ms} primitives
7.0% {2736ms} B3DVector3Array(SequenceableCollection)>>replaceFrom:to:with:
|7.0% {2736ms} B3DVector3Array(B3DInplaceArray)>>replaceFrom:to:with:startingAt:
| 2.7% {1055ms} B3DVector3Array>>at:put:
| 2.5% {977ms} B3DVector3Array>>at:
4.4% {1720ms} Float>>*
|2.4% {938ms} B3DVector3(Object)>>adaptToFloat:andSend:
|2.0% {782ms} primitives
3.2% {1251ms} B3DVector3(FloatArray)>>-
2.3% {899ms} B3DVector3Array(SequenceableCollection)>>doWithIndex:
2.3% {899ms} B3DVector3Array(SequenceableCollection)>>withIndexDo:
**Leaves**
20.1% {7856ms} B3DVector3(FloatArray)>>-
19.8% {7738ms} B3DVector3Array>>at:
15.9% {6214ms} B3DVector3(FloatArray)>>*
11.8% {4612ms} B3DVector3Array>>at:put:
10.9% {4260ms} B3DVector3(FloatArray)>>+
3.8% {1485ms} OrderedCollection>>do:
2.8% {1094ms} B3DVector3Array(SequenceableCollection)>>withIndexDo:
2.8% {1094ms} TClothOxe>>constrain:
2.6% {1016ms} B3DVector3>>y
2.0% {782ms} Float>>*
**Memory**
old +386,532 bytes
young -551,924 bytes
used -165,392 bytes
free +165,392 bytes
**GCs**
full 0 totalling 0ms (0.0% uptime)
incr 7133 totalling 1,326ms (3.0% uptime), avg 0.0ms
tenures 1 (avg 7133 GCs/tenure)
root table 0 overflows
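
The "almost 80%" figure can be checked against the Leaves section above. The small C sketch below sums the leaf times for the FloatArray arithmetic (-, *, +) and for element access (at:, at:put:) and divides by the 39083 ms total. Grouping the entries this way is an assumption about what the estimate was meant to cover; the numbers themselves are copied from the tally.

```c
#include <stddef.h>

/* Leaf times in ms, copied from the Leaves section of the tally. */
static const double arithmetic_ms[] = { 7856.0, 6214.0, 4260.0 }; /* -, *, +      */
static const double access_ms[]     = { 7738.0, 4612.0 };         /* at:, at:put: */
static const double total_ms        = 39083.0;

/* Percentage of the total run time taken by the given leaf entries. */
double share_percent(const double *ms, size_t n, double total)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += ms[i];
    return 100.0 * sum / total;
}
```

With these inputs the arithmetic alone comes to roughly 47% and element access to roughly 32%, so the two together account for about 78.5% of the run.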