Floating point performance

David Faught dave.faught at gmail.com
Thu Dec 14 00:33:19 UTC 2006

John M McIntosh wrote:
>Could you share your MessageTally?  If you are using FloatArray logic,
>then most of the math is done in the plugin.  However,
>the plugin does not take advantage of any vector processing hardware
>you might have, so there is room for improvement.

The MessageTally output is below.  Maybe "almost 80% of the time was
spent in basic floating point array operations" is a little
exaggerated, but not a lot.  What vector processing hardware?  The
only thing I know of would be trying to use the video card GPU, which
could be lots of fun!

>Also, if you have, say, a+b*c-d in Smalltalk where these are FloatArray
>objects, that would be three primitive interactions; converting
>that to Slang would provide some performance improvements.

I'm not sure I understand this statement.  Is there enough overhead in
the plugin API to justify eliminating a couple of calls, or is there
some data representation conversion involved that could be avoided?
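
To check that I am reading the suggestion correctly, here is a rough C
sketch of the difference as I understand it.  This is only my guess at
the idea, not the actual plugin code; all of the function names are
made up for illustration.  Since Smalltalk evaluates binary operators
left to right, a+b*c-d means ((a+b)*c)-d.  The first version makes
three passes with two temporary arrays, the way three separate
primitive calls would; the second does the same work in one pass with
no temporaries.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for three separate element-wise primitives. */
static void vec_add(const float *x, const float *y, float *out, size_t n) {
    for (size_t i = 0; i < n; i++) out[i] = x[i] + y[i];
}

static void vec_mul(const float *x, const float *y, float *out, size_t n) {
    for (size_t i = 0; i < n; i++) out[i] = x[i] * y[i];
}

static void vec_sub(const float *x, const float *y, float *out, size_t n) {
    for (size_t i = 0; i < n; i++) out[i] = x[i] - y[i];
}

/* Three passes over the data, with a temporary array for each
   intermediate result.  Returns 0 on success, -1 if allocation fails. */
int eval_three_calls(const float *a, const float *b, const float *c,
                     const float *d, float *out, size_t n) {
    if (n == 0) return 0;
    float *t1 = malloc(n * sizeof *t1);
    float *t2 = malloc(n * sizeof *t2);
    if (t1 == NULL || t2 == NULL) {
        free(t1);
        free(t2);
        return -1;
    }
    vec_add(a, b, t1, n);    /* t1  = a + b  */
    vec_mul(t1, c, t2, n);   /* t2  = t1 * c */
    vec_sub(t2, d, out, n);  /* out = t2 - d */
    free(t1);
    free(t2);
    return 0;
}

/* One pass, no temporaries: the whole expression per element. */
void eval_fused(const float *a, const float *b, const float *c,
                const float *d, float *out, size_t n) {
    for (size_t i = 0; i < n; i++) {
        out[i] = (a[i] + b[i]) * c[i] - d[i];
    }
}
```

If that is the idea, then the saving would come from the temporaries
and the extra passes over memory as much as from the call overhead,
but I have not measured any of this.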

I haven't read Andrew Greenberg's chapter on "Extending the Squeak
Virtual Machine" in detail yet.  I kind of skimmed over the sections
"The Shape of a Smalltalk Object" and "The Anatomy of a Named
Primitive", which I'm sure are where all the good stuff is.  Are you
saying that some performance improvement in your sample expression
could be gained by just coding it in Slang, without translating and
compiling it, or have I gone one step too far?

- 2441 tallies, 39083 msec.

100.0% {39083ms} TClothOxe>>pulse
  77.8% {30407ms} TClothOxe>>constrain
    |77.8% {30407ms} TClothOxe>>constrain:
    |  14.2% {5550ms} B3DVector3(FloatArray)>>*
    |  13.9% {5433ms} B3DVector3(FloatArray)>>-
    |  12.2% {4768ms} B3DVector3Array>>at:
    |  9.7% {3791ms} TClothOxe>>collide
    |    |9.7% {3791ms} TClothOxe>>collideSphere:
    |    |  3.6% {1407ms} B3DVector3(FloatArray)>>length
    |    |  3.0% {1172ms} B3DVector3(FloatArray)>>-
    |    |  2.9% {1133ms} B3DVector3Array(SequenceableCollection)>>doWithIndex:
    |    |    2.9% {1133ms}
    |  8.8% {3439ms} B3DVector3(FloatArray)>>+
    |  6.3% {2462ms} B3DVector3Array>>at:put:
    |  5.8% {2267ms} TClothOxe>>constrainGround
    |    |3.2% {1251ms} B3DVector3Array(B3DInplaceArray)>>do:
    |    |2.6% {1016ms} B3DVector3>>y
    |  3.8% {1485ms} OrderedCollection>>do:
    |  2.8% {1094ms} primitives
  7.0% {2736ms} B3DVector3Array(SequenceableCollection)>>replaceFrom:to:with:
    |7.0% {2736ms}
    |  2.7% {1055ms} B3DVector3Array>>at:put:
    |  2.5% {977ms} B3DVector3Array>>at:
  4.4% {1720ms} Float>>*
    |2.4% {938ms} B3DVector3(Object)>>adaptToFloat:andSend:
    |2.0% {782ms} primitives
  3.2% {1251ms} B3DVector3(FloatArray)>>-
  2.3% {899ms} B3DVector3Array(SequenceableCollection)>>doWithIndex:
    2.3% {899ms} B3DVector3Array(SequenceableCollection)>>withIndexDo:
20.1% {7856ms} B3DVector3(FloatArray)>>-
19.8% {7738ms} B3DVector3Array>>at:
15.9% {6214ms} B3DVector3(FloatArray)>>*
11.8% {4612ms} B3DVector3Array>>at:put:
10.9% {4260ms} B3DVector3(FloatArray)>>+
3.8% {1485ms} OrderedCollection>>do:
2.8% {1094ms} B3DVector3Array(SequenceableCollection)>>withIndexDo:
2.8% {1094ms} TClothOxe>>constrain:
2.6% {1016ms} B3DVector3>>y
2.0% {782ms} Float>>*

	old			+386,532 bytes
	young		-551,924 bytes
	used		-165,392 bytes
	free		+165,392 bytes

	full			0 totalling 0ms (0.0% uptime)
	incr		7133 totalling 1,326ms (3.0% uptime), avg 0.0ms
	tenures		1 (avg 7133 GCs/tenure)
	root table	0 overflows
