Floating point performance again

David Faught dave.faught at gmail.com
Sat Dec 16 03:00:21 UTC 2006


After some discussion on and off the list, I tried a few-line rewrite
of the main time-consuming method, to avoid creating unneeded
intermediate result objects.  Here is the method before and after the
rewrite:

----before----
constrain: its


	| v1 v2 dv r2 |

	self constrainGround.

	1 to: its do: [:iter|
		connections do: [:con|
			v1 _ positions at: (con node1).
			v2 _ positions at: (con node2).
			dv _ v2 - v1.
			r2 _ con restLength * con restLength.
			"fast square-root approximation:"
			dv _ dv * (r2 / ((dv dot: dv) + r2) - 0.5).
			positions at: (con node1) put: v1 - dv.
			positions at: (con node2) put: v2 + dv.
			].
		].

	self collide.
	self doHoldCorners.

----after----
constrain2: its


	| v1 v2 dv r2 |

	self constrainGround.

	1 to: its do: [:iter|
		connections do: [:con|
			v1 _ positions at: (con node1).
			v2 _ positions at: (con node2).
			dv _ v2 clone. dv -= v1.
			r2 _ con restLength * con restLength.
			"fast square-root approximation:"
			dv *= (r2 / ((dv dot: dv) + r2) - 0.5).
			v1 -= dv. v2 += dv.
			positions at: (con node1) put: v1.
			positions at: (con node2) put: v2.
			].
		].

	self collide.
	self doHoldCorners.
----
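
For readers unfamiliar with the "fast square-root approximation"
comment: if d is the length of dv and r is restLength, moving each end
point by dv * f restores the rest length exactly when
f = 0.5 * (r / d - 1), which needs a square root to get d.  Both versions
of the method use f = r2 / ((dv dot: dv) + r2) - 0.5 instead, which is
what a first-order expansion of the square root around r2 gives.  The
workspace sketch below compares the two factors using plain Floats.  The
rest length and sample distances are made-up values for illustration,
not taken from the cloth code.

```smalltalk
"Workspace sketch: exact correction factor versus the square-root-free
 approximation used in constrain:.  r and the sample distances are
 hypothetical values chosen only for illustration."
| r r2 exact approx |
r := 1.0.
r2 := r * r.
#(0.5 0.9 1.0 1.1 2.0) do: [:d |
	exact := 0.5 * ((r / d) - 1.0).
	approx := (r2 / ((d * d) + r2)) - 0.5.
	Transcript
		show: 'd = ', d printString;
		show: '  exact = ', exact printString;
		show: '  approx = ', approx printString;
		cr].
```

The two factors agree (both are zero) when d equals r, and drift apart
as d moves away from r, which is why the outer loop repeats the
correction several times.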

I was expecting good things from this, but was rather disappointed.
The before and after tally results are below.  They show that the
B3DVector3(FloatArray) *, -, and + times under constrain: went away,
as expected.  The time in primitives rose sharply (2340ms to 6413ms),
which is just work shifted from the original operations.  But the
B3DVector3Array>>at:put: time also rose sharply (2600ms to 10884ms),
which I did not expect.  What happened, especially with the at:put:
times?

I could understand a shift like this in the percentages, but the
actual measured times went way up too, and the overall total time is
not much lower for the "optimized" version (45160 msec versus 51993
msec).  Any ideas?
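
The reports below have the layout produced by Squeak's MessageTally
profiler.  A minimal sketch of how a comparable run might be started
from a workspace follows.  It assumes a hypothetical variable named
cloth bound to the TClothOxe instance, and an arbitrary repeat count;
neither is taken from the original measurements.

```smalltalk
"Profile repeated calls to pulse and open a tally report.
 'cloth' is a hypothetical workspace variable holding the TClothOxe
 instance; the repeat count of 1000 is arbitrary."
MessageTally spyOn: [1000 timesRepeat: [cloth pulse]].
```

Running the same block once with constrain: and once with constrain2:
wired in gives two reports that can be compared line by line.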

----before----
 - 3213 tallies, 51993 msec.

**Tree**
100.0% {51993ms} TClothOxe>>pulse
  79.8% {41490ms} TClothOxe>>constrain
    |79.8% {41490ms} TClothOxe>>constrain:
    |  16.9% {8787ms} B3DVector3(FloatArray)>>*
    |  14.6% {7591ms} B3DVector3(FloatArray)>>-
    |  12.2% {6343ms} B3DVector3Array>>at:
    |  10.2% {5303ms} TClothOxe>>collide
    |    |10.1% {5251ms} TClothOxe>>collideSphere:
    |    |  4.0% {2080ms} B3DVector3(FloatArray)>>length
    |    |    |2.2% {1144ms} primitives
    |    |  3.3% {1716ms} B3DVector3Array(SequenceableCollection)>>doWithIndex:
    |    |    |3.3% {1716ms} B3DVector3Array(SequenceableCollection)>>withIndexDo:
    |    |  2.7% {1404ms} B3DVector3(FloatArray)>>-
    |  7.7% {4003ms} B3DVector3(FloatArray)>>+
    |  5.8% {3016ms} TClothOxe>>constrainGround
    |    |3.2% {1664ms} B3DVector3>>y
    |    |2.6% {1352ms} B3DVector3Array(B3DInplaceArray)>>do:
    |  5.0% {2600ms} B3DVector3Array>>at:put:
    |  4.5% {2340ms} primitives
    |  2.4% {1248ms} OrderedCollection>>do:
  6.9% {3588ms} B3DVector3Array(SequenceableCollection)>>replaceFrom:to:with:
    |6.9% {3588ms} B3DVector3Array(B3DInplaceArray)>>replaceFrom:to:with:startingAt:
    |  2.9% {1508ms} B3DVector3Array>>at:
    |  2.6% {1352ms} B3DVector3Array>>at:put:
  3.7% {1924ms} Float>>*
    |2.4% {1248ms} B3DVector3(Object)>>adaptToFloat:andSend:
  2.7% {1404ms} B3DVector3(FloatArray)>>-
  2.4% {1248ms} B3DVector3Array(SequenceableCollection)>>doWithIndex:
    2.4% {1248ms} B3DVector3Array(SequenceableCollection)>>withIndexDo:

**Leaves**
20.4% {10607ms} B3DVector3Array>>at:
20.0% {10399ms} B3DVector3(FloatArray)>>-
18.5% {9619ms} B3DVector3(FloatArray)>>*
9.6% {4991ms} B3DVector3Array>>at:put:
9.4% {4887ms} B3DVector3(FloatArray)>>+
4.5% {2340ms} TClothOxe>>constrain:
3.2% {1664ms} B3DVector3>>y
3.1% {1612ms} B3DVector3Array(SequenceableCollection)>>withIndexDo:
2.4% {1248ms} OrderedCollection>>do:
2.2% {1144ms} B3DVector3(FloatArray)>>length

**Memory**
	old			+0 bytes
	young		+30,264 bytes
	used		+30,264 bytes
	free		-30,264 bytes

**GCs**
	full			0 totalling 0ms (0.0% uptime)
	incr		7139 totalling 13,565ms (26.0% uptime), avg 2.0ms
	tenures		0
	root table	0 overflows

----after----
 - 2799 tallies, 45160 msec.

**Tree**
100.0% {45160ms} TClothOxe>>pulse
  75.8% {34231ms} TClothOxe>>constrain
    |75.8% {34231ms} TClothOxe>>constrain2:
    |  24.1% {10884ms} B3DVector3Array>>at:put:
    |  15.6% {7045ms} B3DVector3Array>>at:
    |  14.2% {6413ms} primitives
    |  11.6% {5239ms} TClothOxe>>collide
    |    |11.6% {5239ms} TClothOxe>>collideSphere:
    |    |  4.3% {1942ms} B3DVector3(FloatArray)>>length
    |    |    |2.2% {994ms} primitives
    |    |    |2.1% {948ms} B3DVector3(FloatArray)>>squaredLength
    |    |  3.8% {1716ms} B3DVector3Array(SequenceableCollection)>>doWithIndex:
    |    |    |3.8% {1716ms} B3DVector3Array(SequenceableCollection)>>withIndexDo:
    |    |    |  2.1% {948ms} primitives
    |    |  3.3% {1490ms} B3DVector3(FloatArray)>>-
    |  6.0% {2710ms} TClothOxe>>constrainGround
    |    |3.6% {1626ms} B3DVector3Array(B3DInplaceArray)>>do:
    |    |2.4% {1084ms} B3DVector3>>y
    |  3.6% {1626ms} OrderedCollection>>do:
  7.0% {3161ms} B3DVector3Array(SequenceableCollection)>>replaceFrom:to:with:
    |7.0% {3161ms} B3DVector3Array(B3DInplaceArray)>>replaceFrom:to:with:startingAt:
    |  2.5% {1129ms} B3DVector3Array>>at:
    |  2.5% {1129ms} B3DVector3Array>>at:put:
  5.0% {2258ms} Float>>*
    |2.9% {1310ms} B3DVector3(Object)>>adaptToFloat:andSend:
    |  |2.5% {1129ms} B3DVector3(FloatArray)>>adaptToNumber:andSend:
    |  |  2.1% {948ms} B3DVector3(FloatArray)>>*
    |2.1% {948ms} primitives
  3.1% {1400ms} B3DVector3Array(SequenceableCollection)>>doWithIndex:
    |3.1% {1400ms} B3DVector3Array(SequenceableCollection)>>withIndexDo:
  2.8% {1264ms} B3DVector3(FloatArray)>>+
  2.7% {1219ms} B3DVector3(FloatArray)>>-
  2.1% {948ms} B3DVector3Array>>at:

**Leaves**
29.3% {13232ms} B3DVector3Array>>at:put:
25.1% {11335ms} B3DVector3Array>>at:
14.2% {6413ms} TClothOxe>>constrain2:
6.0% {2710ms} B3DVector3(FloatArray)>>-
3.6% {1626ms} OrderedCollection>>do:
3.5% {1581ms} B3DVector3Array(SequenceableCollection)>>withIndexDo:
3.0% {1355ms} B3DVector3(FloatArray)>>+
2.4% {1084ms} B3DVector3>>y
2.2% {994ms} B3DVector3(FloatArray)>>length
2.2% {994ms} B3DVector3(FloatArray)>>*
2.1% {948ms} B3DVector3(FloatArray)>>squaredLength
2.1% {948ms} Float>>*

**Memory**
	old			+0 bytes
	young		-92,828 bytes
	used		-92,828 bytes
	free		+92,828 bytes

**GCs**
	full			0 totalling 0ms (0.0% uptime)
	incr		5804 totalling 11,186ms (25.0% uptime), avg 2.0ms
	tenures		0
	root table	0 overflows

----
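
For a quick side-by-side view, the headline figures can be compared
with some workspace arithmetic.  The numbers are copied from the two
reports above (total msec and the at:put: entries under Leaves); the
variable names are only for illustration.

```smalltalk
"Compare headline figures from the before/after tallies.
 All four numbers are copied from the reports; names are illustrative."
| totalBefore totalAfter putBefore putAfter |
totalBefore := 51993.
totalAfter := 45160.
putBefore := 4991.	"B3DVector3Array>>at:put: leaf time, before"
putAfter := 13232.	"B3DVector3Array>>at:put: leaf time, after"
Transcript
	show: 'total time reduction: ',
		((totalBefore - totalAfter) * 100.0 / totalBefore) printString, '%';
	cr;
	show: 'at:put: leaf time ratio: ',
		(putAfter / putBefore) asFloat printString;
	cr.
```

This works out to a total reduction of roughly 13 percent, while the
at:put: leaf time grows by a factor of more than 2.6.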


