Floating point performance again

daniel poon mr.d.poon at gmail.com
Sat Dec 16 17:57:01 UTC 2006


David Faught <dave.faught <at> gmail.com> writes:

> 
> After some discussion on and off the list, I tried a few-line rewrite
> of the main time consuming method for the purpose of avoiding the
> creation of unneeded intermediate result objects.  Here is the method
> before and after the rewrite:

Hi David

Very interesting post. We have just finished a commercial project where we took
a Matlab multi-body dynamics system and converted it into VSE (Visual Smalltalk
Enterprise). We expect to have to translate the central algorithm into
C/Fortran at some point, but we have successfully postponed that optimisation
for the time being.

Incidentally, Matlab suffers from the same boxing/unboxing problem that Smalltalk
does, since everything in Matlab is a matrix. When I benchmarked Matlab a few
years ago, it was an order of magnitude slower than VSE for float ops (not
matrices). It may be faster now that it has a JIT compiler.

Scanning your code, I see the same issues that we had. As has been mentioned,
every operation that produces an intermediate result allocates a new object,
and those allocations are what cost you.

> 		connections do: [:con|
> 			v1 _ positions at: (con node1).
> 			v2 _ positions at: (con node2).
> 			dv _ v2 clone. dv -= v1.

Here you are cloning something inside the loop. Try allocating the temp outside
the loop and reusing it on each iteration, filling it with zeros at the start of
the loop body. We had a Fortran primitive to fill arrays with a constant value.
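A minimal C sketch of that pattern, for illustration only: the helpers fill_doubles, add_in_place and sub_in_place are hypothetical stand-ins for a fill primitive and for in-place += and -=, and the quantity computed (the sum of squared connection lengths) is an arbitrary choice. The caller allocates the scratch vector dv once; each iteration clears and refills it, so no allocation happens per connection.

```c
#include <stddef.h>

/* Hypothetical stand-in for a "fill array with a constant" primitive. */
static void fill_doubles(double *buf, size_t n, double value) {
    for (size_t i = 0; i < n; i++) buf[i] = value;
}

/* In-place a += b and a -= b, mirroring += and -= on a vector. */
static void add_in_place(double *a, const double *b, size_t n) {
    for (size_t i = 0; i < n; i++) a[i] += b[i];
}

static void sub_in_place(double *a, const double *b, size_t n) {
    for (size_t i = 0; i < n; i++) a[i] -= b[i];
}

/* Sums |v2 - v1|^2 over all connections. positions holds num_nodes * dim
   doubles; node1[c] and node2[c] are node indices for connection c.
   dv (dim doubles) is allocated once by the caller and reused, so the
   loop itself never allocates. */
double total_squared_length(const double *positions, size_t dim,
                            const size_t *node1, const size_t *node2,
                            size_t num_connections, double *dv) {
    double total = 0.0;
    for (size_t c = 0; c < num_connections; c++) {
        fill_doubles(dv, dim, 0.0);                        /* reset reused temp */
        add_in_place(dv, positions + node2[c] * dim, dim); /* dv += v2 */
        sub_in_place(dv, positions + node1[c] * dim, dim); /* dv -= v1 */
        for (size_t k = 0; k < dim; k++) total += dv[k] * dv[k];
    }
    return total;
}
```

Because the same buffer is written on every pass, its contents after the call are simply the last connection's difference vector.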

> 			r2 _ con restLength * con restLength.
> 			"fast square-root approximation:"
> 			dv *= (r2 / ((dv dot: dv) + r2) - 0.5).

If this fast square-root approximation is good enough, move it into Fortran/C as
a primitive. I may be wrong, but the last time I glanced at the *= method on
Matrix, it was a primitive only when both arguments are Matrix instances.
Otherwise it is implemented in Squeak.
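A minimal C sketch of such a primitive, assuming the positions are packed into one flat array of x, y, z doubles per node and the connections arrive as parallel index and rest-length arrays (relax_connections and its parameters are hypothetical, not an existing Squeak plugin). It performs the same step as the quoted loop, but every intermediate lives in a local double, so nothing is allocated. The factor r2 / (d·d + r2) - 0.5 is a square-root-free stand-in for the exact 0.5·r/d - 0.5, obtained by replacing d with one Newton step from r, d ≈ (r² + d²) / (2r).

```c
#include <stddef.h>

/* One in-place relaxation pass over all connections.
   positions: num_nodes * 3 doubles (x, y, z per node).
   For each connection, mirrors the quoted Squeak code:
     dv = v2 - v1
     r2 = restLength^2
     dv *= r2 / (dv.dv + r2) - 0.5   (no sqrt needed)
     v1 -= dv; v2 += dv                                      */
void relax_connections(double *positions, const size_t *node1,
                       const size_t *node2, const double *rest_length,
                       size_t num_connections) {
    for (size_t c = 0; c < num_connections; c++) {
        double *v1 = positions + 3 * node1[c];
        double *v2 = positions + 3 * node2[c];
        double dx = v2[0] - v1[0];
        double dy = v2[1] - v1[1];
        double dz = v2[2] - v1[2];
        double r2 = rest_length[c] * rest_length[c];
        double factor = r2 / (dx * dx + dy * dy + dz * dz + r2) - 0.5;
        dx *= factor;
        dy *= factor;
        dz *= factor;
        v1[0] -= dx; v1[1] -= dy; v1[2] -= dz;
        v2[0] += dx; v2[1] += dy; v2[2] += dz;
    }
}
```

For example, two nodes at x = 0 and x = 2 with a rest length of 1 end up at x = 0.6 and x = 1.4, a distance of 0.8, whereas the exact correction would leave them 1 apart. A link already at its rest length is left untouched, so whether the approximation is good enough depends on how far links stray from their rest length.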

> 			v1 -= dv. v2 += dv.
> 			positions at: (con node1) put: v1.
> 			positions at: (con node2) put: v2.
> 			].
> 		].

After all these tweaks, we found that the only remaining bottleneck was
unpacking the state vector before doing the 'real' calculation and re-packing it
afterwards. We tried rewriting those steps in Fortran, but it didn't help: the
problem there was the overhead of setting up each DLL call.

The other thing to look out for is whether Squeak does any marshalling of data
before or after a primitive call. VSE copies arguments into a buffer before a
DLL call and copies them back afterwards, which slows things down when you have
a 1000*1000 matrix! We solved that problem by creating a special matrix class
that allocated its storage on the C heap and passed only the heap address
across.
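A minimal C sketch of that idea, assuming the Smalltalk side can hold a raw pointer and pass it to each primitive (the HeapMatrix type and heap_matrix_* functions are hypothetical, not a VSE or Squeak API). The element data is allocated once on the C heap and never copied; each call receives only the address.

```c
#include <stddef.h>
#include <stdlib.h>

/* Matrix whose element storage lives on the C heap, row-major. */
typedef struct {
    size_t rows, cols;
    double *data;
} HeapMatrix;

/* Allocate a zero-filled rows x cols matrix; returns NULL on failure. */
HeapMatrix *heap_matrix_new(size_t rows, size_t cols) {
    HeapMatrix *m = malloc(sizeof *m);
    if (m == NULL) return NULL;
    m->data = calloc(rows * cols, sizeof *m->data);
    if (m->data == NULL) {
        free(m);
        return NULL;
    }
    m->rows = rows;
    m->cols = cols;
    return m;
}

void heap_matrix_free(HeapMatrix *m) {
    if (m != NULL) {
        free(m->data);
        free(m);
    }
}

/* Scale every element in place: only the pointer crosses the call,
   however large the matrix is. */
void heap_matrix_scale(HeapMatrix *m, double factor) {
    size_t n = m->rows * m->cols;
    for (size_t i = 0; i < n; i++) m->data[i] *= factor;
}
```

The trade-off is ownership: because the storage lives outside the Smalltalk object memory, the garbage collector will not reclaim it on its own, so the Smalltalk side has to call the free function explicitly.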

To summarize: linear calcs are fine in Smalltalk. Nonlinear stuff has to go into
Fortran (eventually). at: and at:put: are a pain!

I hope that helps

Cheers

Daniel



