[squeak-dev] Matrix Performance - was: Interesting survey about smalltalk

Levente Uzonyi leves at elte.hu
Thu Jun 24 23:34:35 UTC 2010


On Thu, 24 Jun 2010, Jimmie Houchin wrote:

<snip>

>
> I just recently had an opportunity to study your code to learn. I haven't
> tried it yet, but while exploring and studying it I realized a problem, if I
> understand it correctly. In many (most?) uses of Matrix, Numpy, etc., the
> order of the data is part of the data. It isn't simply a set of numbers, but
> an ordered collection of numbers or data, whether it is weather statistics
> or, in my case, financials which are associated with a precise point in time
> in the market.
>
> To my understanding, using a tree-based implementation would lose the order
> of the data. I would imagine that if you used keys which could be ordered
> and then applied that order, you would lose the performance gains.

The binary tree is only useful for the fast calculation of min, max and 
median. If the ordering of the data matters for a calculation, then you 
need some other trick.

>
> In my simple example, order had little meaning. But a common calculation is
> a weighted average, which does associate a weight with each value according
> to its position in the array.

If the weights are exponential, you can use the same trick as with sum 
(a sketch follows the example below). If the weights are not exponential, 
you can use convolution with FFT for O((n+w)*log(n+w)) performance, which 
is O(n*log(n)) if w << n. Here is an example:

| data windowWithWeights fftSize circularData circularWeights fft cdtr cdti cwtr cwti |
data := #(1 2 3 4 5) asFloatArray. "Our time series."
windowWithWeights := #(4 2 1) asFloatArray. "Most recent data with weight 4, least recent data with weight 1"
"Prepare for the FFT"
fftSize := (data size + windowWithWeights size) asLargerPowerOfTwo.
circularData := data copyGrownBy: fftSize - data size.
circularWeights := windowWithWeights copyGrownBy: fftSize - windowWithWeights size.
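"Zero-padding both arrays to a power of two of at least data size + window size makes the circular convolution below equal to the linear (windowed) one, with no wrap-around."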
fft := FFT new: fftSize.
"Transform the data and store the results"
fft realData: circularData.
fft transformForward: true.
cdtr := fft realData.
cdti := fft imagData.
"Transform the weights and store the results"
fft realData: circularWeights.
fft transformForward: true.
cwtr := fft realData.
cwti := fft imagData.
"Do the convolution which is simple complex multiplication, then the IFFT."
fft
 	realData: cdtr * cwtr - (cdti * cwti)
 	imagData: cdtr * cwti + (cdti * cwtr).
fft transformForward: false.
"Extract data"
fft realData copyFrom: windowWithWeights size to: data size.
"a FloatArray(17.0 24.0 31.0)
 	17 = 4 * 3 + 2 * 2 + 1 * 1,
 	24 = 4 * 4 + 2 * 3 + 1 * 2,
 	31 = 4 * 5 + 2 * 4 + 1 * 3"
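
For the exponential case, here is a minimal sketch of the "same trick as 
with sum" (illustrative only: q = 0.5, window size 3 and the variable names 
are made up for this example). Since each weight is q times its neighbour's, 
sliding the window one step just rescales the previous sum, adds the 
incoming element and subtracts the outgoing one, so the whole series costs 
O(n):

| data q windowSize qToW s results |
data := #(1 2 3 4 5 6 7 8) asFloatArray.
q := 0.5. "Ratio between neighbouring weights (assumed)."
windowSize := 3.
qToW := q raisedTo: windowSize.
"Weighted sum of the first window: the newest element has weight 1,
the oldest has weight q raisedTo: windowSize - 1."
s := 0.0.
1 to: windowSize do: [ :k |
	s := s + ((q raisedTo: windowSize - k) * (data at: k)) ].
results := OrderedCollection with: s.
windowSize + 1 to: data size do: [ :i |
	"Rescale the old sum by q, add the new element, and drop the element
	that fell out of the window (its weight would now be q^windowSize)."
	s := (q * s) + (data at: i) - (qToW * (data at: i - windowSize)).
	results add: s ].
results
"an OrderedCollection(4.25 6.0 7.75 9.5 11.25 13.0)"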

>
> While playing this is the implementation that performed the best for me. 
> Going from 6200ms to 3600ms.
>
> m := Matrix rows: 6333 columns: 162.
> omd := OrderedCollection new.
> ttr := [ | column start windowSize |
>     column := m atColumn: 5.
>     start := 1.
>     windowSize := 500.
>     1 to: column size - windowSize do: [ :i |
>         | row rowSum rowMax rowMin rowAverage rowMedian |
>         row := column copyFrom: i to: i + windowSize.
>         rowSum := row sum.
>         rowMax := row max.
>         rowMin := row min.
>         rowMedian := row median.
>         rowAverage := row average.
>         omd add: { rowSum. rowMax. rowMin. rowAverage } ] ] timeToRun.
> ttr "3600"
>
> If I remove the median calculation I get 594ms. You are quite correct, the
> median calculation is quite expensive. However, it is quite light compared
> to the things I actually do. :)

I found a simple way to improve the performance without using a 
tree or loading new packages: SortedCollection :). The least useful data 
structure in Squeak can shine here: the naive loop recomputes each window's 
median from scratch, while a sorted window only needs an incremental update. 
(The tree-based implementation is still >10 times faster in this case, but 
this is pretty good.)

rng := Random new.
m := Matrix rows: 6333 columns: 162 tabulate: [ :r :c | rng next ].
omd := OrderedCollection new.
ttr := [
 	| column windowSize window sum |
 	column := m atColumn: 5.
 	windowSize := 500.
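	"Keep the current window in a SortedCollection so min (first), max (last) and median are cheap to read."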
  	window := (column first: windowSize) asSortedCollection.
 	sum := window sum.
	omd add: { sum. window first. window last. window median. sum / windowSize }.
 	1 to: column size - windowSize do: [ :i |
 		| outgoing incoming |
 		outgoing := column at: i.
 		incoming := column at: i + windowSize.
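		"Update the running sum in O(1) instead of re-summing the window."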
 		sum := sum - outgoing + incoming.
 		window
 			remove: outgoing;
 			add: incoming.
		omd add: { sum. window first. window last. window median. sum / windowSize } ] ] timeToRun.
ttr

My result for this code is 335ms (and it calculates the median too). It 
could be improved with a special method, but that's beyond the scope of 
this mail. :)
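
As a rough illustration of where such a special method could help (this is 
my own sketch, not something from the mail): SortedCollection inherits 
#remove: from OrderedCollection, which scans for the element linearly, 
while a binary search can locate it in O(log windowSize) because the 
elements are already sorted. A self-contained version of that search:

| sorted indexOfValue |
sorted := #(1 3 5 7 9 11) asSortedCollection.
"Binary search over the sorted elements; answers the index of value,
assuming value is present in the collection."
indexOfValue := [ :collection :value |
	| low high mid |
	low := 1.
	high := collection size.
	[ low < high ] whileTrue: [
		mid := low + high // 2.
		(collection at: mid) < value
			ifTrue: [ low := mid + 1 ]
			ifFalse: [ high := mid ] ].
	low ].
indexOfValue value: sorted value: 7
"4"

Shifting the remaining elements still costs O(windowSize) per removal, so 
this only eliminates the comparison scan, but it shows the direction such 
a method could take.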

Calculating the weighted averages with the same parameters (6333 data 
points, window size 500) with the FFT convolution described above took 350ms.

I hope you'll find these methods useful.


Levente

>
> 594ms is quite an improvement over the original 6200ms. The original
> without median is now 2704ms.
> Numpy is still faster over the same data set doing the same thing: it comes
> in at 302ms. But Numpy is optimized C/Fortran with a Python face. It is a
> nice API, but still dependent on someone doing the hard work on the back
> end. The nice thing with the Squeak code is that it performs reasonably and
> is all Smalltalk. :)
>
> Thanks for the education.
>
> Jimmie
>
>


