## [squeak-dev] Matrix Performance - was: Interesting survey about smalltalk

Levente Uzonyi leves at elte.hu
Thu Jun 24 23:34:35 UTC 2010

On Thu, 24 Jun 2010, Jimmie Houchin wrote:

<snip>

>
> I just recently had an opportunity to study your code to learn. I haven't
> tried it yet, but while exploring and studying I realized if I understand
> correctly a problem. Many/(most ?) of the uses of Matrix, (Numpy), etc. the
> order of the data is a part of the data. It isn't simply a set of numbers,
> but an ordered collection of numbers or data. Whether it is weather
> statistics, etc., or in my case financials which are associated to a precise
> point in time in the market.
>
> To my understanding using a tree based implementation would lose order of the
> data. I would imagine if you used keys which could be ordered and then order
> applied, you would lose the performance gains.

The binary tree is only useful for the fast calculation of min max and
median. If the ordering of the data matters for a calculation, then you
need some other trick.

>
> In my simple example order had little meaning. But a common calculation is a
> weighted average which does associate a weighted value according to position
> in the array.

If the weights are exponential, you can use the same trick as with sum.
If the weights are not exponential, you can use convolution with FFI for
O((n+w)*log(n+w)) performance which is O(n*log(n)) if w << n. Here is an
example:

| data windowWithWeights fftSize circularData circularWeights fft cdtr cdti cwtr cwti |
data := #(1 2 3 4 5) asFloatArray. "Our time series."
windowWithWeights := #(4 2 1) asFloatArray. "Most recent data with weight 4, least recent data with weight 1"
"Prepare for the FFT"
fftSize := (data size + windowWithWeights size) asLargerPowerOfTwo.
circularData := data copyGrownBy: fftSize - data size.
circularWeights := windowWithWeights copyGrownBy: fftSize - windowWithWeights size.
fft := FFT new: fftSize.
"Transform the data and store the results"
fft realData: circularData.
fft transformForward: true.
cdtr := fft realData.
cdti := fft imagData.
"Transform the weights and store the results"
fft realData: circularWeights.
fft transformForward: true.
cwtr := fft realData.
cwti := fft imagData.
"Do the convolution which is simple complex multiplication, then the IFFT."
fft
realData: cdtr * cwtr - (cdti * cwti)
imagData: cdtr * cwti + (cdti * cwtr).
fft transformForward: false.
"Extract data"
fft realData copyFrom: windowWithWeights size to: data size.
"a FloatArray(17.0 24.0 31.0)
17 = 4 * 3 + 2 * 2 + 1 * 1,
24 = 4 * 4 + 2 * 3 + 1 * 2,
31 = 4 * 5 + 2 * 4 + 1 * 3"

>
> While playing this is the implementation that performed the best for me.
> Going from 6200ms to 3600ms.
>
> m := Matrix rows: 6333 columns: 162.
> omd := OrderedCollection new.
> ttr := [| column start windowSize |
>    column := m atColumn: 5.
>    start := 1.
>    windowSize := 500.
>    1 to: ((column size) - windowSize) do: [:i || row rowSum rowMax rowMin
> rowAverage  rowMedian |
>    row := column copyFrom: i to: i+windowSize.
>    rowSum := row sum.
>    rowMax := row max.
>    rowMin := row min.
>    rowMedian := row median.
>    rowAverage := row average.
>    omd6 add: {rowSum . rowMax . rowMin . rowAverage}]] timeToRun.
> ttr 3600
>
> If I remove the median calculation  I get 594ms. You are quite correct, the
> median calculation is quite expensive. However it is quite light to the
> things I actually do. :)

I found a simple way to improve the performance without using a
structure in Squeak can shine here (though the tree-based implementation
is >10 times faster in this case, but it's still pretty good):

rng := Random new.
m := Matrix rows: 6333 columns: 162 tabulate: [ :r :c | rng next ].
omd := OrderedCollection new.
ttr := [
| column windowSize window sum |
column := m atColumn: 5.
windowSize := 500.
window := (column first: windowSize) asSortedCollection.
sum := window sum.
omd add: { sum. window first. window last. window median. sum /
windowSize }.
1 to: column size - windowSize do: [ :i |
| outgoing incoming |
outgoing := column at: i.
incoming := column at: i + windowSize.
sum := sum - outgoing + incoming.
window
remove: outgoing;
omd add: { sum. window first. window last. window median.
sum / windowSize } ] ] timeToRun.
ttr

My result for this code is 335ms (and it calculates the median too). It
could be improved with a special method but that's out of the scope of
this mail. :)

Calculating the weighted averages with the same parameters (6333 data, 500
window size) with the FFT convolution described above took 350ms.

I hope you'll find these methods useful.

Levente

>
> 594ms is quite an improvement over the original 6200ms. The original without
> median is now 2704.
> Numpy is still faster over the same data set and doing the same thing it is
> at 302ms. But Numpy is optimized C/Fortran with a Python face. It is a nice
> API but still dependent on someone doing the hard work on the  back end. The
> nice thing with the Squeak code is that it is reasonably performing and all
> Smalltalk. :)
>
> Thanks for the education.
>
> Jimmie
>
>