 ## [squeak-dev] Matrix Performance - was: Interesting survey about smalltalk

Jimmie Houchin jdev at cyberhaus.us
Thu Jun 24 20:20:36 UTC 2010

```On 6/21/2010 5:34 PM, Levente Uzonyi wrote:
> On Mon, 21 Jun 2010, Jimmie Houchin wrote:
>> On 6/21/2010 9:27 AM, Levente Uzonyi wrote:
>>> On Sun, 20 Jun 2010, Jimmie Houchin wrote:
>>>> [snip]
>>>> Here is some samples from my experimentation.
>>>>
>>>> Some of what I am doing is doing rolling calculations over my dataset.
>>>>
>>>> dataset is one weeks worth of OHLC data of a currency pair.
>>>>
>>>> In Squeak I have.
>>>>
>>>> ttr := [
>>>>  1 to: ((m rowCount) -500) do: [:i || row rowSum rowMax rowMin
>>>> rowMedian rowAverage |
>>>>  row := (m atRows: i to: (499+i) columns: 5 to: 5).
>>>>  rowSum := row sum.
>>>>  rowMax := row max.
>>>>  rowMin := row min.
>>>>  rowMedian := row median.
>>>>  rowAverage := row average.
>>>>  omd add: {rowSum . rowMax . rowMin . rowMedian . rowAverage}]]
>>>> timeToRun.
>>>>
>>>> Squeak:  17 seconds,  with Cog 4.2 seconds  (nice work guys
>>>> (Eliot/Teleplace)
>>>
>>> This code can be implemented a lot more efficiently.
>>> #atRows:to:columns:to: creates a new matrix, but that can be avoided.
>>> #sum, #max, #min, #median and #average iterate over the row. What's
>>> worse, #median sorts the row. These can be elimated too.
>>> The total runtime cost is: O((r - w) * (w + w * log(w))), which is
>>> O(m * w * log(w)) if m >> w, which is true in your case. (r is the
>>> number of rows, w is the window size (500 in your example)).
>>> This can be reduced to m*log(w) which is 500x speedup (in theory,
>>> ignoring constatns) in your case.
>>>
>>> The idea is to store the intermediate results. The sum (and average)
>>> only require a single variable which stores the sum of the window.
>>> Then substract the element getting out of the window and add the new
>>> element getting in the window and you got the new sum and average.
>>> Min, max and median are a bit more tricky, but a balanced binary
>>> tree handle them. Calculating min and max in a balanced tree
>>> and removing 1-1 elements also require O(log(w)) time. For median,
>>> you have to find the node of the median of the first 500 elements at
>>> the beginning. When an element is removed or added to the tree, the
>>> median will be the same, the next or the previous element in the
>>> tree, depending on the median, the added/removed element and the
>>> size of the tree. This can be handled in O(log(w)) time.
>>>
>>> For a matrix with 10000 rows and 5 columns of random floats your
>>> code takes 5.5 seconds with Cog. Using a temp for sum and average
>>> and a tree for min and max (without the median, because it requires
>>> a special tree implementation) it takes ~35ms. That's 157x speedup.
>>>
>>> Levente
>>
>> Hello Levente,
>>
>> I am not surprised that I may not have the most efficient
>> implementation. I understand what you are saying in principle, but I
>> don't understand how to implement what you are saying. Can you
>> provide an example doing what I did in the manner you describe. I
>> would greatly appreciate it. I would then run it against my data for
>> a test.
>
> Hi Jimmie,
>
> if you load http://leves.web.elte.hu/squeak/LLRBT-ul.7.mcz  then
> you can evaluate the following example in a workspace. Just print it:
>
> | rng matrix windowSize columnIndex old new oldTime newTime |
> "Create a matrix filled with random float values between 0 and 1 with
> 10000 rows."
> rng := Random new.
> matrix := Matrix rows: 10000 columns: 5 tabulate: [ :i :j | rng next ].
> windowSize := 500.
> columnIndex := 5.
> "The 'old' slow computation."
> oldTime := [ old := Array new: matrix rowCount - windowSize
> streamContents: [ :stream |
>      1 to: matrix rowCount - windowSize + 1 do: [ :i |
>         | row |
>         row := (matrix atRows: i to: i + windowSize - 1 columns: 5 to:
> 5).
>         stream nextPut: {
>             row sum.
>             row max.
>             row min.
>             "row median."
>             row average } ] ] ] timeToRun.
> "The 'new' fast computation."
> newTime := [ new := Array new: matrix rowCount - windowSize
> streamContents: [ :stream |
>     | sum tree |
>     "Our variables holding the intermediate results."
>     sum := 0.
>     tree := LLRBTree new.
>     "Calculate the first values."
>     1 to: windowSize do: [ :rowIndex |
>         | element |
>         element := matrix at: rowIndex at: columnIndex.
>         sum := sum + element.
>         tree at: element put: element ].
>     stream nextPut: { sum. tree max. tree min. sum / windowSize }.
>     "And the rest."
>     windowSize + 1 to: matrix rowCount do: [ :rowIndex |
>         | oldElement newElement |
>         oldElement := matrix at: rowIndex - windowSize at: columnIndex.
>         newElement := matrix at: rowIndex at: columnIndex.
>         sum := sum - oldElement + newElement.
>         "Housekeeping for median would happen here"
>         tree
>             deleteKey: oldElement;
>             at: newElement put: newElement.
>         stream nextPut: { sum. tree max. tree min. sum / windowSize }
> ] ] ] timeToRun.
> "Make sure the calculations give the same results."
> old with: new do: [ :oldRow :newRow |
>     oldRow with: newRow do: [ :oldValue :newValue |
>         self assert: (oldValue closeTo: newValue) ] ].
> "Print it to see the runtime"
> { oldTime. newTime }
>
>
> The "old" version is your original code with minor changes, the "new"
> is the one that stores intermediate results. If you change the value
> of the matrix variable, you can test it on your data.
>
>> The above was simply an example. I have many more methods which I've
>> implemented which are doing a variety of moving averages and such. To
>> my understanding, Squeak doesn't have the library of statistical
>> methods at this time. That would be one nice thing that could be done
>> when Numpy becomes a C lib and can be interfaced to from Squeak.
>>
>> I appreciate your comment above. I would really like to see Squeak
>> out perform some of the alternatives. :)
>
> The same "trick" can be done in python, and I'm sure it has a
> sufficient balanced binary tree implementation for median too.
>
> Levente

I just recently had an opportunity to study your code to learn. I
haven't tried it yet, but while exploring and studying I realized if I
understand correctly a problem. Many/(most ?) of the uses of Matrix,
(Numpy), etc. the order of the data is a part of the data. It isn't
simply a set of numbers, but an ordered collection of numbers or data.
Whether it is weather statistics, etc., or in my case financials which
are associated to a precise point in time in the market.

To my understanding using a tree based implementation would lose order
of the data. I would imagine if you used keys which could be ordered and
then order applied, you would lose the performance gains.

In my simple example order had little meaning. But a common calculation
is a weighted average which does associate a weighted value according to
position in the array.

While playing this is the implementation that performed the best for me.
Going from 6200ms to 3600ms.

m := Matrix rows: 6333 columns: 162.
omd := OrderedCollection new.
ttr := [| column start windowSize |
column := m atColumn: 5.
start := 1.
windowSize := 500.
1 to: ((column size) - windowSize) do: [:i || row rowSum rowMax
rowMin rowAverage  rowMedian |
row := column copyFrom: i to: i+windowSize.
rowSum := row sum.
rowMax := row max.
rowMin := row min.
rowMedian := row median.
rowAverage := row average.
omd6 add: {rowSum . rowMax . rowMin . rowAverage}]] timeToRun.
ttr 3600

If I remove the median calculation  I get 594ms. You are quite correct,
the median calculation is quite expensive. However it is quite light to
the things I actually do. :)

594ms is quite an improvement over the original 6200ms. The original
without median is now 2704.
Numpy is still faster over the same data set and doing the same thing it
is at 302ms. But Numpy is optimized C/Fortran with a Python face. It is
a nice API but still dependent on someone doing the hard work on the
back end. The nice thing with the Squeak code is that it is reasonably
performing and all Smalltalk. :)

Thanks for the education.

Jimmie

```