[squeak-dev] Interesting survey about smalltalk

Levente Uzonyi leves at elte.hu
Mon Jun 21 14:27:08 UTC 2010


On Sun, 20 Jun 2010, Jimmie Houchin wrote:

> On 6/20/2010 10:41 AM, Lawson English wrote:
>> On 6/20/10 6:08 AM, Nicolas Cellier wrote:
>>> 2010/6/20 Michael Haupt<mhaupt at gmail.com>:
>>>> Hi Nicolas,
>>>> 
>>>> On Sun, Jun 20, 2010 at 11:17 AM, Nicolas Cellier
>>>> <nicolas.cellier.aka.nice at gmail.com>  wrote:
>>>>> About 8) :  True, every single operation results in memory allocation
>>>>> / garbage collection, a burden for number crunching.
>>>> really?
>>>> 
>>>> There is this nice book by Didier Besset called "Object-Oriented
>>>> Implementation of Numerical Methods. An Introduction with Java and
>>>> Smalltalk.: An Introduction with Java and Smalltalk". It can't be
>>>> *that* bad. :-)
>>> Agree, "not worse than Matlab" was the meaning of my message.
>>>>> My own answer was: use C/FORTRAN for optimized number crunching
>>>>> functions. Use Smalltalk for any higher level/GUI function (via
>>>>> DLLCC/FFI). We may have more than one hammer in our toolset!
>>>> With GPU connectivity things emerging, number crunching might even be
>>>> an interesting area for Smalltalk.
>>>> 
>>>> Best,
>>>> Michael
>>> Yes, this falls in vectorizing the operations.
>>> But I would go for a GPU-BLAS implementation available to any language
>>> (Smalltalk and C as well).
>>> 
>>> Nicolas
>> How many parallel Squeak processes would be required to equal the speed of
>> one native library for arbitrary-precision math, or for other math-intensive
>> purposes?
>> 
>> Lawson
>
> Hello,
>
> I would love to be using Squeak for my financial application. Numerical 
> performance isn't currently what is stopping me. My problem is that I need to 
> interface with a Windows COM DLL and, in a future version, with a Java 
> library. Hopefully at some point I will be able to port to Squeak. I would 
> much prefer it to Python, which is what I am currently using.
>
> I didn't even know Squeak was in the running until I discovered the Matrix 
> class, and for what I need to do it performs reasonably adequately. However, 
> Squeak does not, to my knowledge, have a comprehensive collection of 
> mathematical methods that can be applied to a variety of data. Currently I am 
> using Python and Numpy, which has a nicely optimized mathematical/scientific 
> set of functions built on optimized C/Fortran libraries. I would love to see 
> Squeak compete in this area. In fact, the Numpy people are currently 
> refactoring the library to turn it into a C library usable by other 
> languages.
>
> Here are some samples from my experimentation.
>
> Some of what I am doing involves rolling calculations over my dataset.
>
> The dataset is one week's worth of OHLC data for a currency pair.
>
> In Squeak I have:
>
> ttr := [
>     1 to: m rowCount - 500 do: [:i |
>         | row rowSum rowMax rowMin rowMedian rowAverage |
>         row := m atRows: i to: 499 + i columns: 5 to: 5.
>         rowSum := row sum.
>         rowMax := row max.
>         rowMin := row min.
>         rowMedian := row median.
>         rowAverage := row average.
>         omd add: {rowSum . rowMax . rowMin . rowMedian . rowAverage}]] timeToRun.
>
> Squeak: 17 seconds, with Cog 4.2 seconds (nice work, Eliot/Teleplace!)

This code can be implemented a lot more efficiently.
#atRows:to:columns:to: creates a new matrix, but that can be avoided.
#sum, #max, #min, #median and #average each iterate over the row, and, what's
worse, #median sorts it. These can be eliminated too.
The total runtime cost is O((r - w) * (w + w * log(w))), which is
O(r * w * log(w)) when r >> w, as in your case (r is the number of rows, w is
the window size, 500 in your example).
This can be reduced to O(r * log(w)), which is a 500x speedup (in theory,
ignoring constants) in your case.
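For a rough sense of scale, with r = 10000 and w = 500 as in the measurement
below, and taking log base 2: that is roughly 9500 * 5000 ~ 5*10^7 elementary
steps for the original code versus roughly 9500 * 9 ~ 10^5 for the incremental
version.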

The idea is to store intermediate results. The sum (and average) only require
a single variable holding the sum of the current window: subtract the element
leaving the window, add the element entering it, and you have the new sum and
average. Min, max and median are a bit more tricky, but a balanced binary tree
handles them. Finding the min or max in a balanced tree takes O(log(n)) time
(O(log(w)) in your case), and adding or removing an element also takes
O(log(w)). For the median, find the node holding the median of the first 500
elements once at the start; whenever an element is added or removed, the new
median is either the same node or the next or previous element in the tree,
depending on the old median, the added/removed element and the size of the
tree. This can also be handled in O(log(w)) time.
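
For illustration, here is a minimal workspace sketch of the incremental
sum/average only, assuming the same Matrix m with the values in column 5 and a
window of 500 (min, max and median are left out here, since they need the
balanced tree described above):

    | w col sum omd |
    w := 500.
    "extract the column once, so the window slides over a flat array"
    col := (1 to: m rowCount) collect: [:i | m at: i at: 5].
    omd := OrderedCollection new.
    "sum of the first window"
    sum := (col copyFrom: 1 to: w) sum.
    omd add: {sum . sum / w}.
    w + 1 to: col size do: [:i |
        "slide the window: drop the oldest element, add the newest"
        sum := sum - (col at: i - w) + (col at: i).
        omd add: {sum . sum / w}].

Each slide costs O(1) here; adding a balanced tree for min and max (as in the
measurement below) keeps each slide at O(log(w)).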

For a matrix with 10000 rows and 5 columns of random floats, your code takes
5.5 seconds with Cog. Using a temporary for sum and average and a tree for min
and max (without the median, because that requires a special tree
implementation), it takes ~35ms. That's a 157x speedup.


Levente

>
> In Python/Numpy I have:
>
> import time
> import numpy as np
>
> def speedtest(a, omd):
>     # a is a structured NumPy array with a 'bidclose' field; omd collects results
>     t1 = time.time()
>     for i in range(len(a) - 500):
>         rowmax = np.max(a['bidclose'][i:i+500])
>         rowmin = np.min(a['bidclose'][i:i+500])
>         rowsum = np.sum(a['bidclose'][i:i+500])
>         rowmedian = np.median(a['bidclose'][i:i+500])
>         rowmean = np.mean(a['bidclose'][i:i+500])
>         omd.append((rowsum, rowmax, rowmin, rowmedian, rowmean))
>     return time.time() - t1
>
> Python: 0.7 seconds
>
> Python/Numpy performs well and is reasonably nice to work with, but I would 
> give up the performance to be able to use Squeak. The live environment and 
> debugging would be invaluable for experimentation.
>
> Hopefully this will give you some idea.
>
> Jimmie
>
>


