http://therighttool.hammerprinciple.com/items/smalltalk
I think this could be used as a good guide for us to make an effort to move some of the low-ranked items up into the high-ranked ones :)
2010/6/20 Igor Stasenko siguctua@gmail.com:
Oh, at first I misread the double negations in the right column (i.e., it's not unusually bad for beginners).
About 8): True, every single operation results in memory allocation / garbage collection, a burden for number crunching. But if you think of it, the very same argument applies to Matlab, because its interpreter is awfully slow (last time I checked, about 5 times slower than 32-bit VW!). So the well-known answer is to provide a good library to vectorize operations (something like BLAS + LAPACK). There is also the possibility of reducing memory-allocation pressure by using immediate values (like VW's SmallDouble). It will never compete with C/FORTRAN, but the user experience may vary a bit on 64-bit architectures.
My own answer was: use C/FORTRAN for optimized number-crunching functions. Use Smalltalk for any higher-level/GUI function (via DLLCC/FFI). We may have more than one hammer in our toolset!
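[Editorial sketch, in NumPy terms rather than any particular Smalltalk library: the vectorization answer works because one vectorized call replaces per-element interpreted dispatch and per-element boxing with a single native loop over the whole array.]

```python
import numpy as np

# Illustration of the principle only, not any particular Smalltalk API.
# The scalar loop pays interpreter dispatch plus one boxed result per
# element; the vectorized call runs one optimized native loop and
# allocates a single result array.
def scale_scalar(xs, k):
    out = []
    for x in xs:                  # per-element overhead lives here
        out.append(x * k)
    return out

def scale_vectorized(xs, k):
    return np.asarray(xs) * k     # one native loop, one allocation

data = [1.0, 2.0, 3.0]
assert scale_scalar(data, 2.0) == [2.0, 4.0, 6.0]
assert np.allclose(scale_vectorized(data, 2.0), [2.0, 4.0, 6.0])
```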
Nicolas
-- Best regards, Igor Stasenko AKA sig.
Hi Nicolas,
On Sun, Jun 20, 2010 at 11:17 AM, Nicolas Cellier nicolas.cellier.aka.nice@gmail.com wrote:
About 8) : True, every single operation results in memory allocation / garbage collection, a burden for number crunching.
really?
There is this nice book by Didier Besset called "Object-Oriented Implementation of Numerical Methods: An Introduction with Java and Smalltalk". It can't be *that* bad. :-)
My own answer was: use C/FORTRAN for optimized number crunching functions. Use Smalltalk for any higher level/GUI function (via DLLCC/FFI). We may have more than 1 hammer in your toolset!
With GPU connectivity now emerging, number crunching might even be an interesting area for Smalltalk.
Best,
Michael
2010/6/20 Michael Haupt mhaupt@gmail.com:
There is this nice book by Didier Besset called "Object-Oriented Implementation of Numerical Methods: An Introduction with Java and Smalltalk". It can't be *that* bad. :-)
Agree, "not worse than Matlab" was the meaning of my message.
With GPU connectivity now emerging, number crunching might even be an interesting area for Smalltalk.
Yes, this falls under vectorizing the operations. But I would go for a GPU BLAS implementation available to any language (Smalltalk as well as C).
Nicolas
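[The appeal of routing through BLAS, sketched here with NumPy's `@`, which dispatches to a BLAS gemm in typical builds: every host language calls the same optimized kernel, and a GPU BLAS could sit behind the same interface. The hand-written loop is only there to show the two compute the same thing.]

```python
import numpy as np

# Naive triple-loop matrix multiply: same mathematics as the BLAS
# dgemm kernel that a @ b dispatches to, just far slower. A GPU BLAS
# would slot in behind the identical call.
def matmul_naive(a, b):
    n, k = a.shape
    k2, m = b.shape
    assert k == k2
    c = np.zeros((n, m))
    for i in range(n):
        for j in range(m):
            for p in range(k):
                c[i, j] += a[i, p] * b[p, j]
    return c

rng = np.random.default_rng(0)
a = rng.random((4, 3))
b = rng.random((3, 5))
assert np.allclose(matmul_naive(a, b), a @ b)  # same result; BLAS is just faster
```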
On 6/20/10 6:08 AM, Nicolas Cellier wrote:
How many parallel Squeak processes would be required to equal the speed of one native library for arbitrary-precision math, or for other math-intensive purposes?
Lawson
On 6/20/2010 10:41 AM, Lawson English wrote:
Hello,
I would love to be using Squeak for my financial application. Numerical performance isn't currently what is stopping me. My problem is that I require interfacing with a Windows COM DLL and, in a future version, with a Java library. Hopefully at some point I will be able to port to Squeak. I would much prefer it to Python, which is what I am currently using.
I didn't even know Squeak was in the running until I discovered the Matrix class. For what I need to do, it performs reasonably adequately. However, Squeak does not, to my knowledge, have a comprehensive collection of mathematical methods that can be applied to a variety of data. Currently I am using Python and NumPy, which has a nicely optimized mathematics/scientific set of functions built on optimized C/Fortran libraries. I would love to see Squeak compete in this area. In fact, the NumPy people are currently refactoring the library to turn it into a C library usable by other languages.
Here are some samples from my experimentation.
Some of what I am doing involves rolling calculations over my dataset.
The dataset is one week's worth of OHLC data for a currency pair.
In Squeak I have:
ttr := [1 to: m rowCount - 500 do: [:i |
    | row rowSum rowMax rowMin rowMedian rowAverage |
    row := m atRows: i to: 499 + i columns: 5 to: 5.
    rowSum := row sum.
    rowMax := row max.
    rowMin := row min.
    rowMedian := row median.
    rowAverage := row average.
    omd add: {rowSum. rowMax. rowMin. rowMedian. rowAverage}]] timeToRun.
Squeak: 17 seconds; with Cog, 4.2 seconds (nice work, guys: Eliot/Teleplace!).
In Python/NumPy I have:

import time
import numpy as np

def speedtest(a, omd):
    t1 = time.time()
    for i in range(0, len(a) - 500):
        rowmax = np.max(a['bidclose'][i:i+500])
        rowmin = np.min(a['bidclose'][i:i+500])
        rowsum = np.sum(a['bidclose'][i:i+500])
        rowmedian = np.median(a['bidclose'][i:i+500])
        rowmean = np.mean(a['bidclose'][i:i+500])
        omd.append((rowsum, rowmax, rowmin, rowmedian, rowmean))
    return time.time() - t1
Python: 0.7 seconds
Python/NumPy performs well and is reasonably nice to work with. But I would give up the performance to be able to use Squeak. The live environment and debugging would be invaluable for experimentation.
Hopefully this will give you some idea.
Jimmie
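[An editorial aside on the benchmark above: the rolling sum and mean, though not the median, max, or min as written, can be computed in O(n) with cumulative sums instead of re-scanning each 500-element window. A sketch with an illustrative window size, not Jimmie's actual code:]

```python
import numpy as np

# Rolling sum/mean over windows of w elements via cumulative sums:
# O(n) total work instead of O(n*w) from re-summing every window.
def rolling_sum_mean(x, w):
    c = np.concatenate(([0.0], np.cumsum(x)))
    sums = c[w:] - c[:-w]          # sum of each length-w window
    return sums, sums / w

x = np.arange(10, dtype=float)
sums, means = rolling_sum_mean(x, 4)
naive = np.array([x[i:i + 4].sum() for i in range(len(x) - 3)])
assert np.allclose(sums, naive)
assert np.allclose(means, naive / 4)
```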
I started Smallapack for this purpose. Smallapack interfaces BLAS+LAPACK through FFI, very much like NumPy...
Smallapack works quite well in VW and Dolphin... (in fact it has worked very well since 1990 with st80 user primitives...). Unfortunately, not so in Squeak (possibly a VM crash).
I'd like to put more time into it, but so far there has not been much interest.
Nicolas
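[The FFI approach can be sketched in Python with ctypes: load a native library and declare a foreign function's signature so the optimized C/FORTRAN code does the work. libm's cos is only a stand-in here for a BLAS/LAPACK routine such as dgemm; Smallapack declares the real LAPACK entry points in the same general way.]

```python
import ctypes
import ctypes.util

# Load a native library and declare one function's signature.
# The fallback soname assumes a typical glibc system.
libm = ctypes.CDLL(ctypes.util.find_library("m") or "libm.so.6")
libm.cos.restype = ctypes.c_double
libm.cos.argtypes = [ctypes.c_double]

assert abs(libm.cos(0.0) - 1.0) < 1e-12
```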
2010/6/21 Jimmie Houchin jdev@cyberhaus.us:
nicolas,
i will be interested to test it if a port for squeak is available
arul
On Mon, Jun 21, 2010 at 12:41 PM, Nicolas Cellier nicolas.cellier.aka.nice@gmail.com wrote:
It is available on SqueakSource.
However, there is a compiler hack to be installed before loading the package, in order to handle function calls with more than 15 arguments.
This hack needs updating because of numerous Compiler changes related to closures. I have some unpublished updates.
Which version of Squeak/Pharo do you use?
Nicolas
2010/6/21 arul selvan arul.selvan@gmail.com:
Which version of Squeak
i used 3.8 and moved out of Squeak for a couple of years to work in Matlab, so any version > 3.8 is OK with me for testing.
arul
On Mon, Jun 21, 2010 at 5:47 PM, Nicolas Cellier nicolas.cellier.aka.nice@gmail.com wrote:
A compiler hack is available for Squeak 3.9 at http://bugs.squeak.org/view.php?id=2918 - load the 3 .cs files in the indicated order.
I will publish an update for the latest trunk at the same place when I have a bit of time...
Then the latest Smallapack code is at http://www.squeaksource.com/Smallapack.html
If you are on Linux, you'd better use a relatively new VM that corrects http://bugs.squeak.org/view.php?id=3929
Nicolas
2010/6/21 Nicolas Cellier nicolas.cellier.aka.nice@gmail.com:
Finally, I published a Compiler version for the latest trunk in this SqueakSource repository, plus a hack or two to make the external interface loadable in the closure VM (there is an additional limitation with closures: the maximum number of copied block values is < 16).
Sometimes, it gets a bit tricky to initialize Smallapack. I did something like:
LapackMatrix initialize.
LapackMatrix allSubclasses do: [:e |
    [e initializeClassInstVars] ifError: []].
Nicolas
Hi Nicolas,
On Mon, Jun 21, 2010 at 2:59 PM, Nicolas Cellier < nicolas.cellier.aka.nice@gmail.com> wrote:
Finally, I published a Compiler version for the latest trunk in this SqueakSource repository, plus a hack or two to make the external interface loadable in the closure VM (there is an additional limitation with closures: the maximum number of copied block values is < 16).
I'm interested to see the code that exceeds this limitation. Hopefully it'll be temporary, but it'll take a bytecode redesign to overcome, so I can't fix this anytime soon. Sorry it has bitten you!
Sometimes, it gets a bit tricky to initialize Smallapack. I did something like:
LapackMatrix initialize. LapackMatrix allSubclasses do: [:e | [e initializeClassInstVars] ifError: []].
Nicolas
Then the latest Smallapack code is at http://www.squeaksource.com/Smallapack.html
If you are on linux, you'd better use a relatively new VM correcting http://bugs.squeak.org/view.php?id=3929
Nicolas
2010/6/21 arul selvan arul.selvan@gmail.com:
nicolas,
i will be interested to test if a port for squeak is available
arul
On Mon, Jun 21, 2010 at 12:41 PM, Nicolas Cellier nicolas.cellier.aka.nice@gmail.com wrote:
I started Smallapack for this purpose. Smallapack interfaces BLAS+LAPACK thru FFI. Very much like numpy...
Smallapack works quite well in VW and Dolphin... (In fact it has worked very well since 1990 with st80 user primitives...) ...Unfortunately not so in Squeak (possible VM crash).
I'd like to put more time in it, but so far there has not been so much
interest.
Nicolas
2010/6/21 Jimmie Houchin jdev@cyberhaus.us:
On 6/20/2010 10:41 AM, Lawson English wrote:
On 6/20/10 6:08 AM, Nicolas Cellier wrote:
2010/6/20 Michael Haupt mhaupt@gmail.com:
Hi Nicolas,
On Sun, Jun 20, 2010 at 11:17 AM, Nicolas Cellier nicolas.cellier.aka.nice@gmail.com wrote:
About 8) : True, every single operation results in memory allocation / garbage collection, a burden for number crunching.
really?
There is this nice book by Didier Besset called "Object-Oriented Implementation of Numerical Methods: An Introduction with Java and Smalltalk". It can't be *that* bad. :-)
Agree, "not worse than Matlab" was the meaning of my message.
My own answer was: use C/FORTRAN for optimized number crunching functions. Use Smalltalk for any higher level/GUI function (via DLLCC/FFI). We may have more than 1 hammer in your toolset!
With GPU connectivity things emerging, number crunching might even be an interesting area for Smalltalk.
Best, Michael
Yes, this falls in vectorizing the operations. But I would go for a GPU-BLAS implementation available to any language (Smalltalk and C as well).
Nicolas
How many parallel squeak processes would be required to equal the speed of one native library for arbitrary precision math, or for other math-intensive purposes?
Lawson
Hello,
I would love to be using Squeak for my financial application. Numerical performance isn't currently what is stopping me. My problem is that I require interfacing with a Windows COM dll and in a future version with a Java library. Hopefully at some point I will be able to port to Squeak. I would much prefer it to using Python, which is what I am currently using.
I didn't even know Squeak was in the running until I discovered the Matrix class. And for what I need to do it performs reasonably adequately. However, Squeak does not to my knowledge have a comprehensive collection of mathematics methods that can be applied to a variety of data. Currently I am using Python and Numpy, which has a nicely optimized mathematics/scientific set of functions built on optimized C/Fortran libraries.
I would love to see Squeak compete in this area. In fact, the Numpy people are currently refactoring the library to turn it into a C library usable by other languages.
Here are some samples from my experimentation.
Some of what I am doing is rolling calculations over my dataset. The dataset is one week's worth of OHLC data of a currency pair.
In Squeak I have:

ttr := [1 to: m rowCount - 500 do: [:i |
    | row rowSum rowMax rowMin rowMedian rowAverage |
    row := m atRows: i to: 499 + i columns: 5 to: 5.
    rowSum := row sum.
    rowMax := row max.
    rowMin := row min.
    rowMedian := row median.
    rowAverage := row average.
    omd add: {rowSum . rowMax . rowMin . rowMedian . rowAverage}]] timeToRun.

Squeak: 17 seconds; with Cog, 4.2 seconds (nice work guys, Eliot/Teleplace).
In Python/Numpy I have:

import time
import numpy as np

def speedtest(array, omd):
    t1 = time.time()
    for i in range(0, len(array) - 500):
        rowmax = np.max(array['bidclose'][i:i+500])
        rowmin = np.min(array['bidclose'][i:i+500])
        rowsum = np.sum(array['bidclose'][i:i+500])
        rowmedian = np.median(array['bidclose'][i:i+500])
        rowmean = np.mean(array['bidclose'][i:i+500])
        omd.append((rowsum, rowmax, rowmin, rowmedian, rowmean))
    return time.time() - t1

Python: 0.7 seconds
Python/Numpy performs well and is reasonably nice to work with, but I would give up the performance to be able to use Squeak. The live environment and debugging would be invaluable for experimentation.
Hopefully this will give you some idea.
Jimmie
On Sun, 20 Jun 2010, Jimmie Houchin wrote:
On 6/20/2010 10:41 AM, Lawson English wrote:
On 6/20/10 6:08 AM, Nicolas Cellier wrote:
2010/6/20 Michael Haupt mhaupt@gmail.com:
Hi Nicolas,
On Sun, Jun 20, 2010 at 11:17 AM, Nicolas Cellier nicolas.cellier.aka.nice@gmail.com wrote:
About 8) : True, every single operation results in memory allocation / garbage collection, a burden for number crunching.
really?
There is this nice book by Didier Besset called "Object-Oriented Implementation of Numerical Methods: An Introduction with Java and Smalltalk". It can't be *that* bad. :-)
Agree, "not worse than Matlab" was the meaning of my message.
My own answer was: use C/FORTRAN for optimized number crunching functions. Use Smalltalk for any higher level/GUI function (via DLLCC/FFI). We may have more than 1 hammer in your toolset!
With GPU connectivity things emerging, number crunching might even be an interesting area for Smalltalk.
Best, Michael
Yes, this falls in vectorizing the operations. But I would go for a GPU-BLAS implementation available to any language (Smalltalk and C as well).
Nicolas
How many parallel squeak processes would be required to = the speed of one native library for arbitrary precision math, or for other math intensive purposes?
Lawson
Hello,
I would love to be using Squeak for my financial application. Numerical performance isn't currently what is stopping me. My problem is that I require interfacing with a Windows COM dll and in a future version with a Java library. Hopefully at some point I will be able to port to Squeak. I would much prefer it to using Python, which is what I am currently using.
I didn't even know Squeak was in the running until I discovered the Matrix class. And for what I need to do it performs reasonably adequately. However Squeak does not to my knowledge have a comprehensive collection of mathematics methods to be able to be applied to a variety of data. Currently I am using Python and Numpy which has a nicely optimized Mathematics/Scientific set of functions using optimized C/Fortran libraries. I would love to see Squeak compete in this area. In fact the Numpy people are currently refactoring the library to turn it into a C library usable by other languages.
Here is some samples from my experimentation.
Some of what I am doing is doing rolling calculations over my dataset.
dataset is one weeks worth of OHLC data of a currency pair.
In Squeak I have.
ttr := [1 to: m rowCount - 500 do: [:i |
    | row rowSum rowMax rowMin rowMedian rowAverage |
    row := m atRows: i to: 499 + i columns: 5 to: 5.
    rowSum := row sum.
    rowMax := row max.
    rowMin := row min.
    rowMedian := row median.
    rowAverage := row average.
    omd add: {rowSum . rowMax . rowMin . rowMedian . rowAverage}]] timeToRun.
Squeak: 17 seconds, with Cog 4.2 seconds (nice work guys (Eliot/Teleplace)
This code can be implemented a lot more efficiently. #atRows:to:columns:to: creates a new matrix, but that can be avoided. #sum, #max, #min, #median and #average each iterate over the row. What's worse, #median sorts the row. These can be eliminated too. The total runtime cost is O((r - w) * (w + w * log(w))), which is O(r * w * log(w)) if r >> w, which is true in your case (r is the number of rows, w is the window size, 500 in your example). This can be reduced to O(r * log(w)), which is a 500x speedup (in theory, ignoring constants) in your case.
The idea is to store intermediate results. The sum (and average) only require a single variable holding the sum of the current window: subtract the element leaving the window, add the element entering it, and you have the new sum and average. Min, max and median are a bit more tricky, but a balanced binary tree handles them. Calculating min and max in a balanced tree takes O(log(n)) time (O(log(w)) in your case). Adding and removing one element each also takes O(log(w)) time. For the median, find the node of the median of the first 500 elements at the beginning; when an element is added to or removed from the tree, the new median is the same node, the next one, or the previous one in the tree, depending on the median, the added/removed element and the size of the tree. This can also be handled in O(log(w)) time.
For a matrix with 10000 rows and 5 columns of random floats your code takes 5.5 seconds with Cog. Using a temp for sum and average and a tree for min and max (without the median, because it requires a special tree implementation) it takes ~35ms. That's 157x speedup.
Levente
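Levente's running-sum idea translates directly to other languages. As a rough Python sketch (an editor's illustration, not the Smalltalk code he benchmarked; the function name is invented), maintaining the window sum incrementally makes each step O(1) instead of O(w):

```python
def rolling_sum_mean(values, w):
    """Rolling (sum, mean) per window of size w, O(1) work per step."""
    out = []
    s = sum(values[:w])                      # sum of the first window
    out.append((s, s / w))
    for i in range(w, len(values)):
        s += values[i] - values[i - w]       # add entering, drop leaving element
        out.append((s, s / w))
    return out
```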
In Python/Numpy I have.
import time
import numpy as np

def speedtest(array, omd):
    t1 = time.time()
    for i in range(0, len(array) - 500):
        rowmax = np.max(array['bidclose'][i:i+500])
        rowmin = np.min(array['bidclose'][i:i+500])
        rowsum = np.sum(array['bidclose'][i:i+500])
        rowmedian = np.median(array['bidclose'][i:i+500])
        rowmean = np.mean(array['bidclose'][i:i+500])
        omd.append((rowsum, rowmax, rowmin, rowmedian, rowmean))
    return time.time() - t1
Python: .7 seconds
Python/Numpy performs well, is reasonably nice to work with. But I would give up the performance to be able to use Squeak. The live environment and debugging would be invaluable for experimentation.
Hopefully this will give you some idea.
Jimmie
On 6/21/2010 9:27 AM, Levente Uzonyi wrote:
On Sun, 20 Jun 2010, Jimmie Houchin wrote:
Hello,
I would love to be using Squeak for my financial application. Numerical performance isn't currently what is stopping me. My problem is that I require interfacing with a Windows COM dll and in a future version with a Java library. Hopefully at some point I will be able to port to Squeak. I would much prefer it to using Python, which is what I am currently using.
I didn't even know Squeak was in the running until I discovered the Matrix class. And for what I need to do it performs reasonably adequately. However Squeak does not to my knowledge have a comprehensive collection of mathematics methods to be able to be applied to a variety of data. Currently I am using Python and Numpy which has a nicely optimized Mathematics/Scientific set of functions using optimized C/Fortran libraries. I would love to see Squeak compete in this area. In fact the Numpy people are currently refactoring the library to turn it into a C library usable by other languages.
Here is some samples from my experimentation.
Some of what I am doing is doing rolling calculations over my dataset.
dataset is one weeks worth of OHLC data of a currency pair.
In Squeak I have.
ttr := [ 1 to: ((m rowCount) -500) do: [:i || row rowSum rowMax rowMin rowMedian rowAverage | row := (m atRows: i to: (499+i) columns: 5 to: 5). rowSum := row sum. rowMax := row max. rowMin := row min. rowMedian := row median. rowAverage := row average. omd add: {rowSum . rowMax . rowMin . rowMedian . rowAverage}]] timeToRun.
Squeak: 17 seconds, with Cog 4.2 seconds (nice work guys (Eliot/Teleplace)
This code can be implemented a lot more efficiently. #atRows:to:columns:to: creates a new matrix, but that can be avoided. #sum, #max, #min, #median and #average iterate over the row. What's worse, #median sorts the row. These can be eliminated too. The total runtime cost is O((r - w) * (w + w * log(w))), which is O(r * w * log(w)) if r >> w, which is true in your case (r is the number of rows, w is the window size, 500 in your example). This can be reduced to O(r * log(w)), which is a 500x speedup (in theory, ignoring constants) in your case.
The idea is to store the intermediate results. The sum (and average) only require a single variable which stores the sum of the window. Then substract the element getting out of the window and add the new element getting in the window and you got the new sum and average. Min, max and median are a bit more tricky, but a balanced binary tree handle them. Calculating min and max in a balanced tree requires O(log(n)) time (which is O(log(w)) in your case). Adding and removing 1-1 elements also require O(log(w)) time. For median, you have to find the node of the median of the first 500 elements at the beginning. When an element is removed or added to the tree, the median will be the same, the next or the previous element in the tree, depending on the median, the added/removed element and the size of the tree. This can be handled in O(log(w)) time.
For a matrix with 10000 rows and 5 columns of random floats your code takes 5.5 seconds with Cog. Using a temp for sum and average and a tree for min and max (without the median, because it requires a special tree implementation) it takes ~35ms. That's 157x speedup.
Levente
Hello Levente,
I am not surprised that I may not have the most efficient implementation. I understand what you are saying in principle, but I don't understand how to implement it. Can you provide an example that does what I did in the manner you describe? I would greatly appreciate it, and would then run it against my data for a test.
The above was simply an example. I have many more methods implemented which do a variety of moving averages and such. To my understanding, Squeak doesn't have a library of statistical methods at this time. That would be one nice thing that could be done when Numpy becomes a C lib that can be interfaced to from Squeak.
I appreciate your comment above. I would really like to see Squeak out perform some of the alternatives. :)
Thanks.
Jimmie
On Mon, 21 Jun 2010, Jimmie Houchin wrote:
On 6/21/2010 9:27 AM, Levente Uzonyi wrote:
On Sun, 20 Jun 2010, Jimmie Houchin wrote:
Hello,
I would love to be using Squeak for my financial application. Numerical performance isn't currently what is stopping me. My problem is that I require interfacing with a Windows COM dll and in a future version with a Java library. Hopefully at some point I will be able to port to Squeak. I would much prefer it to using Python, which is what I am currently using.
I didn't even know Squeak was in the running until I discovered the Matrix class. And for what I need to do it performs reasonably adequately. However Squeak does not to my knowledge have a comprehensive collection of mathematics methods to be able to be applied to a variety of data. Currently I am using Python and Numpy which has a nicely optimized Mathematics/Scientific set of functions using optimized C/Fortran libraries. I would love to see Squeak compete in this area. In fact the Numpy people are currently refactoring the library to turn it into a C library usable by other languages.
Here is some samples from my experimentation.
Some of what I am doing is doing rolling calculations over my dataset.
dataset is one weeks worth of OHLC data of a currency pair.
In Squeak I have.
ttr := [ 1 to: ((m rowCount) -500) do: [:i || row rowSum rowMax rowMin rowMedian rowAverage | row := (m atRows: i to: (499+i) columns: 5 to: 5). rowSum := row sum. rowMax := row max. rowMin := row min. rowMedian := row median. rowAverage := row average. omd add: {rowSum . rowMax . rowMin . rowMedian . rowAverage}]] timeToRun.
Squeak: 17 seconds, with Cog 4.2 seconds (nice work guys (Eliot/Teleplace)
This code can be implemented a lot more efficiently. #atRows:to:columns:to: creates a new matrix, but that can be avoided. #sum, #max, #min, #median and #average iterate over the row. What's worse, #median sorts the row. These can be eliminated too. The total runtime cost is O((r - w) * (w + w * log(w))), which is O(r * w * log(w)) if r >> w, which is true in your case (r is the number of rows, w is the window size, 500 in your example). This can be reduced to O(r * log(w)), which is a 500x speedup (in theory, ignoring constants) in your case.
The idea is to store the intermediate results. The sum (and average) only require a single variable which stores the sum of the window. Then substract the element getting out of the window and add the new element getting in the window and you got the new sum and average. Min, max and median are a bit more tricky, but a balanced binary tree handle them. Calculating min and max in a balanced tree requires O(log(n)) time (which is O(log(w)) in your case). Adding and removing 1-1 elements also require O(log(w)) time. For median, you have to find the node of the median of the first 500 elements at the beginning. When an element is removed or added to the tree, the median will be the same, the next or the previous element in the tree, depending on the median, the added/removed element and the size of the tree. This can be handled in O(log(w)) time.
For a matrix with 10000 rows and 5 columns of random floats your code takes 5.5 seconds with Cog. Using a temp for sum and average and a tree for min and max (without the median, because it requires a special tree implementation) it takes ~35ms. That's 157x speedup.
Levente
Hello Levente,
I am not surprised that I may not have the most efficient implementation. I understand what you are saying in principle, but I don't understand how to implement what you are saying. Can you provide an example doing what I did in the manner you describe. I would greatly appreciate it. I would then run it against my data for a test.
Hi Jimmie,
if you load http://leves.web.elte.hu/squeak/LLRBT-ul.7.mcz [1] then you can evaluate the following example in a workspace. Just print it:
| rng matrix windowSize columnIndex old new oldTime newTime |
"Create a matrix filled with random float values between 0 and 1 with 10000 rows."
rng := Random new.
matrix := Matrix rows: 10000 columns: 5 tabulate: [ :i :j | rng next ].
windowSize := 500.
columnIndex := 5.
"The 'old' slow computation."
oldTime := [
    old := Array new: matrix rowCount - windowSize streamContents: [ :stream |
        1 to: matrix rowCount - windowSize + 1 do: [ :i |
            | row |
            row := matrix atRows: i to: i + windowSize - 1 columns: 5 to: 5.
            stream nextPut: { row sum. row max. row min. "row median." row average } ] ] ] timeToRun.
"The 'new' fast computation."
newTime := [
    new := Array new: matrix rowCount - windowSize streamContents: [ :stream |
        | sum tree |
        "Our variables holding the intermediate results."
        sum := 0.
        tree := LLRBTree new.
        "Calculate the first values."
        1 to: windowSize do: [ :rowIndex |
            | element |
            element := matrix at: rowIndex at: columnIndex.
            sum := sum + element.
            tree at: element put: element ].
        stream nextPut: { sum. tree max. tree min. sum / windowSize }.
        "And the rest."
        windowSize + 1 to: matrix rowCount do: [ :rowIndex |
            | oldElement newElement |
            oldElement := matrix at: rowIndex - windowSize at: columnIndex.
            newElement := matrix at: rowIndex at: columnIndex.
            sum := sum - oldElement + newElement.
            "Housekeeping for median would happen here"
            tree deleteKey: oldElement; at: newElement put: newElement.
            stream nextPut: { sum. tree max. tree min. sum / windowSize } ] ] ] timeToRun.
"Make sure the calculations give the same results."
old with: new do: [ :oldRow :newRow |
    oldRow with: newRow do: [ :oldValue :newValue |
        self assert: (oldValue closeTo: newValue) ] ].
"Print it to see the runtime"
{ oldTime. newTime }
The "old" version is your original code with minor changes, the "new" is the one that stores intermediate results. If you change the value of the matrix variable, you can test it on your data.
The above was simply an example. I have many more methods which I've implemented which are doing a variety of moving averages and such. To my understanding, Squeak doesn't have the library of statistical methods at this time. That would be one nice thing that could be done when Numpy becomes a C lib and can be interfaced to from Squeak.
I appreciate your comment above. I would really like to see Squeak out perform some of the alternatives. :)
The same "trick" can be done in python, and I'm sure it has a sufficient balanced binary tree implementation for median too.
Levente
[1] The LLRBT package contains an implementation of Robert Sedgewick's Left-Leaning Red-Black Trees. The implementation lacks parent pointers for simplicity (and easy persistence adoption) which makes it a bit harder (but not impossible) to find the next and previous values of a node in O(log(n)) time. Also LLRBTree implements a dictionary-like protocol, so a tree node stores key-value pairs. Because of this, processing data with repeating values may result in data loss (since the tree can't hold the same key more than once). If someone has a free (and complete) AVL or Red-Black tree implementation in Squeak, let me know. :)
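Python's standard library has no balanced binary tree, but the same incremental trick Levente describes can be approximated with bisect on a sorted window. This is a hedged editor's sketch (the names are invented): inserts cost O(w) from list shifting rather than a tree's O(log w), but that is still far cheaper than re-sorting every window, and unlike the LLRBTree caveat in [1], a sorted list has no trouble with repeated values.

```python
from bisect import insort, bisect_left

def rolling_stats(values, w):
    """Yield (sum, max, min, median) for each window of size w."""
    window = sorted(values[:w])              # the current window, kept sorted
    s = sum(window)

    def median(sw):
        mid = len(sw) // 2
        return sw[mid] if len(sw) % 2 else (sw[mid - 1] + sw[mid]) / 2

    yield (s, window[-1], window[0], median(window))
    for i in range(w, len(values)):
        old, new = values[i - w], values[i]
        s += new - old                        # incremental sum
        del window[bisect_left(window, old)]  # remove leaving element
        insort(window, new)                   # insert entering element
        yield (s, window[-1], window[0], median(window))
```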
Thanks.
Jimmie
On 6/21/2010 5:34 PM, Levente Uzonyi wrote:
On Mon, 21 Jun 2010, Jimmie Houchin wrote:
On 6/21/2010 9:27 AM, Levente Uzonyi wrote:
On Sun, 20 Jun 2010, Jimmie Houchin wrote:
[snip] Here is some samples from my experimentation.
Some of what I am doing is doing rolling calculations over my dataset.
dataset is one weeks worth of OHLC data of a currency pair.
In Squeak I have.
ttr := [ 1 to: ((m rowCount) -500) do: [:i || row rowSum rowMax rowMin rowMedian rowAverage | row := (m atRows: i to: (499+i) columns: 5 to: 5). rowSum := row sum. rowMax := row max. rowMin := row min. rowMedian := row median. rowAverage := row average. omd add: {rowSum . rowMax . rowMin . rowMedian . rowAverage}]] timeToRun.
Squeak: 17 seconds, with Cog 4.2 seconds (nice work guys (Eliot/Teleplace)
This code can be implemented a lot more efficiently. #atRows:to:columns:to: creates a new matrix, but that can be avoided. #sum, #max, #min, #median and #average iterate over the row. What's worse, #median sorts the row. These can be elimated too. The total runtime cost is: O((r - w) * (w + w * log(w))), which is O(m * w * log(w)) if m >> w, which is true in your case. (r is the number of rows, w is the window size (500 in your example)). This can be reduced to m*log(w) which is 500x speedup (in theory, ignoring constatns) in your case.
The idea is to store the intermediate results. The sum (and average) only require a single variable which stores the sum of the window. Then substract the element getting out of the window and add the new element getting in the window and you got the new sum and average. Min, max and median are a bit more tricky, but a balanced binary tree handle them. Calculating min and max in a balanced tree requires O(log(n)) time (which is O(log(w)) in your case). Adding and removing 1-1 elements also require O(log(w)) time. For median, you have to find the node of the median of the first 500 elements at the beginning. When an element is removed or added to the tree, the median will be the same, the next or the previous element in the tree, depending on the median, the added/removed element and the size of the tree. This can be handled in O(log(w)) time.
For a matrix with 10000 rows and 5 columns of random floats your code takes 5.5 seconds with Cog. Using a temp for sum and average and a tree for min and max (without the median, because it requires a special tree implementation) it takes ~35ms. That's 157x speedup.
Levente
Hello Levente,
I am not surprised that I may not have the most efficient implementation. I understand what you are saying in principle, but I don't understand how to implement what you are saying. Can you provide an example doing what I did in the manner you describe. I would greatly appreciate it. I would then run it against my data for a test.
Hi Jimmie,
if you load http://leves.web.elte.hu/squeak/LLRBT-ul.7.mcz [1] then you can evaluate the following example in a workspace. Just print it:
| rng matrix windowSize columnIndex old new oldTime newTime |
"Create a matrix filled with random float values between 0 and 1 with 10000 rows."
rng := Random new.
matrix := Matrix rows: 10000 columns: 5 tabulate: [ :i :j | rng next ].
windowSize := 500.
columnIndex := 5.
"The 'old' slow computation."
oldTime := [
    old := Array new: matrix rowCount - windowSize streamContents: [ :stream |
        1 to: matrix rowCount - windowSize + 1 do: [ :i |
            | row |
            row := matrix atRows: i to: i + windowSize - 1 columns: 5 to: 5.
            stream nextPut: { row sum. row max. row min. "row median." row average } ] ] ] timeToRun.
"The 'new' fast computation."
newTime := [
    new := Array new: matrix rowCount - windowSize streamContents: [ :stream |
        | sum tree |
        "Our variables holding the intermediate results."
        sum := 0.
        tree := LLRBTree new.
        "Calculate the first values."
        1 to: windowSize do: [ :rowIndex |
            | element |
            element := matrix at: rowIndex at: columnIndex.
            sum := sum + element.
            tree at: element put: element ].
        stream nextPut: { sum. tree max. tree min. sum / windowSize }.
        "And the rest."
        windowSize + 1 to: matrix rowCount do: [ :rowIndex |
            | oldElement newElement |
            oldElement := matrix at: rowIndex - windowSize at: columnIndex.
            newElement := matrix at: rowIndex at: columnIndex.
            sum := sum - oldElement + newElement.
            "Housekeeping for median would happen here"
            tree deleteKey: oldElement; at: newElement put: newElement.
            stream nextPut: { sum. tree max. tree min. sum / windowSize } ] ] ] timeToRun.
"Make sure the calculations give the same results."
old with: new do: [ :oldRow :newRow |
    oldRow with: newRow do: [ :oldValue :newValue |
        self assert: (oldValue closeTo: newValue) ] ].
"Print it to see the runtime"
{ oldTime. newTime }
The "old" version is your original code with minor changes, the "new" is the one that stores intermediate results. If you change the value of the matrix variable, you can test it on your data.
The above was simply an example. I have many more methods which I've implemented which are doing a variety of moving averages and such. To my understanding, Squeak doesn't have the library of statistical methods at this time. That would be one nice thing that could be done when Numpy becomes a C lib and can be interfaced to from Squeak.
I appreciate your comment above. I would really like to see Squeak out perform some of the alternatives. :)
The same "trick" can be done in python, and I'm sure it has a sufficient balanced binary tree implementation for median too.
Levente
I just recently had an opportunity to study your code to learn. I haven't tried it yet, but while exploring and studying I realized, if I understand correctly, a problem. In many (most?) uses of Matrix, Numpy, etc., the order of the data is part of the data. It isn't simply a set of numbers, but an ordered collection of numbers or data, whether it is weather statistics or, in my case, financials which are associated with a precise point in time in the market.
To my understanding, using a tree-based implementation would lose the order of the data. I would imagine that if you used keys which could be ordered and then applied that order, you would lose the performance gains.
In my simple example order had little meaning. But a common calculation is a weighted average, which assigns a weight to each value according to its position in the array.
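As a hedged illustration of such a position-dependent calculation (a linearly weighted moving average is one common choice; the weights and function name here are the editor's assumptions, not Jimmie's actual formula):

```python
def wma(values, w):
    """Linearly weighted moving average over windows of size w:
    the most recent element of each window gets the largest weight."""
    weights = list(range(1, w + 1))          # 1, 2, ..., w
    denom = sum(weights)                     # w * (w + 1) / 2
    return [
        sum(v * k for v, k in zip(values[i:i + w], weights)) / denom
        for i in range(len(values) - w + 1)
    ]
```

Because each element's contribution depends on its offset inside the window, the incremental-tree trick above does not apply directly here; order must be preserved.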
While playing, this is the implementation that performed best for me, going from 6200ms to 3600ms:

m := Matrix rows: 6333 columns: 162.
omd := OrderedCollection new.
ttr := [| column start windowSize |
    column := m atColumn: 5.
    start := 1.
    windowSize := 500.
    1 to: column size - windowSize do: [:i |
        | row rowSum rowMax rowMin rowAverage rowMedian |
        row := column copyFrom: i to: i + windowSize.
        rowSum := row sum.
        rowMax := row max.
        rowMin := row min.
        rowMedian := row median.
        rowAverage := row average.
        omd add: {rowSum . rowMax . rowMin . rowAverage}]] timeToRun.
ttr "3600"
If I remove the median calculation I get 594ms. You are quite correct, the median calculation is quite expensive. However, it is quite light compared to the things I actually do. :)
594ms is quite an improvement over the original 6200ms. The original without median is now 2704ms. Numpy is still faster: over the same data set, doing the same thing, it is at 302ms. But Numpy is optimized C/Fortran with a Python face. It is a nice API, but still dependent on someone doing the hard work on the back end. The nice thing with the Squeak code is that it is reasonably performing and all Smalltalk. :)
Thanks for the education.
Jimmie
2010/6/24 Jimmie Houchin jdev@cyberhaus.us:
On 6/21/2010 5:34 PM, Levente Uzonyi wrote:
On Mon, 21 Jun 2010, Jimmie Houchin wrote:
On 6/21/2010 9:27 AM, Levente Uzonyi wrote:
On Sun, 20 Jun 2010, Jimmie Houchin wrote:
[snip] Here is some samples from my experimentation.
Some of what I am doing is doing rolling calculations over my dataset.
dataset is one weeks worth of OHLC data of a currency pair.
In Squeak I have.
ttr := [ 1 to: ((m rowCount) -500) do: [:i || row rowSum rowMax rowMin rowMedian rowAverage | row := (m atRows: i to: (499+i) columns: 5 to: 5). rowSum := row sum. rowMax := row max. rowMin := row min. rowMedian := row median. rowAverage := row average. omd add: {rowSum . rowMax . rowMin . rowMedian . rowAverage}]] timeToRun.
Squeak: 17 seconds, with Cog 4.2 seconds (nice work guys (Eliot/Teleplace)
This code can be implemented a lot more efficiently. #atRows:to:columns:to: creates a new matrix, but that can be avoided. #sum, #max, #min, #median and #average iterate over the row. What's worse, #median sorts the row. These can be elimated too. The total runtime cost is: O((r - w) * (w + w * log(w))), which is O(m * w * log(w)) if m >> w, which is true in your case. (r is the number of rows, w is the window size (500 in your example)). This can be reduced to m*log(w) which is 500x speedup (in theory, ignoring constatns) in your case.
The idea is to store the intermediate results. The sum (and average) only require a single variable which stores the sum of the window. Then substract the element getting out of the window and add the new element getting in the window and you got the new sum and average. Min, max and median are a bit more tricky, but a balanced binary tree handle them. Calculating min and max in a balanced tree requires O(log(n)) time (which is O(log(w)) in your case). Adding and removing 1-1 elements also require O(log(w)) time. For median, you have to find the node of the median of the first 500 elements at the beginning. When an element is removed or added to the tree, the median will be the same, the next or the previous element in the tree, depending on the median, the added/removed element and the size of the tree. This can be handled in O(log(w)) time.
For a matrix with 10000 rows and 5 columns of random floats your code takes 5.5 seconds with Cog. Using a temp for sum and average and a tree for min and max (without the median, because it requires a special tree implementation) it takes ~35ms. That's 157x speedup.
Levente
Hello Levente,
I am not surprised that I may not have the most efficient implementation. I understand what you are saying in principle, but I don't understand how to implement what you are saying. Can you provide an example doing what I did in the manner you describe. I would greatly appreciate it. I would then run it against my data for a test.
Hi Jimmie,
if you load http://leves.web.elte.hu/squeak/LLRBT-ul.7.mcz [1] then you can evaluate the following example in a workspace. Just print it:
| rng matrix windowSize columnIndex old new oldTime newTime | "Create a matrix filled with random float values between 0 and 1 with 10000 rows." rng := Random new. matrix := Matrix rows: 10000 columns: 5 tabulate: [ :i :j | rng next ]. windowSize := 500. columnIndex := 5. "The 'old' slow computation." oldTime := [ old := Array new: matrix rowCount - windowSize streamContents: [ :stream | 1 to: matrix rowCount - windowSize + 1 do: [ :i | | row | row := (matrix atRows: i to: i + windowSize - 1 columns: 5 to: 5). stream nextPut: { row sum. row max. row min. "row median." row average } ] ] ] timeToRun. "The 'new' fast computation." newTime := [ new := Array new: matrix rowCount - windowSize streamContents: [ :stream | | sum tree | "Our variables holding the intermediate results." sum := 0. tree := LLRBTree new. "Calculate the first values." 1 to: windowSize do: [ :rowIndex | | element | element := matrix at: rowIndex at: columnIndex. sum := sum + element. tree at: element put: element ]. stream nextPut: { sum. tree max. tree min. sum / windowSize }. "And the rest." windowSize + 1 to: matrix rowCount do: [ :rowIndex | | oldElement newElement | oldElement := matrix at: rowIndex - windowSize at: columnIndex. newElement := matrix at: rowIndex at: columnIndex. sum := sum - oldElement + newElement. "Housekeeping for median would happen here" tree deleteKey: oldElement; at: newElement put: newElement. stream nextPut: { sum. tree max. tree min. sum / windowSize } ] ] ] timeToRun. "Make sure the calculations give the same results." old with: new do: [ :oldRow :newRow | oldRow with: newRow do: [ :oldValue :newValue | self assert: (oldValue closeTo: newValue) ] ]. "Print it to see the runtime" { oldTime. newTime }
The "old" version is your original code with minor changes, the "new" is the one that stores intermediate results. If you change the value of the matrix variable, you can test it on your data.
The above was simply an example. I have many more methods implemented, doing a variety of moving averages and such. To my understanding, Squeak doesn't have a library of statistical methods at this time. That would be one nice thing that could be done when Numpy becomes a C lib that can be interfaced to from Squeak.
I appreciate your comment above. I would really like to see Squeak outperform some of the alternatives. :)
The same "trick" can be done in python, and I'm sure it has a sufficient balanced binary tree implementation for median too.
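Since the thread mentions the same trick is possible in Python, here is a minimal Python sketch of the incremental idea (not from the original thread; the function name is mine). Only the sum is updated in O(1) per step; min and max are recomputed naively here, where a balanced tree, as in the Smalltalk LLRBTree example, would make them O(log w) as well:

```python
import random

def rolling_stats(column, window_size):
    """Sliding-window sum/max/min/mean.

    The sum is maintained incrementally: subtract the outgoing element,
    add the incoming one.  max/min are recomputed naively per window for
    simplicity; a balanced search tree would speed those up too.
    """
    results = []
    window_sum = sum(column[:window_size])
    window = column[:window_size]
    results.append((window_sum, max(window), min(window), window_sum / window_size))
    for i in range(window_size, len(column)):
        outgoing, incoming = column[i - window_size], column[i]
        window_sum += incoming - outgoing  # O(1) update instead of re-summing
        window = column[i - window_size + 1 : i + 1]
        results.append((window_sum, max(window), min(window), window_sum / window_size))
    return results

data = [random.random() for _ in range(1000)]
stats = rolling_stats(data, 100)
```

Each tuple corresponds to one row of the Smalltalk example's result array: running sum, max, min, and average of the current window.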
Levente
I just recently had an opportunity to study your code to learn from it. I haven't tried it yet, but while exploring and studying I realized, if I understand correctly, a problem: in many (most?) uses of Matrix, Numpy, etc., the order of the data is part of the data. It isn't simply a set of numbers, but an ordered collection of numbers, whether it is weather statistics or, in my case, financials which are associated with a precise point in time in the market.
To my understanding, using a tree-based implementation would lose the order of the data. I would imagine that if you used keys which could be ordered, and then applied that order, you would lose the performance gains.
In my simple example order had little meaning. But a common calculation is a weighted average, which does weight values according to their position in the array.
While playing around, this is the implementation that performed best for me, going from 6200ms to 3600ms.
m := Matrix rows: 6333 columns: 162.
omd := OrderedCollection new.
ttr := [
	| column start windowSize |
	column := m atColumn: 5.
	start := 1.
	windowSize := 500.
	1 to: column size - windowSize do: [ :i |
		| row rowSum rowMax rowMin rowAverage rowMedian |
		row := column copyFrom: i to: i + windowSize.
		rowSum := row sum.
		rowMax := row max.
		rowMin := row min.
		rowMedian := row median.
		rowAverage := row average.
		omd add: { rowSum. rowMax. rowMin. rowAverage } ] ] timeToRun.
ttr "3600"
If I remove the median calculation I get 594ms. You are quite correct, the median calculation is quite expensive. However, it is quite light compared to the things I actually do. :)
594ms is quite an improvement over the original 6200ms. The original without median is now 2704ms. Numpy is still faster: over the same data set, doing the same thing, it is at 302ms. But Numpy is optimized C/Fortran with a Python face. It is a nice API, but still dependent on someone doing the hard work on the back end. The nice thing with the Squeak code is that it is reasonably performant and all Smalltalk. :)
For toying, educating, or implementing a few utilities, maybe. But for the main algorithms (eigendecomposition, singular values, etc.), I much prefer a known-to-work solution that takes care of both numerical stability and space/time efficiency, and scales to high dimensions whenever needed. Redoing LAPACK is a HUGE amount of work, and the probability of failing is high. Do you think Matlab, Scilab, Octave, R-lab, etc. waste their time writing a LAPACK clone? I wouldn't.
Plus, you must consider that using an optimized BLAS is the state of the art in number crunching: maybe 3 orders of magnitude of efficiency above a Smalltalk-written loop. Even with an optimized future Cog it will still be 2... The idea of BLAS is to get high efficiency in low-level computations if you can vectorize them. It's a perfect solution for implementing higher-level structures in Smalltalk while preserving decent efficiency. Also, ScaLAPACK gives the opportunity to exploit parallel architectures by distributing the load for large computations... and I trust BLAS will exploit GPU features once enough standardization is reached.
Sometimes we should just open our eyes and take a look at the external world. We don't have to throw away code that isn't written in Smalltalk. If someone did some really hard work for free, then let's just grab it! Let's be as clever as Python. Let's open our closed world. Let's promote FFI. That would promote Smalltalk solutions more than a 100% Smalltalk application would.
Well, just my POV :)
Nicolas
Thanks for the education.
Jimmie
On Thu, 24 Jun 2010, Jimmie Houchin wrote:
<snip>
I just recently had an opportunity to study your code to learn from it. I haven't tried it yet, but while exploring and studying I realized, if I understand correctly, a problem: in many (most?) uses of Matrix, Numpy, etc., the order of the data is part of the data. It isn't simply a set of numbers, but an ordered collection of numbers, whether it is weather statistics or, in my case, financials which are associated with a precise point in time in the market.
To my understanding, using a tree-based implementation would lose the order of the data. I would imagine that if you used keys which could be ordered, and then applied that order, you would lose the performance gains.
The binary tree is only useful for the fast calculation of min max and median. If the ordering of the data matters for a calculation, then you need some other trick.
In my simple example order had little meaning. But a common calculation is a weighted average, which does weight values according to their position in the array.
If the weights are exponential, you can use the same trick as with the sum. If the weights are not exponential, you can use convolution via FFT for O((n+w)*log(n+w)) performance, which is O(n*log(n)) if w << n. Here is an example:
| data windowWithWeights fftSize circularData circularWeights fft cdtr cdti cwtr cwti |
data := #(1 2 3 4 5) asFloatArray. "Our time series."
windowWithWeights := #(4 2 1) asFloatArray. "Most recent data with weight 4, least recent data with weight 1."
"Prepare for the FFT."
fftSize := (data size + windowWithWeights size) asLargerPowerOfTwo.
circularData := data copyGrownBy: fftSize - data size.
circularWeights := windowWithWeights copyGrownBy: fftSize - windowWithWeights size.
fft := FFT new: fftSize.
"Transform the data and store the results."
fft realData: circularData.
fft transformForward: true.
cdtr := fft realData.
cdti := fft imagData.
"Transform the weights and store the results."
fft realData: circularWeights.
fft transformForward: true.
cwtr := fft realData.
cwti := fft imagData.
"Do the convolution, which is simple complex multiplication, then the inverse FFT."
fft
	realData: cdtr * cwtr - (cdti * cwti)
	imagData: cdtr * cwti + (cdti * cwtr).
fft transformForward: false.
"Extract the data."
fft realData copyFrom: windowWithWeights size to: data size.
"a FloatArray(17.0 24.0 31.0)
 17 = 4*3 + 2*2 + 1*1, 24 = 4*4 + 2*3 + 1*2, 31 = 4*5 + 2*4 + 1*3"
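As a cross-check on the example above, here is a direct (non-FFT) computation of the same weighted sums in Python. It is O(n*w) rather than O((n+w)*log(n+w)), so it only makes sense for small windows, but it reproduces the 17, 24, 31 from the FFT example. This sketch and its function name are mine, not from the thread:

```python
def weighted_window_sums(data, weights):
    """Direct sliding weighted sum.

    weights[0] applies to the most recent element of each window,
    matching the weight ordering in the FFT example above.
    """
    w = len(weights)
    return [
        sum(weights[j] * data[i + w - 1 - j] for j in range(w))
        for i in range(len(data) - w + 1)
    ]

print(weighted_window_sums([1, 2, 3, 4, 5], [4, 2, 1]))  # → [17, 24, 31]
```

For large windows the FFT route (or numpy.convolve, which flips the kernel for you) is the better choice.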
While playing around, this is the implementation that performed best for me, going from 6200ms to 3600ms.
m := Matrix rows: 6333 columns: 162.
omd := OrderedCollection new.
ttr := [
	| column start windowSize |
	column := m atColumn: 5.
	start := 1.
	windowSize := 500.
	1 to: column size - windowSize do: [ :i |
		| row rowSum rowMax rowMin rowAverage rowMedian |
		row := column copyFrom: i to: i + windowSize.
		rowSum := row sum.
		rowMax := row max.
		rowMin := row min.
		rowMedian := row median.
		rowAverage := row average.
		omd add: { rowSum. rowMax. rowMin. rowAverage } ] ] timeToRun.
ttr "3600"
If I remove the median calculation I get 594ms. You are quite correct, the median calculation is quite expensive. However, it is quite light compared to the things I actually do. :)
I found a simple way to improve the performance without using a tree or loading new packages: SortedCollection. :) The least useful data structure in Squeak can shine here (the tree-based implementation is still more than 10 times faster in this case, but this is pretty good):
rng := Random new.
m := Matrix rows: 6333 columns: 162 tabulate: [ :r :c | rng next ].
omd := OrderedCollection new.
ttr := [
	| column windowSize window sum |
	column := m atColumn: 5.
	windowSize := 500.
	window := (column first: windowSize) asSortedCollection.
	sum := window sum.
	omd add: { sum. window first. window last. window median. sum / windowSize }.
	1 to: column size - windowSize do: [ :i |
		| outgoing incoming |
		outgoing := column at: i.
		incoming := column at: i + windowSize.
		sum := sum - outgoing + incoming.
		window remove: outgoing; add: incoming.
		omd add: { sum. window first. window last. window median. sum / windowSize } ] ] timeToRun.
ttr
My result for this code is 335ms (and it calculates the median too). It could be improved with a special method but that's out of the scope of this mail. :)
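For comparison, a minimal Python analogue of the SortedCollection trick, using the standard library's bisect to keep the window sorted (my sketch, not from the thread). The insertion point is found in O(log w); the list shifts make each update O(w), which is still far cheaper than re-sorting every window:

```python
import bisect
import random

def sliding_medians(column, window_size):
    """Sliding-window median over a sorted list maintained with bisect.

    Per step: remove the outgoing element, insert the incoming one,
    then read the median straight out of the sorted window.
    """
    window = sorted(column[:window_size])

    def median(s):
        mid = len(s) // 2
        return s[mid] if len(s) % 2 else (s[mid - 1] + s[mid]) / 2

    medians = [median(window)]
    for i in range(window_size, len(column)):
        outgoing, incoming = column[i - window_size], column[i]
        window.pop(bisect.bisect_left(window, outgoing))
        bisect.insort(window, incoming)
        medians.append(median(window))
    return medians

data = [random.random() for _ in range(300)]
meds = sliding_medians(data, 50)
```

The running sum, min (window[0]), and max (window[-1]) from the Smalltalk snippet carry over to this structure unchanged.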
Calculating the weighted averages with the same parameters (6333 data, 500 window size) with the FFT convolution described above took 350ms.
I hope you'll find these methods useful.
Levente
594ms is quite an improvement over the original 6200ms. The original without median is now 2704ms. Numpy is still faster: over the same data set, doing the same thing, it is at 302ms. But Numpy is optimized C/Fortran with a Python face. It is a nice API, but still dependent on someone doing the hard work on the back end. The nice thing with the Squeak code is that it is reasonably performant and all Smalltalk. :)
Thanks for the education.
Jimmie
Em 20-06-2010 10:00, Michael Haupt escreveu:
Hi Nicolas,
On Sun, Jun 20, 2010 at 11:17 AM, Nicolas Cellier nicolas.cellier.aka.nice@gmail.com wrote:
About 8) : True, every single operation results in memory allocation / garbage collection, a burden for number crunching.
really?
There is this nice book by Didier Besset called "Object-Oriented Implementation of Numerical Methods. An Introduction with Java and Smalltalk.: An Introduction with Java and Smalltalk". It can't be *that* bad. :-)
My own answer was: use C/FORTRAN for optimized number crunching functions. Use Smalltalk for any higher level/GUI function (via DLLCC/FFI). We may have more than 1 hammer in your toolset!
With GPU connectivity things emerging, number crunching might even be an interesting area for Smalltalk.
Agreed. I'm just applying for a PhD at EPUSP/LSI (Prof. Dr. Zuffo) to work in this area (with possible support from NVIDIA). I'm also battling to get corporate support to start a(nother?) Squeak group in São Paulo. Oh... before anything else: I don't mean any kind of fork (LOL)...
If you remember, earlier this year I talked about creating something like pyCUDA as squeakCUDA, but the subject is far more interesting than creating an adaptation layer for CUDA or OpenCL...
Best,
Michael
Best regards,
Casimiro Barreto
There is a lot of accidental complexity when writing code in this language
?????
@#!@#!$#$#$#$@#$$!~#~!#!@$#@@#%@#%$@#$#!# !
fernando
On Jun 20, 2010, at 8:57 AM, Igor Stasenko wrote:
http://therighttool.hammerprinciple.com/items/smalltalk
I think this could be used as a good guide for us to make an effort to move some low-ranked items into a high ranked ones :)
-- Best regards, Igor Stasenko AKA sig.
On 20 June 2010 12:49, Fernando olivero oliverof@lu.unisi.ch wrote:
There is a lot of accidental complexity when writing code in this language ?????
This is one of the most low-ranked statements, which means:
(There is a lot of accidental complexity when writing code in this language) not.
@#!@#!$#$#$#$@#$$!~#~!#!@$#@@#%@#%$@#$#!# ! fernando
On Jun 20, 2010, at 8:57 AM, Igor Stasenko wrote:
http://therighttool.hammerprinciple.com/items/smalltalk
I think this could be used as a good guide for us to make an effort to move some low-ranked items into a high ranked ones :)
-- Best regards, Igor Stasenko AKA sig.