I may be doing this wrong so I'll share my approach - I'm finding it unusably slow though.
I'm working on a stock market analysis tool. There are about 1100 securities on the AMEX, NASDAQ, and NYSE. So I create a Dictionary in the root dict called securities. This is keyed by stock ticker symbol and the value is an object of type security which contains fields tickerSymbol, exchange, issueName, and historicalData.
The historicalData is an OrderedCollection of quotes going back as far as the 1980s. A quote is a timestamp, high, low, close, and trade volume. There are around 300 of these per year, going back as much as 25 years. Believe it or not, I can fetch these from Yahoo as a CSV and process them into objects in about 5-10 seconds per security. Saving this data structure into Magma seems to take many times that; something like 3 hours, I think. If this were just the initial load, it would be tolerable. However, fetching the last 5 days' quotes and splicing them onto the tail of the historicalData collection takes as long as the initial load. So this approach isn't working for me.
I'm open to ideas on better ways to structure this. I'll also be adding some charts to keep in the database - the idea being that the charts are mostly up to date and I only have to replot the last day's worth of data every time it is fetched. When I say chart, I mean a data structure containing a 2D array of values, not a visual representation. I will always draw the visual form on the fly.
This chart will also potentially be a big block of data that will reference the historicalData points.
Ideas? I'm close to just going to image segments - one per security.
-Todd
Hi Todd,
Besides what you said, have you done any profiling?
The test runner has an option for running a test profiled. Maybe you can write a test for one of those insertions and see exactly what is eating so much time.
If you confirm that it is the OrderedCollection (I suspect so), then use something that does not have to be serialized in its entirety each time it changes, like a MagmaCollection or a Btree.
I also think that having a Dictionary for those 1100 securities is not helping you. I bet things will get better if you use a Btree instead. You can also try a MagmaCollection.
Don't think of MagmaCollections just for large collections but also for efficient writes when they change. The opposite is most probably true of the common collections, Dictionaries, etc.
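A minimal sketch of the profiling step, run from a workspace rather than the test runner, using Squeak's MessageTally. The names session, security, and newQuotes are assumptions for illustration only: an already-connected MagmaSession, one security object read through it, and a collection of freshly fetched quotes.

```smalltalk
"Profile one append-and-commit so the time spent in serialization becomes visible.
 session, security and newQuotes are hypothetical; substitute your own objects."
MessageTally spyOn: [
    session commit: [
        security historicalData addAll: newQuotes ] ]
```

If most of the reported time sits under the code that writes out the OrderedCollection, that would support the suspicion above; if it sits elsewhere, the collection type may not be the real problem.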
I hope you can improve that,
cheers,
Sebastian Sastre
-----Original Message----- From: magma-bounces@lists.squeakfoundation.org [mailto:magma-bounces@lists.squeakfoundation.org] On behalf of Todd Blanchard Sent: Monday, November 5, 2007 05:39 To: magma@lists.squeakfoundation.org Subject: Slow performance
_______________________________________________
Magma mailing list
Magma@lists.squeakfoundation.org
http://lists.squeakfoundation.org/mailman/listinfo/magma
Hi Todd, 300 * 25 years is 7500 quotes per security? If you know you will always use that entire chunk of data at a time (i.e., for historical analysis), then you will definitely want to employ a ReadStrategy to make sure Magma grabs it in one gulp. See the documentation for details about ReadStrategies.
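A rough sketch of what such a ReadStrategy might look like. The selectors are written from memory of the Magma documentation and should be checked against your Magma version; session (a connected MagmaSession), the Security class, its historicalData variable, and the depth of 4 are all assumptions for illustration.

```smalltalk
"Ask Magma to fetch a security's whole quote history in one trip.
 Assumed: session is a connected MagmaSession, Security has an instance
 variable named historicalData, and depth 4 is deep enough to reach the
 fields of each quote. Verify the selector names in your Magma image."
| strategy |
strategy := MaReadStrategy minimumDepth: 1.
strategy
    forVariableNamed: 'historicalData'
    onAny: Security
    readToDepth: 4.
session readStrategy: strategy.
```

The depth value is a guess; too small and the quotes still arrive piecemeal, too large and more of the object graph is dragged in than the analysis needs.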
Otherwise, if you will be accessing a few quotes here, a few quotes there (incidentally, Magma shines brightest with this type of application), you need to use a MagmaCollection as Sebastian remarked. MagmaCollections allow huge collections to be accessed a page at a time.
Magma will never be as fast as ImageSegments. ImageSegments will never be as flexible as Magma.
Regards, Chris
There is another aspect to this type of application that I would like to leverage to improve Magma performance in the future. That is the notion that the Quote objects are never ever going to change again. Therefore Magma could be told to ignore them for change-detection and this could improve performance considerably.
There is already space in the object-buffer for the bit; I just need to think of a good way for the end-user program to tell Magma that a particular object is "read-only". Probably something like:
mySession beReadOnly: anObject
??
Something for me to think about for a future version. If anyone has any suggestions, please let me know.
Thanks, I'm busy updating it to use MagmaSet as we speak (where set key is the timestamp of the data). The most common strategy with historical data is to ask for a range. IOW, gimme all the quotes from 1/1/1995 to today.
The other item that is maybe more problematic is storing the chart data - a big chunk of points - but maybe it's not so big, as it is an OrderedCollection of OrderedCollections on the order of about 100 per side.
I assume sending commitAndBegin is a checkpoint save?
Also, after creating a MagmaSet that I stick in a BTree (which is the root) I'm getting this trying to read it back:
'Unable to realize Orphaned MagmaMutatingProxy'
If there's an error, an exception would be nicer than just returning an error string for the collection I'm trying to read.
OK, I'm totally frustrated. I can't get thing one in or out of this database.
I want the root to act like a dictionary - so I set it up like this:
    | set |
    set := MagmaSet equivalenceAttributes: (Array with: #key).
    set addIndex: (MaSearchStringIndex attribute: #key).
    MagmaRepositoryController create: 'magma' root: set.
I start the server session on a port and leave it. In the client I subclass MagmaSession to add schema specific methods. One thing I add is this:
    MyDatabaseSession>>at: aSymbol ifAbsentPut: aBlock
        | collection assoc |
        collection := self root where: [:ea | ea key = aSymbol].
        collection isEmpty
            ifTrue: [self root add: (assoc := aSymbol -> aBlock value)]
            ifFalse: [assoc := collection first].
        ^ assoc value
This fails on a newly created database because MaCollectionReader isEmpty ends up calling sortIndex on a MaQueryTrunk which doesn't seem to understand it.
What gives?
Hi Todd, MagmaSets are only for very specific purposes, and only after they absolutely pay for themselves 10X over a regular MagmaCollection. You are not there yet. MagmaSets are twice as slow as a standard MagmaCollection when adding because they have to check #includes: before every add.
The MyDatabaseSession subclass seems like an attempt to make the session itself act as a "bucket" to stuff data into and then retrieve out of later. This is not necessary and probably even not very workable with Magma.
You should make an *object model* with your root object being a true custom domain object like an "InvestmentPortfolio", not a dumb collection. This InvestmentPortfolio may then reference several MagmaCollections which can serve as your "buckets".
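A minimal sketch of that kind of root object, reusing only the Magma calls that already appear in this thread. Apart from the InvestmentPortfolio name, everything here is assumed for illustration: the class category, the accessor names, and an index on #tickerSymbol (which presumes the security objects answer tickerSymbol). Evaluate the statements one at a time in a workspace so the class exists before it is referenced.

```smalltalk
"Hypothetical domain root holding one bucket of securities."
Object subclass: #InvestmentPortfolio
    instanceVariableNames: 'securities'
    classVariableNames: ''
    poolDictionaries: ''
    category: 'StockAnalysis-Sketch'.

"Plain accessors, compiled here so the sketch runs from a workspace."
InvestmentPortfolio compile: 'securities
    ^ securities'.
InvestmentPortfolio compile: 'securities: aMagmaCollection
    securities := aMagmaCollection'.

"Build the root with an indexed MagmaCollection and create the repository."
| bucket portfolio |
bucket := MagmaCollection new.
bucket addIndex: (MaSearchStringIndex attribute: #tickerSymbol).
portfolio := InvestmentPortfolio new.
portfolio securities: bucket.
MagmaRepositoryController create: 'magma' root: portfolio.
```

With this shape, client code asks the session's root for its securities bucket instead of treating the session or the root itself as the dictionary.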
Regards, Chris
> You should make an *object model* with your root object being a true custom domain object like an "InvestmentPortfolio", not a dumb collection. This InvestmentPortfolio may then reference several MagmaCollections which can serve as your "buckets".
Mkay - what if I want to add more buckets later? Will that "just work"?
Absolutely.
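As a sketch of what adding a bucket later might look like on the Smalltalk side, assuming the root is an InvestmentPortfolio as suggested earlier in the thread. The charts variable, its lazy accessor, and the session variable (a connected MagmaSession) are hypothetical names, and how Magma treats the changed class definition should be confirmed against its documentation. Evaluate the statements one at a time.

```smalltalk
"Later schema change (hypothetical): the class gains a second bucket, charts."
Object subclass: #InvestmentPortfolio
    instanceVariableNames: 'securities charts'
    classVariableNames: ''
    poolDictionaries: ''
    category: 'StockAnalysis-Sketch'.

"Lazy accessor: existing portfolios have nil here until first use."
InvestmentPortfolio compile: 'charts
    ^ charts ifNil: [charts := MagmaCollection new]'.

"Touch the new bucket inside a commit so the change to the root is saved.
 session is assumed to be a connected MagmaSession."
session commit: [ session root charts ].
```

The lazy accessor means no migration pass is needed for the portfolio that is already stored; the new collection is created the first time anyone asks for it.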