[Newbies] Best data structure for a multidimensional database?

Michael van der Gulik mikevdg at gmail.com
Tue Oct 7 22:10:52 UTC 2008


On Wed, Oct 8, 2008 at 10:45 AM, Michael van der Gulik <mikevdg at gmail.com>wrote:

>
>
> On Wed, Oct 8, 2008 at 4:38 AM, stan shepherd <squeak414 at free.fr> wrote:
>
>>
>> Hi, I'm looking at building a small proof of concept of a multidimensional
>> modelling tool in Squeak. Commercial products are things like Cognos, and
>> the old Express that was assimilated by Oracle.
>>
>> A typical 'cube' will be 'dimensioned' by product, region, time. Each
>> dimension has one or more roll-ups, e.g.
>>
>>                                             All Ice Creams
>>
>>                                        /                      \
>>                              Cornetto                             Tub
>>
>>                        /             |                 \
>>               C. Raspberry   C. Vanilla      C. Chocolate
>>
>>
>> Then sales data would be entered for the lowest level, then rolled up over
>> the product hierarchy, the region hierarchy, the time hierarchy.
>>
>> >From there, you can ask qusetions like "what's the year on year change in
>> sales of Cornetto for Western Europe".
>>
>> Two data structures spring to mind:
>>
>> 1) Use nested dictionaries for the dimensions, so that from the sales cube
>> we select the dictionary entry for Cornetto, then from there the entry for
>> Western Europe, then the two entries for this year to date and last year
>> to
>> date, being actual numbers.
>> 2) Give each dimension element a numerical index, eg Cornetto is product
>> no
>> 451 in the product dimension. Sales then becomes a single dictionary where
>> we calculate the index as product number + (region number * number of
>> products) + (period number * number of products * number of regions)
>>
>> no 2) sounds faster, but no 1) sounds Squeakier. Does anyone have any
>> advice
>> as to how best to do this?
>>
>> Options 3) etc also welcome.
>>
>> I daresay the correct answer is to do both and see which works best, but I
>> suspect there are some obvious gotchas I'm not seeing.
>>
>
>
> I'd probably start by looking at how existing multidimensional databases
> store their data structures, and then try to turn that into objects.
> Unfortunately, I don't have any experience with this.
>
> What I would do is have a huge unsorted god-collection which has
> "DimensionalData" objects in it. This would be a place just to get the data
> into the image in the first place. Each DimensionalData object would store a
> list of dimension coordinates and then the actual data... somehow. So you'd
> have an object that would contain: (Chocolate ice cream, Region 123, October
> 4 at 2pm, 4 ice creams sold). Each of these would contain a point in the
> dimensional space.
>
> Then I would start trying to find some way of creating "index"s (in the SQL
> sense) over this raw unsorted data. You could then use these indexes to do
> queries. Each type of query would need a particular type of index, so you'd
> have a lot of fun trying to write reusable code for this.
>
> If hierarchies are used quite a lot, then I'd probably try to make a "Tree"
> class with a parent, children and iteration methods (cf: Collection et al).
>


On further thought, provided that the first index you make contains all of
the data and you can iterate over it, you don't need the "god collection".

What would be nice is if an API similar to the Collection API could be used
to query the db. For example:

db select: [ :each |
    (each time > n)
    and: (each time < m)
    and: (each region in: p) ]

Where "each region in: p" means "each's region is p or a sub-region of p".

The "each" object could be a special object that carefully watches the
messages it receives. These messages are the clues to which indexes would be
needed (which could be either created new or recycled).

The result of this query could be a "SubIndex" object, which contains start-
and end- references into a larger index. This "SubIndex" object would also
understand selectors such as >>do:, >>count:, etc to iterate over its
entries.

Gulik.

-- 
http://people.squeakfoundation.org/person/mikevdg
http://gulik.pbwiki.com/
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.squeakfoundation.org/pipermail/beginners/attachments/20081008/550ac998/attachment.htm


More information about the Beginners mailing list