[Newbies] Best data structure for a multidimensional database?

Michael van der Gulik mikevdg at gmail.com
Tue Oct 7 21:45:18 UTC 2008


On Wed, Oct 8, 2008 at 4:38 AM, stan shepherd <squeak414 at free.fr> wrote:

>
> Hi, I'm looking at building a small proof of concept of a multidimensional
> modelling tool in Squeak. Commercial products are things like Cognos, and
> the old Express that was assimilated by Oracle.
>
> A typical 'cube' will be 'dimensioned' by product, region, time. Each
> dimension has one or more roll-ups, e.g.
>
>                                             All Ice Creams
>
>                                        /                      \
>                              Cornetto                             Tub
>
>                        /             |                 \
>               C. Raspberry   C. Vanilla      C. Chocolate
>
>
> Then sales data would be entered for the lowest level, then rolled up over
> the product hierarchy, the region hierarchy, the time hierarchy.
>
> >From there, you can ask qusetions like "what's the year on year change in
> sales of Cornetto for Western Europe".
>
> Two data structures spring to mind:
>
> 1) Use nested dictionaries for the dimensions, so that from the sales cube
> we select the dictionary entry for Cornetto, then from there the entry for
> Western Europe, then the two entries for this year to date and last year to
> date, being actual numbers.
> 2) Give each dimension element a numerical index, eg Cornetto is product no
> 451 in the product dimension. Sales then becomes a single dictionary where
> we calculate the index as product number + (region number * number of
> products) + (period number * number of products * number of regions)
>
> no 2) sounds faster, but no 1) sounds Squeakier. Does anyone have any
> advice
> as to how best to do this?
>
> Options 3) etc also welcome.
>
> I daresay the correct answer is to do both and see which works best, but I
> suspect there are some obvious gotchas I'm not seeing.
>


I'd probably start by looking at how existing multidimensional databases
store their data structures, and then try to turn that into objects.
Unfortunately, I don't have any experience with this.

What I would do is have a huge unsorted god-collection which has
"DimensionalData" objects in it. This would be a place just to get the data
into the image in the first place. Each DimensionalData object would store a
list of dimension coordinates and then the actual data... somehow. So you'd
have an object that would contain: (Chocolate ice cream, Region 123, October
4 at 2pm, 4 ice creams sold). Each of these would contain a point in the
dimensional space.

Then I would start trying to find some way of creating "index"s (in the SQL
sense) over this raw unsorted data. You could then use these indexes to do
queries. Each type of query would need a particular type of index, so you'd
have a lot of fun trying to write reusable code for this.

If hierarchies are used quite a lot, then I'd probably try to make a "Tree"
class with a parent, children and iteration methods (cf: Collection et al).

Gulik.

-- 
http://people.squeakfoundation.org/person/mikevdg
http://gulik.pbwiki.com/
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.squeakfoundation.org/pipermail/beginners/attachments/20081008/532fab8d/attachment.htm


More information about the Beginners mailing list