<div dir="ltr"><br><br><div class="gmail_quote">On Wed, Oct 8, 2008 at 10:45 AM, Michael van der Gulik <span dir="ltr">&lt;<a href="mailto:mikevdg@gmail.com">mikevdg@gmail.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

<div dir="ltr"><div><div></div><div class="Wj3C7c"><br><br><div class="gmail_quote">On Wed, Oct 8, 2008 at 4:38 AM, stan shepherd <span dir="ltr">&lt;<a href="mailto:squeak414@free.fr" target="_blank">squeak414@free.fr</a>&gt;</span> wrote:<br>

<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

<br>

Hi, I&#39;m looking at building a small proof of concept of a multidimensional<br>

modelling tool in Squeak. Commercial products are things like Cognos, and<br>

the old Express that was assimilated by Oracle.<br>

<br>

A typical &#39;cube&#39; will be &#39;dimensioned&#39; by product, region, time. Each<br>

dimension has one or more roll-ups, e.g.<br>

<br>

 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; All Ice Creams<br>

<br>

 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;/ &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;\<br>

 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;Cornetto &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Tub<br>

<br>

 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;/ &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; | &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; \<br>

 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; C. Raspberry &nbsp; C. Vanilla &nbsp; &nbsp; &nbsp;C. Chocolate<br>

<br>

<br>

Then sales data would be entered for the lowest level, then rolled up over<br>

the product hierarchy, the region hierarchy, the time hierarchy.<br>
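<br>Here is a minimal Smalltalk sketch of that roll-up for the product hierarchy in the diagram. It assumes the hierarchy is held as a child-to-parent Dictionary. The names parentOf, leafSales and totals are hypothetical, and the figures are illustrative. Each leaf figure is added to the leaf and to every ancestor up to the root. The region and time hierarchies would be rolled up the same way.<br>

```smalltalk
| parentOf leafSales totals |
"Hypothetical child -> parent links for the product hierarchy in the diagram."
parentOf := Dictionary new.
parentOf
	at: #cRaspberry put: #cornetto;
	at: #cVanilla put: #cornetto;
	at: #cChocolate put: #cornetto;
	at: #cornetto put: #allIceCreams;
	at: #tub put: #allIceCreams.
"Illustrative sales figures, entered at the lowest level only."
leafSales := Dictionary new.
leafSales
	at: #cRaspberry put: 10;
	at: #cVanilla put: 20;
	at: #cChocolate put: 30;
	at: #tub put: 5.
"Add each leaf figure to the leaf itself and to every ancestor up to the root."
totals := Dictionary new.
leafSales keysAndValuesDo: [:leaf :amount | | node |
	node := leaf.
	[node notNil] whileTrue: [
		totals at: node put: (totals at: node ifAbsent: [0]) + amount.
		node := parentOf at: node ifAbsent: [nil]]].
Transcript show: 'All Ice Creams: ', (totals at: #allIceCreams) printString; cr.
```

With these figures Cornetto totals 60 and All Ice Creams totals 65.<br>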

<br>

From there, you can ask questions like &quot;what&#39;s the year-on-year change in<br>

sales of Cornetto for Western Europe&quot;.<br>

<br>

Two data structures spring to mind:<br>

<br>

1) Use nested dictionaries for the dimensions, so that from the sales cube<br>

we select the dictionary entry for Cornetto, then from there the entry for<br>

Western Europe, then the two entries for this year to date and last year to<br>

date, being actual numbers.<br>

2) Give each dimension element a numerical index, e.g. Cornetto is product<br>

no. 451 in the product dimension. Sales then becomes a single dictionary where<br>

we calculate the index as product number + (region number * number of<br>

products) + (period number * number of products * number of regions)<br>
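<br>To compare the two layouts, here is a rough Squeak sketch that stores the same two cells both ways and answers the Cornetto / Western Europe year-on-year question from each. Only the index formula and product number 451 come from option 2. The dimension sizes, the region number for Western Europe and the sales figures are illustrative assumptions. The flat store here is an Array rather than the Dictionary that option 2 describes, so it needs a + 1, because Smalltalk collections index from 1.<br>

```smalltalk
| nProducts nRegions nPeriods yoy nested cell indexOf flat |
"Illustrative dimension sizes; element numbers are 0-based as in the formula."
nProducts := 500. nRegions := 20. nPeriods := 2.
"Year-on-year change in percent; Smalltalk division keeps this exact."
yoy := [:current :previous | (current - previous) * 100 / previous].

"Option 1: nested dictionaries, product -> region -> period -> amount."
nested := Dictionary new.
cell := (nested at: #cornetto ifAbsentPut: [Dictionary new])
	at: #westernEurope ifAbsentPut: [Dictionary new].
cell at: #lastYtd put: 400; at: #thisYtd put: 460.

"Option 2: one flat store addressed by the index formula (Array, hence + 1)."
indexOf := [:product :region :period |
	product + (region * nProducts) + (period * nProducts * nRegions) + 1].
flat := Array new: nProducts * nRegions * nPeriods withAll: 0.
"Cornetto = product 451; Western Europe = region 3 (hypothetical); period 0 = last year, 1 = this year."
flat at: (indexOf value: 451 value: 3 value: 0) put: 400.
flat at: (indexOf value: 451 value: 3 value: 1) put: 460.

"Both layouts answer the same question."
Transcript
	show: 'nested: ', (yoy value: (((nested at: #cornetto) at: #westernEurope) at: #thisYtd)
		value: (((nested at: #cornetto) at: #westernEurope) at: #lastYtd)) printString; cr;
	show: 'flat: ', (yoy value: (flat at: (indexOf value: 451 value: 3 value: 1))
		value: (flat at: (indexOf value: 451 value: 3 value: 0))) printString; cr.
```

Both lines print 15. The Array allocates a slot for every combination, so a sparse cube wastes space. Keying a Dictionary by the same computed index, as option 2 describes, stores only the cells that exist.<br>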

<br>

Option 2) sounds faster, but option 1) sounds Squeakier. Does anyone have any advice<br>

as to how best to do this?<br>

<br>

Options 3) etc also welcome.<br>

<br>

I daresay the correct answer is to do both and see which works best, but I<br>

suspect there are some obvious gotchas I&#39;m not seeing.<br>

</blockquote></div><br><br></div></div>I&#39;d probably start by looking at how existing multidimensional databases store their data structures, and then try to turn that into objects. Unfortunately, I don&#39;t have any experience with this.<br>


<br>What I would do is have a huge unsorted god-collection holding &quot;DimensionalData&quot; objects. Its only job would be to get the data into the image in the first place. Each DimensionalData object would store a list of dimension coordinates and then the actual data... somehow. So you&#39;d have an object that would contain: (Chocolate ice cream, Region 123, October 4 at 2pm, 4 ice creams sold). Each of these objects represents one point in the dimensional space.<br>
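<br>Here is one way to sketch that god-collection in a workspace. It assumes a Dictionary stands in for each DimensionalData object; a real version would presumably be a class with accessors. The addFact helper is hypothetical. The first record follows the example above. The year 2008 and the second record are illustrative assumptions.<br>

```smalltalk
| facts addFact |
facts := OrderedCollection new.
"Hypothetical helper: a record is a point in (product, region, time) space plus the measure."
addFact := [:product :region :time :quantity |
	facts add: (Dictionary new
		at: #product put: product;
		at: #region put: region;
		at: #time put: time;
		at: #quantity put: quantity;
		yourself)].
addFact value: #chocolate value: 123
	value: (DateAndTime year: 2008 month: 10 day: 4 hour: 14 minute: 0 second: 0)
	value: 4.
addFact value: #vanilla value: 123
	value: (DateAndTime year: 2008 month: 10 day: 4 hour: 15 minute: 0 second: 0)
	value: 2.
"Unsorted and unindexed: any question means scanning every record."
Transcript show: (facts inject: 0 into: [:sum :each | sum + (each at: #quantity)]) printString; cr.
```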


<br>Then I would start trying to find some way of creating indexes (in the SQL sense) over this raw unsorted data. You could then use these indexes to answer queries. Each type of query would need a particular type of index, so you&#39;d have a lot of fun trying to write reusable code for this. <br>
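<br>Here is a sketch of one such index. It assumes the records are Dictionaries shaped like the example above and that the query is &quot;everything sold in region R&quot;. Grouping the raw records once by region turns that query into a single Dictionary lookup instead of a scan. The region numbers and quantities are illustrative.<br>

```smalltalk
| facts byRegion |
"Raw, unsorted records (illustrative; Dictionaries stand in for DimensionalData)."
facts := OrderedCollection new.
facts
	add: (Dictionary new at: #product put: #chocolate; at: #region put: 123; at: #quantity put: 4; yourself);
	add: (Dictionary new at: #product put: #vanilla; at: #region put: 123; at: #quantity put: 2; yourself);
	add: (Dictionary new at: #product put: #tub; at: #region put: 456; at: #quantity put: 7; yourself).
"Build one index: region -> records in that region."
byRegion := Dictionary new.
facts do: [:each |
	(byRegion at: (each at: #region) ifAbsentPut: [OrderedCollection new]) add: each].
"A query on region now reads one bucket instead of scanning every record."
Transcript show: ((byRegion at: 123 ifAbsent: [#()])
	inject: 0 into: [:sum :each | sum + (each at: #quantity)]) printString; cr.
```

A time-range query could not use this index at all. It would need a second index, for example one sorted by time, and that is where the reuse problem comes from.<br>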


<br>If hierarchies are used quite a lot, then I&#39;d probably try to make a

&quot;Tree&quot; class with a parent, children and iteration methods (cf.

Collection et al.).<br>
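<br>Here is a class-free sketch of that Tree idea. It assumes each node is a Dictionary with #name, #parent and #children entries; a real Tree class would turn these into instance variables and methods such as do:. The newNode and addChild blocks are hypothetical helpers. The sketch builds the ice-cream hierarchy from the original question and walks it depth-first with an explicit stack, so no recursive block is needed.<br>

```smalltalk
| newNode addChild root cornetto visited stack node |
"Hypothetical stand-in for a Tree class: each node is a Dictionary
 with #name, #parent and #children entries."
newNode := [:label | Dictionary new
	at: #name put: label;
	at: #parent put: nil;
	at: #children put: OrderedCollection new;
	yourself].
addChild := [:parentNode :childNode |
	childNode at: #parent put: parentNode.
	(parentNode at: #children) add: childNode.
	childNode].
root := newNode value: 'All Ice Creams'.
cornetto := addChild value: root value: (newNode value: 'Cornetto').
addChild value: root value: (newNode value: 'Tub').
#('C. Raspberry' 'C. Vanilla' 'C. Chocolate') do: [:flavour |
	addChild value: cornetto value: (newNode value: flavour)].
"Depth-first, parent before children, like a do: method on the Tree class."
visited := OrderedCollection new.
stack := OrderedCollection with: root.
[stack isEmpty] whileFalse: [
	node := stack removeLast.
	visited add: (node at: #name).
	(node at: #children) reverseDo: [:child | stack addLast: child]].
Transcript show: visited printString; cr.
```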

</div></blockquote></div><br><br>On further thought, provided that the first index you make contains all of the data and you can iterate over it, you don&#39;t need the &quot;god collection&quot;.<br><br>What would be nice is if an API similar to the Collection API could be used to query the db. For example:<br>

<br>db select: [ :each | <br>&nbsp;&nbsp;&nbsp; (each time &gt; n) <br>&nbsp;&nbsp;&nbsp; and: [(each time &lt; m) <br>&nbsp;&nbsp;&nbsp; and: [each region in: p]] ] <br><br>Where &quot;each region in: p&quot; means &quot;each&#39;s region is p or a sub-region of p&quot;.<br>
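<br>Before any indexes exist, that query can be answered by a plain linear scan. This gives a useful baseline to measure indexed queries against. The sketch assumes records are Dictionaries with integer times and that the region hierarchy is a child-to-parent Dictionary; the region names are hypothetical. The sub-region test is written as a block named isWithin rather than a method called in:, because Squeak&#39;s Object already defines in: (it evaluates a block with the receiver as its argument).<br>

```smalltalk
| parentOf isWithin records n m p result |
"Hypothetical region hierarchy: child -> parent."
parentOf := Dictionary new.
parentOf
	at: #france put: #westernEurope;
	at: #spain put: #westernEurope;
	at: #westernEurope put: #europe;
	at: #poland put: #europe.
"Plays the role of 'each region in: p': true if region is ancestor or lies below it."
isWithin := [:region :ancestor | | r found |
	r := region.
	found := false.
	[found not and: [r notNil]] whileTrue: [
		found := r = ancestor.
		r := parentOf at: r ifAbsent: [nil]].
	found].
"Illustrative records; times are plain integers here."
records := OrderedCollection new.
records
	add: (Dictionary new at: #time put: 5; at: #region put: #france; yourself);
	add: (Dictionary new at: #time put: 8; at: #region put: #poland; yourself);
	add: (Dictionary new at: #time put: 12; at: #region put: #spain; yourself).
n := 1. m := 10. p := #westernEurope.
"Linear scan: every record is examined, no index is used."
result := records select: [:each |
	((each at: #time) > n) and: [((each at: #time) < m) and: [
		isWithin value: (each at: #region) value: p]]].
Transcript show: result size printString; cr.
```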

<br>The &quot;each&quot; object could be a special object that carefully watches the messages it receives. These messages are clues to which indexes the query needs. Each of those indexes could either be newly created or reused if a suitable one already exists.<br>

<br>The result of this query could be a &quot;SubIndex&quot; object, which contains start and end references into a larger index. This &quot;SubIndex&quot; object would also understand selectors such as &gt;&gt;do:, &gt;&gt;count:, etc. to iterate over its entries.<br>
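<br>Here is a rough sketch of the SubIndex idea. It assumes the larger index is an Array of records kept sorted by time. Two binary searches give the start and end positions for a time range, and do: and count: then run only over that stretch of the shared Array. The subDo and subCount blocks are hypothetical stand-ins for methods on a SubIndex class, and the times are illustrative.<br>

```smalltalk
| sorted lowerBound start stop subDo subCount |
"Hypothetical larger index: records sorted by time (plain integers here)."
sorted := ((#(12 3 8 20 5 8) collect: [:time | Dictionary new at: #time put: time; yourself])
	asSortedCollection: [:a :b | (a at: #time) <= (b at: #time)]) asArray.
"Binary search: first position whose time is at least t, or size + 1 if none."
lowerBound := [:t | | lo hi mid |
	lo := 1.
	hi := sorted size + 1.
	[lo < hi] whileTrue: [
		mid := (lo + hi) // 2.
		((sorted at: mid) at: #time) < t
			ifTrue: [lo := mid + 1]
			ifFalse: [hi := mid]].
	lo].
"The sub-index for 5 <= time < 12 is just a start and an end position."
start := lowerBound value: 5.
stop := (lowerBound value: 12) - 1.
"do: and count: over the sub-index only touch positions start..stop."
subDo := [:action | start to: stop do: [:i | action value: (sorted at: i)]].
subCount := [:test | | c |
	c := 0.
	subDo value: [:each | (test value: each) ifTrue: [c := c + 1]].
	c].
Transcript show: (subCount value: [:each | true]) printString; cr.
```

Narrowing a query this way costs two binary searches and copies no records. However, a single sorted index can only answer ranges on the key it is sorted by. The region condition would still need a scan of the sub-range or a second index.<br>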

<br>Gulik.<br><br>-- <br><a href="http://people.squeakfoundation.org/person/mikevdg">http://people.squeakfoundation.org/person/mikevdg</a><br><a href="http://gulik.pbwiki.com/">http://gulik.pbwiki.com/</a><br>

</div>