Multiple indices on MagmaCollections?: cmm's partial solution

Brent Pinkney brent.pinkney at aircom.co.za
Fri May 19 16:08:17 UTC 2006


Hi Goran,

This ws Chris's message to me (I hope Chris does not mind) on how to try out his 
first pass at a query enginer which uses the census indices to narrow the list of possible oids.

This solution is prone to the limitations I described elsewhere in this thread.

----------  Forwarded Message  ----------

Subject: multiple indices - first optimized querying !!
Date: Saturday 15 April 2006 00:06
From: Chris Muller <chris at funkyobjects.org>
To: Brent Pinkney <brent.pinkney at aircom.co.za>

I've finally caught up to you Brent.  I am full into this iteration, I
have therefore merged all of the Magma-research package into
appropriate real Magma packages.

Please load the following from SqueakSource on top of a Magma 1.0 image:

	Ma base additions-cmm.63
	Ma special collections-cmm.63
	Magma client-cmm.180
	Magma server-cmm.156

Then you (should) be able to execute the same where: expressions except
it will now use optimized algorithm that calculates the minimum
"QueryTrunk" that must be enumerated.

Note I have only performed very minimal testing; two or three queries that seem to work.

To support optimized queries, we now must maintain TWO MaHashIndexes
for each index instead of just one.  Regrettably, this will slow down
the insert rate, one of the slowest things Magma does anyway (despite
our hard work to optimize it last year).  Many of the changes in
MagmaCollectionManager are to account for the new oid->key index
mapping.  I nevertheless, think the reward of supporting complex
queries will make it worth it.

Please note that, so far, this implementation dynamically calculates a
query result, it does NOT require ANY bitmap as yet, but at the expense
of some side-effects:

  - sorting is not provided (yet)
  - the paging is cannot be perfect, there will be some overlap between
the pages (note this could happen anyway because the result set can
change while the user is paging).  This happens because we only start
at a specified page/startIndex of the *trunk*, not the actual query
result.    

If you happen to see how it works in the debugger, you will doubless
understand new possibilities for our "bitmap" solution.  I think we
could possibly get away with simply recording results dispersion for
the pages the user has visited.

> One interesting thing to look it is 'partial ordering' as used by 
> Google. They 
> find the 'best' results and just order the first two pages. Just a 
> thought though.

Well, I think this is a great thought, because that is exactly what we
can do right now; we cannot provide sorted results like a relational
database where page one contains the lowest key in the entire result
set.  If we can sort just the page that we have the *client* could do
that work.  If we can figure out how to do it relational style I'm all
for it, but do you think we can maintain this dynamism we have now?

Enjoy!
 Chris

-------------------------------------------------------

Brent


More information about the Magma mailing list