conjuncting MaHashIndexes

Chris Muller chris at funkyobjects.org
Thu Apr 6 04:01:18 UTC 2006


Idea for finding intersections between MaHashIndexes based on multiply
anded conditions:

If, for every key->oid pair we insert into any MaHashIndex, we also
insert into a separate, additional MaHashIndex, oid->key.  I will refer
to this reverse-mapped index as the "census" index because it answers
where is each object.

So, with these census indexes we can quickly know, for any oid, all
keys it resides at.  We have a tree of conjuncted conditions being
requested from the user.  To (and) conjunct the ranges specified in the
tree of a number of MaHashIndexes we could follow these steps:

	1) First use #numberOfEntriesFrom:to: to discover which index will
have the most-restricted range to enumerate based on the anded
conditions of the tree.  Call the size of this range "sr".
	2) enumerate all the oids from lowKey to highKey of that index.
	3) enumerate each of the other census indexes specified by the
conditions.
	4) determine whether this eachOid should be included in the result. 
If any censusIndex indicates no, the oid is rejected (because we're
anding), move to the next oid.  If all censusIndexes have that oid in
the key-range, the oid is included.
		The implementation should be sure to enumerate the Census indexes a
max of once.
		For eachCensusEntry:
			get the condition for this index#
			if the key is between: condition lowKey and: condition highKey
				mark condition satisfied
				return true if all conditions satisfied
		return false

		For example, for a particular oid with these census values and this
tree of conditions being anded together
				Census					ANDed Conditions
				index1  { 100. 319 }		index1 -> keys 300 to: 500.
				index2  { 700 }			index2 -> keys 700 to: 900.
		this oid would be accepted because all conditions were satisfied. 
Note it existed in two places in index1 (which would happen in a
keyword index) but only one of them needs to be satisfied.

So, in the worst case, the most number of steps we have to process is

	sr*number of conditions

This does not seem too bad!

 - Chris


More information about the Magma mailing list