Q: Complex Magma Indexed queries

Mon Dec 19 16:01:09 UTC 2005

> Would it be possible to support complex queries on multiple indices ?

I have long been interested in how to solve this problem, but am not smart
enough to do it myself w/o help.  Thanks for provoking this discussion!

The way I envision this working from a user perspective is, the user "Reads" a
collection on one index (and thus obtains a Reader), then can "intersect:" that
Reader with another Reader to build yet a third Reader, which can be further
intersected with other Readers as necessary.

There are three parts to the problem; 1) calculating the intersection/union, 2)
creating a consistent Reader "view" of this intersection and 3) keeping the
"view" maintained as various items eligibility change.

Perhaps if (1) and (2) can be done quickly, (3) may not be necessary.

Let me expound my thoughts.

1) Calculating the intersection

At the MaHashIndex level, calculating an intersection is very easy to implement
using the "Bit Vectors" approach described at:

  http://www2.toki.or.id/book/AlgDesignManual/BOOK/BOOK3/NODE133.HTM

I was barely able to understand this, but the idea I got from it was each oid
could represent a bit-position of a LargePositiveInteger.  As we enumerate each
index, we simply setBit: oid of our number.  Yes, it will be a VERY large
number but Squeak seems to handle it just fine (try it!).  So we do this for
each index and then simply bitAnd: the numbers together to get a single
LargePositiveInteger which describes the intersection, nice!

It is potentially problematic, though, if someone wants to intersect two very
huge ranges, because calculating the Integers requires enumerating the entire
ranges.

.. Another idea involves building a second mapping going the other way; i.e.,
instead of just hashValue->oid, also insert oid->hashValue into a counterpart
index..  I'll have to think more about this..

2) Creating a consistent Reader "view" of this intersection

The easiest way to tackle this is to use the existing Reader logic that's
already built-in; therefore we must have the "intersected MaHashIndex" (or
equivalent) as its back-end.  From my view, this is the best and only way to
leverage all the code that's there, which was painstakingly implemented to
correctness.

The challenge is how do we know the user is done with the temporary,
intersected MaHashIndex?  The only answer I can think of at the moment is to
require them to #recycle it or something (yuck).

3) Keeping it up to date
Don't even want to worry about this right now unless we can solve 1 & 2...

> Comments welcomed

Consider also that we have the ability to write a custom index that indexes on
multiple attributes simultaneously.  For example, for each person, you could
insert two hash values into one index, one for the familyName and one for the
dateOfBirth.

This does not solve the requirements, but I thought it worth mentioning to help
us think "outside the box"; since relational-style indexing has it so ingrained
in us that indexes must be uniform on one attribute..

More comments welcomed please..