Would it be possible to support complex queries on multiple indices ?
I have long been interested in how to solve this problem, but am not smart enough to do it myself w/o help. Thanks for provoking this discussion!
The way I envision this working from a user perspective is, the user "Reads" a collection on one index (and thus obtains a Reader), then can "intersect:" that Reader with another Reader to build yet a third Reader, which can be further intersected with other Readers as necessary.
There are three parts to the problem; 1) calculating the intersection/union, 2) creating a consistent Reader "view" of this intersection and 3) keeping the "view" maintained as various items eligibility change.
Perhaps if (1) and (2) can be done quickly, (3) may not be necessary.
Let me expound my thoughts.
1) Calculating the intersection
At the MaHashIndex level, calculating an intersection is very easy to implement using the "Bit Vectors" approach described at:
http://www2.toki.or.id/book/AlgDesignManual/BOOK/BOOK3/NODE133.HTM
I was barely able to understand this, but the idea I got from it was each oid could represent a bit-position of a LargePositiveInteger. As we enumerate each index, we simply setBit: oid of our number. Yes, it will be a VERY large number but Squeak seems to handle it just fine (try it!). So we do this for each index and then simply bitAnd: the numbers together to get a single LargePositiveInteger which describes the intersection, nice!
It is potentially problematic, though, if someone wants to intersect two very huge ranges, because calculating the Integers requires enumerating the entire ranges.
.. Another idea involves building a second mapping going the other way; i.e., instead of just hashValue->oid, also insert oid->hashValue into a counterpart index.. I'll have to think more about this..
2) Creating a consistent Reader "view" of this intersection
The easiest way to tackle this is to use the existing Reader logic that's already built-in; therefore we must have the "intersected MaHashIndex" (or equivalent) as its back-end. From my view, this is the best and only way to leverage all the code that's there, which was painstakingly implemented to correctness.
The challenge is how do we know the user is done with the temporary, intersected MaHashIndex? The only answer I can think of at the moment is to require them to #recycle it or something (yuck).
3) Keeping it up to date Don't even want to worry about this right now unless we can solve 1 & 2...
Comments welcomed
Consider also that we have the ability to write a custom index that indexes on multiple attributes simultaneously. For example, for each person, you could insert two hash values into one index, one for the familyName and one for the dateOfBirth.
This does not solve the requirements, but I thought it worth mentioning to help us think "outside the box"; since relational-style indexing has it so ingrained in us that indexes must be uniform on one attribute..
More comments welcomed please..