Hi Chris,
[Could you remember to send your replies to the magma mailing list]
I must confess that I have no solution yet - just early thoughts.
Problem
---------
Let us construct a hypothetical collection with indexes on attributes #a, #b, #c.
Now let us query this collection using these clauses:

	a = 'A'
	b between 'Ba' and 'Bz'
	c = 'C'
Using just the AND and OR operators, we then have, for example, these four combinations:

i)   a & b & c
ii)  a | b | c
iii) (a & b) | c
iv)  a | (b & c)
Syntax
--------
Let us add convenience methods to Symbol, e.g. #at:, which answers a MaClause. Then, using Smalltalk's cascade syntax, the MaCollectionReader can build up a tree of clauses with the appropriate precedence:
i) a & b & c
collection reader where: (#a at: 'A'); and: (#b from: 'Ba' to: 'Bz'); and: (#c at: 'C')
ii) a | b | c
collection reader where: (#a at: 'A'); or: (#b from: 'Ba' to: 'Bz'); or: (#c at: 'C')
iii) (a & b) | c
collection reader where: ( (#a at: 'A') and: (#b from: 'Ba' to: 'Bz') ); or: (#c at: 'C')
iv) a | (b & c)
collection reader where: (#a at: 'A'); or: ( (#b from: 'Ba' to: 'Bz') and: (#c at: 'C') )
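
The Symbol convenience methods might be sketched like this (MaEqualityClause and MaIntervalClause are hypothetical class names; only the selectors #at: and #from:to: come from the proposal above):

```smalltalk
"Sketch only: MaEqualityClause and MaIntervalClause are hypothetical."
Symbol >> at: aValue
	"Answer a MaClause matching objects whose attribute, named by the
	 receiver, equals aValue."
	^ MaEqualityClause attribute: self value: aValue

Symbol >> from: low to: high
	"Answer a MaClause matching objects whose attribute, named by the
	 receiver, lies between low and high."
	^ MaIntervalClause attribute: self from: low to: high
```

One caveat: Symbol already understands #at: (indexed character access), so shadowing it may be unwise; a more distinctive selector such as #equals: might be safer in practice.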
Client Side MaCollectionReader Work
-------------------------------------------
As with the existing infrastructure, the reader asks each clause to compute the minimum and maximum hash values.
A modified tree of clauses is then serialised and submitted to the magma server.
Server Side Request Work
------------------------------
Here is where I am still very vague, but the idea is that, just as with the existing queries, each clause will be able to find the oids between the minimum and maximum hashes.
I have no idea about performance, but could each clause not modify a bitmap index appropriately?
The server will then have a 2^48-bit (?) bitmap whose set bits mark the oids satisfying the query. This bitmap will be very sparse and may compress nicely.
The server can return this bitmap and the first page of objects.
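
One way the server could evaluate the clause tree is to give each clause a bitmap (here a LargePositiveInteger, with bit position = oid) and combine them bottom-up. A rough sketch, in which #isLeaf, #isConjunction, #left, #right and #oidsSatisfying: are all hypothetical protocol:

```smalltalk
bitmapFor: aClause
	"Answer a bitmap (a LargePositiveInteger) whose set bits are the oids
	 satisfying aClause.  All selectors sent to aClause are hypothetical."
	| bits |
	aClause isLeaf ifTrue: [
		bits := 0.
		(self oidsSatisfying: aClause)
			do: [ :oid | bits := bits bitOr: (1 bitShift: oid) ].
		^ bits ].
	^ aClause isConjunction
		ifTrue: [ (self bitmapFor: aClause left)
					bitAnd: (self bitmapFor: aClause right) ]
		ifFalse: [ (self bitmapFor: aClause left)
					bitOr: (self bitmapFor: aClause right) ]
```

Note that the leaf case is exactly where the enumeration cost discussed below bites: #oidsSatisfying: still has to walk the whole hash range for its clause.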
Paging
--------
Subsequent requests from the client for pages of objects will submit the bitmap with the request.
Comments welcome
Brent
> Comments welcome
This is sounding good so far; I am encouraged. You have zoomed straight to what is, for me, the core (not easily solved) challenge.
> Server Side Request Work
>
> Here is where I am still very vague, but the idea is that, just as with the existing queries, each clause will be able to find the oids between the minimum and maximum hashes.
> I have no idea about performance, but could each clause not modify a bitmap index appropriately?
Absolutely, and I assume we can find a good implementation to efficiently represent the intersection (Bitmap, BitArray, RunArray, LargePositiveInteger, whatever).
But we still have to figure out how to *calculate* that Bitmap, BitArray, RunArray or LargePositiveInteger from the MaHashIndexes, given the clause tree as input. As it stands, I see no way to avoid enumerating the entire range for each condition, which could take a long time.
If we can figure out this part of the puzzle, I think the rest may be just implementation details.
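
To make the LargePositiveInteger option concrete: a positive integer works as a bitmap in which bit position = oid, and intersection and union reduce to #bitAnd: and #bitOr:. A sketch with made-up oids:

```smalltalk
"LargePositiveInteger as bitmap: bit position = oid (oids here are made up)."
| a b both |
a := (1 bitShift: 5) bitOr: (1 bitShift: 9).	"clause A matches oids 5 and 9"
b := (1 bitShift: 9) bitOr: (1 bitShift: 12).	"clause B matches oids 9 and 12"
both := a bitAnd: b.				"A AND B"
both = (1 bitShift: 9)				"true: only oid 9 satisfies both"
```

This says nothing about *building* a and b cheaply, of course; that is still the open enumeration problem.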
> Paging
>
> Subsequent requests from the client for pages of objects will submit the bitmap with the request.
Ok, this is much faster than actually creating the (temporary) intersected MaHashIndex, but it will require some close scrutiny of the paging code to make it work with the bitmap.
> I have no idea about performance, but could each clause not modify a bitmap index appropriately?
I have a MaHashIndex with about half a million elements. Here is a test where the range is 10000 entries:
| n | n _ 0. Time millisecondsToRun: [ self entriesFromIndex: 1 do: [ :x :y :z | ] until: [ :a :b :c | (n _ n + 1) > 10000 ] ]
"9841"
If the range is 100000:
| n | n _ 0. Time millisecondsToRun: [ self entriesFromIndex: 1 do: [ :x :y :z | ] until: [ :a :b :c | (n _ n + 1) > 100000 ] ]
"97408"
So it looks like about 1000 entries per second, which is too slow. :(
Is there any way to avoid having to enumerate the entire ranges?
magma@lists.squeakfoundation.org