Hi Chris,
Hey man, I'm sorry to hear of your injury, I sincerely hope you are ok! Given the brilliance of your post it doesn't sound like there's any permanent damage thank goodness.
I am fine, a bit tender but quite fine, thanks.
I do not yet understand your solution but my cursory read it sounds real good.. I need to read it again and dedicate some cycles to understand it better..
In essense I want to allow the Magma server to estimate whether it can create the bitmap and sort order within a given time and signal an exception if it cannot. In this way the client can determine how long is it preperared to wait. I have spent many a while waiting for Oracle to finish a sorted query, so an upperbound is a good enough compromise for me.
The sorting solution is also quite simple. Concider the people collection with indices on familyName and age.
Currently we will have five indices on this collection: one for the elements, 2 normal indices and 2 census indices.
The main index will hold the oids:
0001 -> Brent Pinkney (age = 33) 0002 -> Tom Pinkney (age = 63) 0003 -> Claire Pinkney (age = 31) 0004 -> Man Kanam (age = 29) NB. (familyName = 'Man') 0005 -> Nelson Madiba Mandela (age = 86)
Now concider the same objects in the familyName index: they will be stored in sorted order of familyName, viz:
0004, 0005, 0001, 0002, 0003 (ie, Man, Mandela, Brent Pinkney, Tom Pinkney, Claire Pinkney)
The age index will also be stored in sorted order, viz
0004, 0003, 0001, 0002, 0005 (ie, Man, Claire Pinkney, Brent Tom Pinkney, Tom Pinkney, Mandela)
So if we maintain a new sorting-index where each entry maps and oid to (in this exampl) two integers, being the position of that oid in the index then we have
0001 -> 3, 3 0002 -> 4, 4 0003 -> 5, 2 0004 -> 1, 1 0005 -> 2, 5
So if we have the following query:
query := people where: [ :p | (p familyName from: 'Adam' to: 'Zorro') & (p age between: 32 and: 90) ] orderBy: [ p age ]. query pageSize: 2.
[ q prepareWithin: 10 seconds ] on: MaTimeout do: [....]. q do: [ :p | .... ]
To satisfy this query we have to a) build bitmap and b) walk the bitmap in the correct order. But from #pageSize we know we only have to find the smallest and second smallest age values. We can use the sort-index above to help.
Now, (p familyName from: 'Oscar' to: 'Zorro') matches oids: 0001, 0002, 0003, 0004, 0005. But (p age between: 32 and: 80) only matches oids 0001, 0002, 0005.
We can estimate the time it will take to built the bitmap as it bound by the number of oids in the tightest cluase (age) and the number of sort cluases (one).
So we decide to build up the bimap index using the age clause as it is 'tighter'. For each oid we check the census index for the #familyName index to see whether the oid survives and set each oid's bit in the index accordingly. If the oid survives, we find its sort order in the age index and maintain the smallest two. At the end of this process the bitmap is set and the sort collection is
#( 0001->3 0002->4 )
We store this array or sort assacations in the bitmap. Obviously if we fail to build this bitmap and sort order withing the 10 seconds mandated by the client we return an exception. However if all goes well, we return immediately with a segment containing 0001 and 0002 being 'Brent Pinkney' and 'Tom Pinkney'. We also shuttle our posh bitmap back to the client to keep the Magma server simple.
Now, if the client does need the second (or subsequent pages) we know that the next oid must occur after entry 4 in the #age index. So we read each oid from the 5th and check whether it is in the bitmap. We continue reading ad checking oids until #pageSize oids have been found.
This is the best solution I can think of. It is also not too scary to implement (I think).
Please let me know.
Brent
Everything sounds fantastic except I have one big question.
So if we maintain a new sorting-index..
The positions in the other indices are not static, right?
If an object is added to a collection and it is somewhere in the middle of an indices sort order, then we will have to increment the posiitons of all the oids 'after' this oid in the sort index.
For very big collections this will be slow.
I have not thought this through so I did not want to mention it in my previous mail, but if we have a Magma collection of n elements where n is very big but sill < 20 million. ie. n is much less than ( (2 raisedTo: 48) - ), then could we not use a multiple of the oids real position, eg 4, 54, 1008692 as the position in the index?
ie multiple each oids position by 10 million:
4 -> 40000000 54 -> 540000000 1008692 -> 10086920000000
So to return to the example in my previous mail:
0001 -> 3, 3 0002 -> 4, 4 0003 -> 5, 2 0004 -> 1, 1 0005 -> 2, 5
is now
0001 -> 30000000, 30000000 0002 -> 40000000, 40000000 0003 -> 50000000, 20000000 0004 -> 10000000, 10000000 0005 -> 20000000, 50000000
If a new object with oid 0006 is added which happens to be in position 6 and position 3 respectively then we can map this to an entry 0006 -> 60000000, 30000000. This is becuase position 6 is our new highest position for the forst index so its just gets multipled by the ordinal. However since the oid is inserting itself in position 30000000 (= 3) in the second index, the oids currently at position 330000000 must be moved to position 30000001.
The sort index is now:
0001 -> 30000000, 30000001 0002 -> 40000000, 40000000 0003 -> 50000000, 20000000 0004 -> 10000000, 10000000 0005 -> 20000000, 50000000 0006 -> 60000000, 30000000
The result is that only one other sort entry is affected by the insertion.
Obviously, when the bitmap is enumerated these new contrived positions (e.g. 20000000) must be inspected module our multiplier 10000000 to determine the real position.
I have not have time to really inspect this algorithm, so it may be a dog. However, I hope such a spacing algorithm will allow Magma to keep inserting new objects quickly.
Regards
Brent
magma@lists.squeakfoundation.org