Hey!
- Complex Wildcards
Expressions of the sort: where: [ :p | p familyName match: '*foo*bar*' ] can be supported by adding hashes for each suffix of the entry (including the entry itself). E.g., to hash 'foo to bart' we would add hashes for: 'foo to bart' 'oo to bart' 'o to bart' ' to bart' 'to bart' 'o bart' ' bart' 'bart' 'art' 'rt' 't'
The query '*foo*bar*' would be transformed into the intersection of:
(familyName from: 'foo' to: 'foo' maAlphabeticalNext) & (familyName from: 'bar' to: 'bar' maAlphabeticalNext)
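A minimal workspace sketch of the idea, not Magma code: a plain Array of strings stands in for the index, and the range from: 'foo' to: 'foo' maAlphabeticalNext is assumed to select exactly the keys that begin with 'foo'. The names suffixesOf and hasPrefix are made up for illustration.

```smalltalk
"Hypothetical in-memory model of the suffix keys; no Magma API is used."
| suffixesOf hasPrefix keys |
"Every suffix of s, longest first."
suffixesOf := [:s | (1 to: s size) collect: [:i | s copyFrom: i to: s size]].
"Stand-in for the range lookup: does any key begin with prefix?"
hasPrefix := [:ks :prefix | ks anySatisfy: [:k | k beginsWith: prefix]].

keys := suffixesOf value: 'foo to bart'.   "11 keys, 'foo to bart' down to 't'"

"'*foo*bar*' as the intersection of the two range lookups."
(hasPrefix value: keys value: 'foo') & (hasPrefix value: keys value: 'bar').
```

The last expression answers true, because 'foo to bart' itself begins with 'foo' and the suffix 'bart' begins with 'bar'.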
I assume you want the above 'foo to bart' not to be found with "*bar", because it doesn't END WITH "bar"; it ends with "bart". BUT, when searching for suffixes, index-type #3 could translate the wildcard slightly differently. Instead of the translation for "*bar*",
(familyName from: 'bar' to: 'bar' maAlphabeticalNext)
the translation for "*bar" would have to use an equals:
(familyName equals: 'bar')
So we may not need solution #2 at all..?
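The difference between the two translations can be sketched in a workspace. This again models the suffix keys with a plain Array rather than a real index, and the block names are hypothetical.

```smalltalk
"Hypothetical model: range lookup for '*bar*', equals lookup for '*bar'."
| suffixesOf containsBar endsWithBar |
suffixesOf := [:s | (1 to: s size) collect: [:i | s copyFrom: i to: s size]].
"'*bar*': some suffix key begins with 'bar'."
containsBar := [:s | (suffixesOf value: s) anySatisfy: [:k | k beginsWith: 'bar']].
"'*bar': some suffix key is exactly 'bar'."
endsWithBar := [:s | (suffixesOf value: s) includes: 'bar'].

containsBar value: 'foo to bart'.   "true"
endsWithBar value: 'foo to bart'.   "false, it ends with 'bart'"
endsWithBar value: 'foo to bar'.    "true"
```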
- Single Character Wildcards
Expressions of the sort: where: [ :p | p familyName match: 'foo#bar' ] can be supported by ...
(familyName = ((256*1) + $f asciiValue)) &
(familyName = ((256*2) + $o asciiValue)) &
(familyName = ((256*3) + $o asciiValue)) &
(familyName = ((256*5) + $b asciiValue)) &
(familyName = ((256*6) + $a asciiValue)) &
(familyName = ((256*7) + $r asciiValue))
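A small sketch of how such positional keys could be generated, assuming the multiplier is the 1-based position of the character in the pattern and that # positions are simply skipped. positionalKeys is a made-up name, not part of Magma.

```smalltalk
"Hypothetical helper: one key per non-wildcard character, (256 * position) + ascii."
| positionalKeys |
positionalKeys := [:pattern | | keys |
    keys := OrderedCollection new.
    pattern withIndexDo: [:ch :i |
        ch = $# ifFalse: [keys add: (256 * i) + ch asciiValue]].
    keys asArray].

positionalKeys value: 'foo#bar'.   "#(358 623 879 1378 1633 1906)"
```

Position 4 (the #) contributes no key, so any character is accepted there.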
Yeah, but you pointed out last time that you needed
to support expressions like '%foo%bar?b??t?'
so this index-type #4 seems insufficient to meet that requirement: a % wildcard matches any number of characters, so it throws off the position of everything after it. Frankly, that sort of query really stretches my mental capacity; I don't know whether I'd attempt a query like that..
Still, a different approach to single-character wildcard matching might help. By further enhancing the "wildcard translation" of solution #3 we might be able to better support single-character wildcards. Say we're searching with 'foo t? b?rt':
(thinking out loud here)
- scan through the "like" string, get all the pieces between wildcards: #('foo t' ' b' 'rt')
- for all but the right-most, use the standard wildcard range:
& (familyName from: 'foo t' to: 'foo t' maAlphabeticalNext)
- but for the right-most one, use equals:
& (familyName equals: 'rt')
Sigh.. This obviously isn't watertight.. It doesn't account for the *order* in which the pieces appear, only that they all appear somewhere. Still, that index-type #3 really offers a lot of bang for the buck. I'm sure we will indeed end up needing more than one underlying index-type, I'm just not sure which.. and I have no problem double-dispatching to the index, and back to the collection, so the index can add itself (themselves).
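The translation and its ordering gap can be tried out in a workspace. This is only a sketch under the same assumptions as before: suffix keys held in a plain Array, the range lookup modelled as "some key begins with the piece", and the equals lookup as "some key is exactly the piece". The second entry is an invented counterexample.

```smalltalk
"Hypothetical model of the 'foo t? b?rt' translation over suffix keys."
| suffixesOf pieces translated |
suffixesOf := [:s | (1 to: s size) collect: [:i | s copyFrom: i to: s size]].
"Pieces between the single-character wildcards: 'foo t', ' b', 'rt'."
pieces := ('foo t? b?rt' findTokens: '?') asArray.
translated := [:entry | | keys |
    keys := suffixesOf value: entry.
    "Range lookup for all but the right-most piece, equals for the last one."
    (pieces allButLast allSatisfy: [:piece |
        keys anySatisfy: [:k | k beginsWith: piece]])
      and: [keys includes: pieces last]].

translated value: 'foo to bart'.    "true, as intended"
translated value: ' b foo to rt'.   "also true: all pieces present, but out of order"
```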
I'd be surprised and disappointed if it is proven "impossible" given the considerable flexibility of Magma's indexing. We probably just need a stroke of creativity or genius to figure it out.
ISSUES
- Is there no simpler solution I am missing?
If there is I'm not seeing it right now..
- If not, Magma would have to change #addIndex: to double dispatch back to the MaCollectionIndex so it can add itself (or three indexes) to the collection.
No problem. But let's first figure out exactly what indexes we need..
- Are there issues with the -keys indexes that Magma adds by itself for each index?
We'll have to make sure each underlying "sub-index" is by a different "private" attribute. It might be tricky, hopefully not.
- Is this stuff you would consider adding to Magma or is it for Lava only?
If it's general, absolutely.
- Chris