Hi Chris,
Thanks for your suggestions: I think we are getting closer. However:
Also, whilst you are hunting elephants, SQL has both % and ? wildcards: % is any sequence of characters inlcuding the empty string and ? is precicely one character. So foo??bar would match fooABbar but not fooCDEbar and not fooFbar.
Any chance you could bend Magma's indices into managing expressions with a fixed number of ?s (e.g. ?foo????bar??baz???)
Yes, there are several ways to do this but here's one. A new index type that simply indexes a unique value for each character at a particular position. Returning to the previous example, an object whose #familyName attribute is "fredfoon tobart", this index type would to index the object at the following values:
(256 * 1) + $f asciiValue (256 * 2) + $r asciiValue (256 * 3) + $e asciiValue (256 * 4) + $d asciiValue (256 * 5) + $f asciiValue (256 * 6) + $o asciiValue (256 * 7) + $o asciiValue (256 * 8) + $n asciiValue (256 * 9) + Character space asciiValue (256 * 10) + $t asciiValue (256 * 11) + $o asciiValue (256 * 12) + $b asciiValue (256 * 13) + $a asciiValue (256 * 14) + $r asciiValue (256 * 15) + $t asciiValue
Then, whenever a single-character-matching string is specified, every character other than the question-marks becomes a conjunction in the query. If the user is searching:
where: [ : p | p familyName like: '????foon toba??' ]
then it would have to be parsed into the following regular clause:
(familyName = ((256*5) + $f asciiValue) & (familyName = ((256*6) + $o asciiValue) & (familyName = ((256*7) + $o asciiValue) & (familyName = ((256*8) + Character space asciiValue) & (familyName = ((256*9) + $t asciiValue) & (familyName = ((256*10) + $o asciiValue) & (familyName = ((256*11) + $b asciiValue) & (familyName = ((256*12) + $a asciiValue)
Notice the question marks are the ones on which we are not qualifying, so any value can be placed in positions 1-4 and 13-14.
Note: SQL actually uses '_' not '?' for a single wild character. Sorry: I continue to use '?' in this thread for consistency.
This all works except for expressions of the form: '???' i.e. where we are searching for all strings with exactly three characters. I have yet to think of how to include this boundary condition except that we could additionally index on the size of the strings. In this case the MaHashIndex would include a record with key = 3 and values = 'all string of length three'.
Finally, we also need to support expressions like '%foo%bar?b??t?'. I am hoping your earlier suggestions on searching for %'s can be merged into the same index as the ?s.
Any ideas ?
Brent