Searching magma string indices

Chris Muller asqueaker at gmail.com
Thu Mar 29 03:56:10 UTC 2007


Hey!

> 3. Complex Wildcards
> ------------------------
> Expressions of the sort: where: [ :p | p familyName match:  '*foo*bar*' ] can be supported by adding hashes for each suffix of the entry.
> e.g.
>         To hash 'foo to bart' we would add hashes for:
>                 'foo to bart'
>                 'oo to bart'
>                 'o to bart'
>                 ' to bart'
>                 'to bart'
>                 'o bart'
>                 ' bart'
>                 'bart'
>                 'art'
>                 'rt'
>                 't'
>
> The query '*foo*bar*' would be transformed into the intersection of:
>
>         (familyName from: 'foo' to: 'foo' maAlphabeticalNext)
>         & (familyName from: 'bar' to: 'bar' maAlphabeticalNext)

I assume you want the above 'foo to bart' not to be found with "*bar",
because it doesn't END WITH "bar", it ends with "bart".  BUT,
searching for suffixes with index-type #3 could translate the wildcard
slightly differently.  Instead of the translation for "*bar*",

  (familyName from: 'bar' to: 'bar' maAlphabeticalNext)

the translation for "*bar" would have to use an equals:

         (familyName equals: 'bar')

So we may not need solution #2 at all..?
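
To make the range-versus-equals idea concrete, here is a rough sketch in
Python rather than Magma code.  SuffixIndex, starting_with and equal_to
are made-up names that only model index-type #3; they are not Magma API.
A range lookup over suffix keys plays the role of "*bar*", and an exact
lookup plays the role of "*bar".

```python
from bisect import bisect_left


def suffixes(s):
    """All non-empty suffixes of s, longest first."""
    return [s[i:] for i in range(len(s))]


class SuffixIndex:
    """Toy stand-in for index-type #3: one sorted key per suffix."""

    def __init__(self):
        self.keys = []  # sorted list of (suffix, entry) pairs

    def add(self, entry):
        self.keys.extend((suffix, entry) for suffix in suffixes(entry))
        self.keys.sort()

    def starting_with(self, piece):
        """Range lookup: entries with some suffix that begins with piece."""
        i = bisect_left(self.keys, (piece,))
        found = set()
        while i < len(self.keys) and self.keys[i][0].startswith(piece):
            found.add(self.keys[i][1])
            i += 1
        return found

    def equal_to(self, piece):
        """Equality lookup: entries with a suffix exactly equal to piece."""
        return {entry for suffix, entry in self.keys if suffix == piece}


index = SuffixIndex()
for name in ("foo to bart", "foobar", "bazooka"):
    index.add(name)

# '*foo*bar*' as the intersection of two range lookups
contains_foo_and_bar = index.starting_with("foo") & index.starting_with("bar")
# '*bar' as a single equality lookup
ends_with_bar = index.equal_to("bar")
```

In this model 'foo to bart' shows up for '*foo*bar*' but not for '*bar',
because 'bar' is never one of its suffixes.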

> 4. Single Character Wildcards
> ---------------------------------
> Expressions of the sort: where: [ :p | p familyName match:  'foo#bar' ] can be supported by ...
>
>         (familyName = ((256*1) + $f asciiValue))
>                 & (familyName = ((256*2) + $o asciiValue))
>                 & (familyName = ((256*3) + $o asciiValue))
>                 & (familyName = ((256*5) + $b asciiValue))
>                 & (familyName = ((256*6) + $a asciiValue))
>                 & (familyName = ((256*7) + $r asciiValue))
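
For reference, the positional encoding in the quoted proposal can be
modelled in a few lines of Python.  This is only a sketch under
assumptions: position_key, entry_keys, query_keys, add and match are
hypothetical names, positions are 1-based, and every character code is
assumed to be below 256.  Each entry is indexed under one key per
character, and a pattern is answered by intersecting the lookups for
its non-wildcard positions.

```python
from collections import defaultdict


def position_key(position, char):
    """Key for one character at a 1-based position: (256 * position) + code."""
    return 256 * position + ord(char)


def entry_keys(entry):
    """Every key an entry is indexed under, one per character."""
    return {position_key(i, c) for i, c in enumerate(entry, start=1)}


def query_keys(pattern, wildcard="#"):
    """Keys a pattern requires; wildcard positions contribute nothing."""
    return {
        position_key(i, c)
        for i, c in enumerate(pattern, start=1)
        if c != wildcard
    }


index = defaultdict(set)  # key -> entries indexed under that key


def add(entry):
    for key in entry_keys(entry):
        index[key].add(entry)


def match(pattern):
    """Intersect the per-key lookups, like the chain of '&' clauses."""
    lookups = [index.get(key, set()) for key in query_keys(pattern)]
    return set.intersection(*lookups) if lookups else set()


for name in ("fooxbar", "foo bar", "fobxbar", "xfooxbar"):
    add(name)

hits = match("foo#bar")
```

Note that this sketch never checks the entry's length, so a longer entry
that merely begins with a match would also come back.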

Yeah, but you pointed out last time that you needed

> to support expressions like '%foo%bar?b??t?'

so this index-type #4 seems insufficient to meet that requirement
because the % wildcards matching multiple characters will throw off
the entire positioning.  Frankly, that sort of a query really
stretches my mental capacity, I don't know whether I'd attempt a query
like that..  Still, a different approach to single-character wildcard
matching might help.  By further enhancing the "wildcard translation"
of solution #3 we might be able to better support single-character
wildcard matching.  Suppose we're searching with 'foo t? b?rt':

(thinking out loud here)

   - scan through the "like" string, get all the pieces between
wildcards:  #('foo t' ' b' 'rt')

   - for all but the right-most, use the standard wildcard range:

        & (familyName from: 'foo t' to: 'foo t' maAlphabeticalNext)
        & (familyName from: ' b' to: ' b' maAlphabeticalNext)

   - but for the right-most one, use equals:

       & (familyName equals: 'rt')
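
Spelled out as a Python sketch (again a model, not Magma code), the
translation could look like the following.  literal_pieces, translate
and candidates are hypothetical names, the set of wildcard characters is
an assumption, and the "equals" clause is only used when the pattern
actually ends in a literal piece.

```python
def literal_pieces(pattern, wildcards="?*%#"):
    """Literal runs between wildcard characters, in pattern order."""
    pieces, current = [], ""
    for ch in pattern:
        if ch in wildcards:
            if current:
                pieces.append(current)
            current = ""
        else:
            current += ch
    if current:
        pieces.append(current)
    return pieces


def translate(pattern, wildcards="?*%#"):
    """One ('range', piece) clause per piece, 'equals' for a trailing one."""
    pieces = literal_pieces(pattern, wildcards)
    clauses = [("range", piece) for piece in pieces]
    if pieces and pattern[-1] not in wildcards:
        clauses[-1] = ("equals", pieces[-1])
    return clauses


def suffixes(s):
    return {s[i:] for i in range(len(s))}


def candidates(pattern, entries):
    """Entries whose suffix keys satisfy every clause of the translation."""
    clauses = translate(pattern)
    hits = set()
    for entry in entries:
        keys = suffixes(entry)
        if all(
            piece in keys
            if kind == "equals"
            else any(key.startswith(piece) for key in keys)
            for kind, piece in clauses
        ):
            hits.add(entry)
    return hits


names = ["foo to bart", "foo ta bert", "foo to bard", " b foo tart"]
found = candidates("foo t? b?rt", names)
```

With these sample names, ' b foo tart' is returned alongside the two
real matches, since every piece occurs in it somewhere.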

Sigh..  This obviously isn't watertight..  It doesn't account for the
*order* in which the elements appear, only that they all appear
somewhere.  Still, that index-type #3 really offers a lot of bang for
the buck.  I'm sure we will indeed end up needing more than one
underlying index-type, I'm just not sure what..   and I have no
problem double-dispatching to the index, back to the collection to add
itself (themselves).

I'd be surprised and disappointed if it is proven "impossible" given
the considerable flexibility of Magma's indexing.  We probably just
need a stroke of creativity or genius to figure it out.

> ISSUES
> ---------
> 1) Is there no simpler solution I am missing.

If there is I'm not seeing it right now..

> 2) If not, Magma would have to change #addIndex: to double dispatch back to the MaCollectionIndex so it can add itself (or three indexes) to the collection.

No problem.  But let's first figure out exactly what indexes we need..

> 3) Are there issues with the -keys indexes that Magma adds by itself for each index?

We'll have to make sure each underlying "sub-index" is by a different
"private" attribute.  It might be tricky, hopefully not.

> 4) Is this stuff you would consider adding to Magma or is it for Lava only?

If it's general, absolutely.

 - Chris

