Searching magma string indices

Chris Muller chris at funkyobjects.org
Tue Mar 6 03:59:29 UTC 2007


Hi Brent,

> > Now, the expression '%foo%bar%' goes beyond the first requirement
> > of just searching on a prefix and/or a suffix.  For this a different
> > type of index would be needed, one that adds every suffix of the
> > value.  For "Fredfoon Tobart", that means the following values:
> >
> >     fredfoon tobart
> >     redfoon tobart
> >     edfoon tobart
> >     dfoon tobart
> >     foon tobart
> >     oon tobart
> >     on tobart
> >     n tobart
> >      tobart
> >     tobart
> >     obart
> >     bart
> >     art
> >     rt
> >     t
> >
> > Your '%foo%bar%' expression would then need to translate to
> >
> >     (familyName from: 'foo' to: 'foo' maAlphabeticalNext)
> >     & (familyName from: 'bar' to: 'bar' maAlphabeticalNext)
> >
> > to find the object(s) that had "Fredfoon Tobart".
> 
> I must admit I am not following you here. Is this one of those Magma
> keyword indices which I have never managed to grok ?

Yes.  As you know, even though many indexes only map one value per
indexed attribute, any index can just as easily map multiple values per
object, per indexed attribute.

Have a look at MaClause>>#shouldInclude:using:, where you can see very
simply how it treats multiple values; if any one of the (multiple)
indexed values for a particular object is "in range", then the entire
object qualifies for that clause.

It's more generic than a "keyword" index, but a keyword index is an
easy way to grok it, because you know when you search for a keyword,
any of the keywords assigned to an object will cause that object to
match.
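
To make the "any one value in range" rule concrete, here is a minimal
sketch in Python.  It is not Magma code; clause_matches is a
hypothetical name and the object's indexed values are modelled as a
plain list:

```python
def clause_matches(indexed_values, low, high):
    """Return True if any one of an object's indexed values lies in [low, high].

    Models the multi-value rule described above: one value in range is
    enough for the whole object to qualify for the clause.
    """
    return any(low <= value <= high for value in indexed_values)


# A keyword-style index: one object, several indexed values.
keywords = ["database", "index", "smalltalk"]

found = clause_matches(keywords, "index", "index")    # one keyword is in range
missing = clause_matches(keywords, "query", "query")  # no keyword is in range
```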

So in the above example, searching the first clause 

> >    (familyName from: 'foo' to: 'foo' maAlphabeticalNext)

is satisfied by the 5th element

> >    foon tobart

and the second clause

> >     & (familyName from: 'bar' to: 'bar' maAlphabeticalNext)

is satisfied by the 12th element:

> >     bart

therefore the entire expression is satisfied and the object qualifies.
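
The walk-through above can be sketched in Python.  This is only a model
of the idea, not Magma's implementation: suffixes, prefix_range and
matches_all are hypothetical names, and bumping the last character by
one is an assumed stand-in for what maAlphabeticalNext provides as the
upper bound of the range.

```python
def suffixes(text):
    """Every suffix of the lower-cased text, longest first."""
    lowered = text.lower()
    return [lowered[i:] for i in range(len(lowered))]


def prefix_range(prefix):
    """Half-open range [prefix, upper) covering every string starting with prefix.

    Assumption: bumping the last character by one plays the role of
    maAlphabeticalNext in the clauses above.
    """
    upper = prefix[:-1] + chr(ord(prefix[-1]) + 1)
    return prefix, upper


def clause_matches(values, prefix):
    """One clause: true if any indexed suffix falls inside the prefix range."""
    low, high = prefix_range(prefix)
    return any(low <= value < high for value in values)


def matches_all(text, fragments):
    """Conjunction of one prefix-range clause per fragment."""
    values = suffixes(text)
    return all(clause_matches(values, fragment) for fragment in fragments)


values = suffixes("Fredfoon Tobart")
fifth, twelfth = values[4], values[11]   # the 5th and 12th elements
qualifies = matches_all("Fredfoon Tobart", ["foo", "bar"])
```

Note that the conjunction only checks that each fragment occurs
somewhere; in this sketch matches_all("Fredfoon Tobart", ["bar", "foo"])
qualifies too, so the order of 'foo' and 'bar' would need a separate
check if it matters.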

> Also, whilst you are hunting elephants, SQL has both % and ?
> wildcards: % is any sequence of characters including the empty string
> and ? is precisely one character.
> So foo??bar would match fooABbar but not fooCDEbar and not fooFbar.
> 
> Any chance you could bend Magma's indices into managing expressions
> with a fixed number of ?s (e.g. ?foo????bar??baz???)

Yes, there are several ways to do this, but here's one: a new index
type that simply indexes a unique value for each character at a
particular position.  Returning to the previous example, for an object
whose #familyName attribute is "fredfoon tobart", this index type would
index the object at the following values:

  (256 * 1) + $f asciiValue
  (256 * 2) + $r asciiValue
  (256 * 3) + $e asciiValue
  (256 * 4) + $d asciiValue
  (256 * 5) + $f asciiValue
  (256 * 6) + $o asciiValue
  (256 * 7) + $o asciiValue
  (256 * 8) + $n asciiValue
  (256 * 9) + Character space asciiValue
  (256 * 10) + $t asciiValue
  (256 * 11) + $o asciiValue
  (256 * 12) + $b asciiValue
  (256 * 13) + $a asciiValue
  (256 * 14) + $r asciiValue
  (256 * 15) + $t asciiValue
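
The same key computation, sketched in Python for anyone who wants to
check the numbers.  positional_keys is a hypothetical name, and the
sketch assumes every character code is below 256 so that keys for
different positions cannot collide:

```python
def positional_keys(text):
    """One integer key per character: (256 * position) + character code.

    Positions are counted from 1, as in the list above.
    Assumes every character code is below 256.
    """
    return [256 * position + ord(char)
            for position, char in enumerate(text, start=1)]


keys = positional_keys("fredfoon tobart")
first_key = keys[0]    # (256 * 1) + code of 'f'
space_key = keys[8]    # (256 * 9) + code of ' '
last_key = keys[14]    # (256 * 15) + code of 't'
```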

Then, whenever a single-character-matching string is specified, every
character other than the question marks becomes one term of a
conjunction in the query.  If the user is searching:

  where: [ : p | p familyName like: '????foon toba??' ]

then it would have to be parsed into the following regular clause:

  (familyName = ((256*5) + $f asciiValue))
  & (familyName = ((256*6) + $o asciiValue))
  & (familyName = ((256*7) + $o asciiValue))
  & (familyName = ((256*8) + $n asciiValue))
  & (familyName = ((256*9) + Character space asciiValue))
  & (familyName = ((256*10) + $t asciiValue))
  & (familyName = ((256*11) + $o asciiValue))
  & (familyName = ((256*12) + $b asciiValue))
  & (familyName = ((256*13) + $a asciiValue))

Notice the question marks are the positions on which we are not
qualifying, so any character can appear in positions 1-4 and 14-15.
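
The parsing step can be sketched in Python as well.  Again this only
models the idea; pattern_keys and like are hypothetical names, the
index is modelled as a set of integers, and character codes are assumed
to be below 256:

```python
def positional_keys(text):
    """Keys the index would hold for a value: (256 * position) + character code."""
    return [256 * position + ord(char)
            for position, char in enumerate(text, start=1)]


def pattern_keys(pattern, wildcard="?"):
    """Keys required by a pattern: one per non-wildcard character.

    Wildcard positions contribute nothing, so they stay unconstrained.
    """
    return [256 * position + ord(char)
            for position, char in enumerate(pattern, start=1)
            if char != wildcard]


def like(text, pattern):
    """Conjunction: every key required by the pattern must be indexed for text."""
    indexed = set(positional_keys(text))
    return all(key in indexed for key in pattern_keys(pattern))


required = pattern_keys("????foon toba??")
hit = like("fredfoon tobart", "????foon toba??")
miss = like("fredfoon tobart", "???foon toba???")   # shifted one position left
```

One thing the sketch makes visible: nothing here constrains the length
of the value, so a longer name with the same characters in positions
5-13 would qualify too.  If trailing question marks must mean "exactly
this many characters", the length would need to be indexed or checked
separately.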

Does this all make sense?

> It is too darn hot here to think.

Yeah, you're probably just already missing skiing down fresh powder.. 
:)


