Finding and indexing 'similar' strings
Jim Menard
jimm at io.com
Tue Aug 26 16:10:59 UTC 2003
Chris,
On Tuesday, August 26, 2003, at 12:01 PM, Chris Muller wrote:
> Jim Menard wrote:
>
>> How about using the Soundex algorithm? A quick Google search found this
>> brief explanation <http://www.frontiernet.net/~rjacob/soundex.htm>
>
> Ohhh! Thank you, Jim! What a simple, well-explained method for a
> sounds-like index. This would be a great new index type for MagmaCollections.
>
> Do you know whether it works for other keywords, or just surnames? I would
> think it would, since some people's surnames are regular words anyway.
It works for any word because it is based on how words sound. I have
read about one problem with the algorithm, though: you need different
letter groupings and codes for different languages. For example,
I think you would want "j" and "h" to map to the same sound in Mexican
Spanish. (Forgive me if that's a bad example. The only Spanish I've
ever learned was "May I have another beer, please?" and "Where is the
bathroom?")
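To make the discussion above concrete, here is a minimal sketch of standard American Soundex in Python. It is not Squeak/Smalltalk and not part of MagmaCollections; the names `AMERICAN_CODES`, `SPANISH_VARIANT`, `soundex` and `build_sounds_like_index` are hypothetical, chosen for illustration. The letter-to-digit table is passed in as a parameter, which shows how a per-language variant could swap in different groupings. `SPANISH_VARIANT` is an assumption that simply follows the "j"/"h" example in the reply and is not a vetted Spanish phonetic scheme. The sketch also includes a tiny "sounds-like" index that maps each key to the words sharing it.

```python
from collections import defaultdict

# Standard American Soundex digit groups. Letters not listed (vowels, y, h, w)
# get no digit.
AMERICAN_CODES = {
    ch: digit
    for letters, digit in (
        ("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
        ("l", "4"), ("mn", "5"), ("r", "6"),
    )
    for ch in letters
}

# Hypothetical variant, for illustration only: code "h" like "j".
SPANISH_VARIANT = dict(AMERICAN_CODES, h="2")


def soundex(word, codes=AMERICAN_CODES, separators="hw"):
    """Return a four-character Soundex key such as 'R163'.

    Uncoded letters listed in `separators` (h and w in American Soundex)
    are skipped without breaking a run of equal digits; any other uncoded
    letter (a vowel) breaks the run, so the same digit may appear again.
    """
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    key = letters[0].upper()            # the first letter is kept literally
    prev = codes.get(letters[0], "")    # but its digit still suppresses a repeat
    for ch in letters[1:]:
        digit = codes.get(ch, "")
        if digit:
            if digit != prev:
                key += digit
                if len(key) == 4:
                    break
            prev = digit
        elif ch not in separators:
            prev = ""                   # a vowel resets the run
    return (key + "000")[:4]            # pad with zeros to four characters


def build_sounds_like_index(words, **kwargs):
    """Group words by Soundex key (a hypothetical 'sounds-like' index)."""
    index = defaultdict(list)
    for w in words:
        index[soundex(w, **kwargs)].append(w)
    return dict(index)


print(soundex("their"), soundex("there"))       # T600 T600
print(soundex("Rojas"), soundex("Rohas"))       # R220 R200
print(soundex("Rohas", codes=SPANISH_VARIANT))  # R220
print(build_sounds_like_index(["Robert", "Rupert", "Rubin"]))
# {'R163': ['Robert', 'Rupert'], 'R150': ['Rubin']}
```

The "their"/"there" pair shows that ordinary words, not just surnames, collapse to the same key. Because Soundex keeps the first letter unchanged, a language variant that wants an initial "J" and "H" to match would also need to normalize the first letter. This sketch does not do that.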
Jim
--
Jim Menard, jimm at io.com, http://www.io.com/~jimm/
"333: Eric the Half A Beast" -- Tim Allen in rec.humor.oracle.d
More information about the Squeak-dev mailing list