Finding and indexing 'similar' string

Jim Menard jimm at io.com
Tue Aug 26 16:10:59 UTC 2003


Chris,

On Tuesday, August 26, 2003, at 12:01  PM, Chris Muller wrote:

> Jim Menard wrote:
>
>> How about using the Soundex algorithm? A quick Google search found
>> this brief explanation <http://www.frontiernet.net/~rjacob/soundex.htm>
>
> Ohhh!  Thank you Jim!  What a simple, well-explained method for a
> sounds-like index.  This would be a great new index type for
> MagmaCollections..
>
> Do you know whether it works for other keywords?  Or just surnames?  I
> would think it would, since some people's surnames are regular words
> anyway..

It works for any word, because it encodes how a word sounds rather than
what it means. I have read about one problem with the algorithm, though:
the letter groupings are tuned for English, so you need different sets
of characters and weightings for different languages. For example, I
think you would want "j" and "h" to map to the same sound in Mexican
Spanish. (Forgive me if that's a bad example. The only Spanish I've 
ever learned was "May I have another beer, please?" and "Where is the 
bathroom?")
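
As a rough illustration of the point about sounds and per-language
character sets, here is a minimal Python sketch of the common American
Soundex variant. The function name `soundex`, the table name
`ENGLISH_GROUPS`, and the `groups` parameter are assumptions made for
this example; nothing here comes from MagmaCollections or from the page
linked above.

```python
# Letter groups for the common American (English) Soundex variant.
# Letters in the same group are treated as sounding alike.
ENGLISH_GROUPS = {}
for letters, digit in (
    ("BFPV", "1"),
    ("CGJKQSXZ", "2"),
    ("DT", "3"),
    ("L", "4"),
    ("MN", "5"),
    ("R", "6"),
):
    for ch in letters:
        ENGLISH_GROUPS[ch] = digit


def soundex(word, groups=ENGLISH_GROUPS):
    """Return a 4-character Soundex-style code, or "" for an empty word.

    `groups` maps uppercase letters to digit codes. It is a parameter
    (an assumption of this sketch) so that a table for another language
    could be swapped in.
    """
    letters = [c for c in word.upper() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0]                 # the first letter is kept as-is
    prev = groups.get(letters[0], "")   # its code still suppresses repeats
    for ch in letters[1:]:
        if ch in "HW":
            continue                    # H and W do not separate equal codes
        code = groups.get(ch, "")       # vowels and unknown letters: no code
        if code and code != prev:
            result += code
        prev = code                     # a vowel resets the repeat check
    return (result + "000")[:4]         # pad with zeros, truncate to 4


if __name__ == "__main__":
    for name in ("Robert", "Rupert", "Ashcraft", "Tymczak"):
        print(name, soundex(name))
```

Words that sound alike in English, such as "Robert" and "Rupert", get
the same code, which is what would make the code usable as an index key.
Supporting another language would mean passing a different `groups`
table rather than changing the function.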

Jim
-- 
Jim Menard, jimm at io.com, http://www.io.com/~jimm/
"333: Eric the Half A Beast" -- Tim Allen in rec.humor.oracle.d



More information about the Squeak-dev mailing list