Finding and indexing 'similar' string
Julian Fitzell
julian at beta4.com
Tue Aug 26 16:54:54 UTC 2003
Jim Menard wrote:
> Chris,
>
> On Tuesday, August 26, 2003, at 12:01 PM, Chris Muller wrote:
>
>> Jim Menard wrote:
>>
>>> How about using the Soundex algorithm? A quick Google search found this
>>> brief explanation <http://www.frontiernet.net/~rjacob/soundex.htm>
>>
>>
>> Ohhh! Thank you Jim! What a simple, well-explained method for a
>> sounds-like index. This would be a great new index type for
>> MagmaCollections.
>>
>> Do you know whether it works for other keywords, or just surnames? I
>> would think it would, since some people's surnames are regular words
>> anyway.
>
>
> It works for any words because it is based on how they sound. I have
> read about one problem with the algorithm, though: you need different
> sets of characters and weightings for different languages. For example,
> I think you would want "j" and "h" to map to the same sound in Mexican
> Spanish. (Forgive me if that's a bad example. The only Spanish I've ever
> learned was "May I have another beer, please?" and "Where is the
> bathroom?")
>
> Jim
The other problem with it, as I recall, is that the first letter
needs to be the same. So a name/word that starts with 'ph' won't ever
match a word that starts with 'f', for example, even if they sound the
same. Other than that, though, it works great: we used it for a sales
system and it allowed users to stop asking people to spell their names
over the phone. I've tried typing in every convoluted spelling of my
name I can think of and it always finds me :)
Julian
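
For anyone who wants to see the first-letter issue concretely, here is a
minimal Python sketch. It assumes the common American Soundex variant
with English letter groupings; it is not the code from the sales system
mentioned above, and not anything that exists in MagmaCollections. The
names soundex, CODES and build_index are made up for illustration.

```python
# Letter groups of the common American Soundex variant (an assumption;
# other languages would need different groupings).
CODES = {}
for group, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                     ("L", "4"), ("MN", "5"), ("R", "6")):
    for letter in group:
        CODES[letter] = digit


def soundex(word):
    """Return a 4-character Soundex code, or "" if word has no A-Z letters."""
    letters = [c for c in word.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    result = letters[0]                   # the first letter is kept verbatim
    previous = CODES.get(letters[0], "")
    for letter in letters[1:]:
        if letter in "HW":
            continue                      # H and W do not split a run of equal codes
        code = CODES.get(letter, "")      # vowels and Y have no code
        if code and code != previous:
            result += code
        previous = code                   # a vowel resets the run
        if len(result) == 4:
            break
    return (result + "000")[:4]           # pad short codes with zeros


def build_index(names):
    """Group names by Soundex code (a hypothetical sounds-like index)."""
    index = {}
    for name in names:
        index.setdefault(soundex(name), []).append(name)
    return index


index = build_index(["Robert", "Rupert", "Philips", "Filips"])
print(index)
```

Robert and Rupert land under the same key, while Philips and Filips do
not, because the code starts with the literal first letter (P versus F)
even though the remaining digits agree.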
More information about the Squeak-dev mailing list