Finding and indexing 'similar' string

David Faught dave_faught at yahoo.com
Tue Aug 26 17:32:00 UTC 2003


Julian Fitzell wrote:
>Jim Menard wrote:
>> It works for any words because it is based on how they sound. I
>> have read about one problem with the algorithm, though: you need
>> different sets of characters and weightings for different
>> languages. For example, I think you would want "j" and "h" to map
>> to the same sound in Mexican Spanish. (Forgive me if that's a bad
>> example. The only Spanish I've ever learned was "May I have another
>> beer, please?" and "Where is the bathroom?")

>The other problem with it, as I recall, is that the first letter
>needs to be the same.  So a name/word that starts with 'ph' won't
>ever match a word that starts with 'f', for example, even if they
>sound the same.  Other than that, though, it works great: we used it
>for a sales system and it allowed users to stop asking people to
>spell their names over the phone.  I've tried typing in every
>convoluted spelling of my name I can think of and it always finds
>me :)

There is also the metaphone algorithm, which may or may not have these
same problems:
http://aspell.sourceforge.net/metaphone/
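
To make the first-letter problem concrete, here is a rough sketch. It assumes the algorithm being discussed above is classic American Soundex (the quoted text never names it), and the function name soundex is just my own choice for illustration, not any particular library's API. Treat it as an approximation of the usual rules rather than a reference implementation.

```python
# Minimal sketch of classic American Soundex (assumed to be the
# algorithm discussed in the quoted messages).
CODES = {}
for digit, group in enumerate(["bfpv", "cgjkqsxz", "dt", "l", "mn", "r"], start=1):
    for ch in group:
        CODES[ch] = str(digit)


def soundex(word):
    """Return a 4-character code: first letter plus up to three digits."""
    letters = [c for c in word.lower() if "a" <= c <= "z"]
    if not letters:
        return ""
    first = letters[0]
    result = first.upper()        # the first letter is kept verbatim
    prev = CODES.get(first, "")   # its code still suppresses a repeat
    for ch in letters[1:]:
        if ch in "hw":
            continue              # h and w do not separate repeated codes
        code = CODES.get(ch, "")  # vowels and y have no code
        if code and code != prev:
            result += code
        prev = code               # a vowel resets prev, allowing a repeat
        if len(result) == 4:
            break
    return (result + "000")[:4]   # pad short codes with zeros


print(soundex("Philip"), soundex("Filip"))
```

The two codes agree in their digits but differ in the leading letter, because that letter is copied through unchanged. That is why a 'ph' spelling never matches an 'f' spelling under this scheme.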





More information about the Squeak-dev mailing list