Finding and indexing 'similar' string

Dean_Swan at Mitel.COM
Tue Aug 26 18:00:28 UTC 2003


Chris,

        There is also an algorithm called 'Metaphone' that was originally
published in Computer Language in 1990.  It does a somewhat better job
of matching similar-sounding words (at least in English).  The principal
weakness of Soundex is that it always keeps the first letter of the word
as-is, so two words that sound alike but start with different letters
(for example 'Knight' and 'Night') never get the same code.
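
To make the first-letter problem concrete, here is a rough sketch of the
classic American Soundex rules in Python (not Squeak, and not taken from
any of the pages referenced in this thread). The function name and the
table layout are my own choices for illustration.

```python
# Minimal Soundex sketch. Assumes plain ASCII letters; anything else is ignored.
CODES = {}
for digit, letters in enumerate(["bfpv", "cgjkqsxz", "dt", "l", "mn", "r"], start=1):
    for ch in letters:
        CODES[ch] = str(digit)


def soundex(word):
    """Return a 4-character code: first letter plus up to three digits."""
    letters = [c for c in word.lower() if "a" <= c <= "z"]
    if not letters:
        return ""
    first = letters[0]
    result = first.upper()          # the first letter is kept verbatim
    prev = CODES.get(first, "")     # its code still suppresses a repeat
    for ch in letters[1:]:
        if ch in "hw":
            continue                # h and w do not separate equal codes
        code = CODES.get(ch, "")    # vowels and y have no code
        if code and code != prev:
            result += code
        prev = code                 # a vowel resets prev, so repeats count again
    return (result + "000")[:4]     # pad with zeros, truncate to 4


print(soundex("Robert"), soundex("Rupert"))   # same code
print(soundex("Knight"), soundex("Night"))    # different codes
```

'Robert' and 'Rupert' land on the same code, but 'Knight' and 'Night'
do not, purely because of the leading letter.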

        You might also try searching for 'agrep' ("approximate grep"),
'string similarity', 'approximate string matching', or
'approximate pattern matching' for other references.
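
As a taste of what those searches turn up, one common measure of string
similarity is edit distance (Levenshtein distance): the number of
single-character insertions, deletions, and substitutions needed to turn
one string into the other. The Python sketch below is only an
illustration under that assumption; it is plain dynamic programming, not
what agrep itself implements, and the function name is made up.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


print(edit_distance("knight", "night"))    # small distance: close spellings
print(edit_distance("kitten", "sitting"))
```

Unlike a Soundex code, a distance cannot be used directly as an index
key; it ranks candidate matches against a query instead.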


Here are a few fairly good references:

        http://www.bitmechanic.com/mail-archives/mysql/Jan1998/0666.html
        http://aspell.net/metaphone/metaphone-kuhn.txt
        http://www.dcc.ufmg.br/~ghuiban/paa/tp3/node18.html


                                        -Dean







Chris Muller <afunkyobject at yahoo.com>
Sent by: squeak-dev-bounces at lists.squeakfoundation.org
08/26/03 12:01 PM
Please respond to chris; Please respond to The general-purpose Squeak 
developers list 

 
        To:     Squeak List <squeak-dev at lists.squeakfoundation.org>
        cc: 
        Subject:        Re: Finding and indexing 'similar' string



Jim Menard wrote:

> How about using the Soundex algorithm? A quick Google search found this 
> brief explanation <http://www.frontiernet.net/~rjacob/soundex.htm>

Ohhh!  Thank you Jim!  What a simple, well-explained method for a
sounds-like index.  This would be a great new index type for
MagmaCollections.

Do you know whether it works for other keywords, or just surnames?  I
would think it would, since some people's surnames are regular words
anyway.

 - Chris



