E.g. the Soundex algorithm is optimized for English. Is there a more universal algorithm that would apply across large families of languages?
E.g. the Soundex algorithm is optimized for English. Is there a more universal algorithm
Share
Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.
Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.
Lost your password? Please enter your email address. You will receive a link and will create a new password via email.
Please briefly explain why you feel this question should be reported.
Please briefly explain why you feel this answer should be reported.
Please briefly explain why you feel this user should be reported.
SOUNDEX is indeed English-oriented. Two others that take a wider variety of phonetic differences into account are: Double Metaphone and NYSIIS.
They produce encodings into a much larger possible space than SOUNDEX does. Double Metaphone, specifically, includes reductions with the express purpose of handling alternate pronunciations based on more languages than English.
I did a presentation on fuzzy string matching recently, the slides may be helpful.