The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • SEARCH
  • Home
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 8371347
In Process

The Archive Base Latest Questions

Editorial Team
Asked: June 9, 2026


Algorithms for edit distance give a measure of the distance between two strings.

Question: which of these measures would be most relevant for detecting that two differently written person names are actually the same (different only because of a misspelling)? The trick is that it should minimize false positives. Example:

Obaama
Obama
=> should probably be merged

Obama
Ibama
=> should not be merged.

This is just an oversimplified example. Have programmers and computer scientists worked out this issue in more detail?
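For reference, a minimal sketch of plain Levenshtein distance (Python is chosen here only for illustration) shows why a single edit-distance threshold cannot separate the two examples: both pairs are exactly one edit apart (deleting one "a" in the first case, substituting "O" with "I" in the second).

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance; insert, delete and substitute each cost 1."""
    # prev[j] = distance between the first i-1 characters of a and the first j characters of b
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute, or match for free
        prev = curr
    return prev[-1]


print(levenshtein("Obaama", "Obama"))  # 1
print(levenshtein("Obama", "Ibama"))   # 1
```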

1 Answer

    Editorial Team
    Added an answer on June 9, 2026 at 2:08 pm

    I can suggest an information-retrieval (IR) technique for doing this, but it requires a large collection of documents to work properly.

    Index your data using standard IR techniques. Lucene is a good open-source library that can help with this.

    Given a name (Obaama, for example), retrieve the set of documents in which the word Obaama appears. Let this set be D1.

    Now, for each word w that appears in the documents of D1[1], search for Obaama AND w (using your IR system). Let the resulting set be D2.

    The score |D2|/|D1| estimates how strongly w is connected to Obaama, and will most likely be close to 1 for w = Obama[2].
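    The steps above can be sketched in a few lines of Python. This is a toy stand-in, not Lucene: it assumes a small in-memory corpus (the documents below are made up), tokenizes on whitespace, and replaces the IR query "Obaama AND w" with a set intersection over a hand-built inverted index. The helper names `build_index` and `cooccurrence_scores` are hypothetical.

```python
from collections import defaultdict


def build_index(docs: list[str]) -> dict[str, set[int]]:
    """Toy inverted index: map each lower-cased word to the ids of the documents containing it."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for word in text.lower().split():
            index[word].add(doc_id)
    return dict(index)


def cooccurrence_scores(name: str, docs: list[str]) -> dict[str, float]:
    """For each word w in the documents of D1 (documents containing `name`),
    return |D2| / |D1|, where D2 = documents containing both `name` and w."""
    index = build_index(docs)
    name = name.lower()
    d1 = index.get(name, set())
    if not d1:
        return {}
    # Candidate words: everything occurring in some document of D1, except the name itself.
    candidates = {w for doc_id in d1 for w in docs[doc_id].lower().split()} - {name}
    # The set intersection plays the role of the IR query "name AND w".
    return {w: len(d1 & index[w]) / len(d1) for w in candidates}


docs = [  # hypothetical toy corpus
    "president obama speaks in chicago",
    "obaama speech obama president",
    "obama obaama rally tonight",
    "ibama environmental agency brazil",
]
scores = cooccurrence_scores("Obaama", docs)
for word, score in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0])):
    print(word, score)
```

    On this toy corpus, obama scores 1.0 because every document containing obaama also contains obama, while each other candidate scores 0.5.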

    You can manually label a set of examples and use them to find the score threshold above which a word should be treated as a spelling variant.
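    One way to turn the labeled examples into a cutoff, sketched under assumptions: each example is a hand-labeled (score, is_same_name) pair, and because the goal is to minimize false positives, the sketch scans thresholds from high to low and keeps lowering the cutoff until precision on the labeled set first drops below a target. The 0.95 target, the function name `pick_threshold`, and the sample labels are all hypothetical.

```python
from typing import Optional


def pick_threshold(labeled: list[tuple[float, bool]],
                   min_precision: float = 0.95) -> Optional[float]:
    """Scan candidate thresholds from high to low and return the last one reached
    before precision (fraction of pairs with score >= t that are truly the same name)
    first falls below min_precision. Returns None if even the highest threshold fails."""
    best = None
    for t in sorted({score for score, _ in labeled}, reverse=True):
        predicted = [same for score, same in labeled if score >= t]
        precision = sum(predicted) / len(predicted)
        if precision < min_precision:
            break  # stop at the first drop: a conservative, low-false-positive cutoff
        best = t
    return best


# Hypothetical hand-labeled (score, is_same_name) pairs
labeled = [(1.0, True), (0.95, True), (0.9, False), (0.85, True), (0.5, False), (0.3, False)]
print(pick_threshold(labeled))        # 0.95
print(pick_threshold(labeled, 0.65))  # 0.85
```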

    Using a standard lexicographical (string) similarity technique, you can filter out words that are definitely not spelling mistakes (like Barack).
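    A minimal sketch of such a filter, using the standard-library `difflib.SequenceMatcher.ratio()` as the similarity measure. The answer does not name a specific measure, so this choice, the helper name `lexically_similar`, and the 0.8 cutoff are assumptions; an edit-distance threshold would slot in the same way.

```python
from difflib import SequenceMatcher


def lexically_similar(name: str, candidates: list[str], min_ratio: float = 0.8) -> list[str]:
    """Keep only candidates whose SequenceMatcher ratio (0..1, case-insensitive)
    against `name` is at least min_ratio; the rest are treated as unrelated words."""
    name = name.lower()
    return [w for w in candidates
            if SequenceMatcher(None, name, w.lower()).ratio() >= min_ratio]


print(lexically_similar("Obaama", ["Obama", "Barack", "president"]))  # ['Obama']
```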

    Another commonly used solution requires a query log: look for correlations between searched words. If “obaama” correlates with “obama” in the query log, the two are connected.
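    As one possible reading of "correlation" here, the sketch below computes the phi coefficient (the Pearson correlation of two yes/no variables) between "session contains query A" and "session contains query B". It assumes the log is a list of (session_id, query) pairs; the log contents and the name `session_phi` are hypothetical.

```python
import math
from collections import defaultdict


def session_phi(query_log: list[tuple[str, str]], q1: str, q2: str) -> float:
    """Phi coefficient between 'session contains q1' and 'session contains q2'."""
    sessions: dict[str, set[str]] = defaultdict(set)
    for session_id, query in query_log:
        sessions[session_id].add(query.lower())
    q1, q2 = q1.lower(), q2.lower()
    # 2x2 contingency table over sessions: n11 = both, n10 = only q1, n01 = only q2, n00 = neither
    n11 = n10 = n01 = n00 = 0
    for queries in sessions.values():
        has1, has2 = q1 in queries, q2 in queries
        if has1 and has2:
            n11 += 1
        elif has1:
            n10 += 1
        elif has2:
            n01 += 1
        else:
            n00 += 1
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    return (n11 * n00 - n10 * n01) / denom if denom else 0.0


query_log = [  # hypothetical (session_id, query) pairs
    ("s1", "obaama"), ("s1", "obama"),
    ("s2", "obama"), ("s2", "white house"),
    ("s3", "obaama"), ("s3", "obama"),
    ("s4", "ibama"), ("s4", "brazil environment"),
    ("s5", "weather"),
]
print(session_phi(query_log, "obaama", "obama"))  # positive correlation
print(session_phi(query_log, "obama", "ibama"))   # negative correlation
```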


    [1] You can improve performance by applying the lexicographical-similarity filter first, and checking only candidates that are “similar enough” lexicographically.

    [2] Usually a normalization is also applied, because frequent words are more likely to appear in the same documents as any given word, whether or not they are related.
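    Footnote 2 does not name a normalization. One common choice, assumed here, is pointwise mutual information (PMI), which compares how often the two words co-occur with what independence would predict. The document counts below are invented to show the effect: a very frequent word like "the" beats "obama" on the raw |D2|/|D1| score, but not on PMI. The function name `pmi` and its parameters are hypothetical.

```python
import math


def pmi(n_docs: int, n_name: int, n_word: int, n_both: int) -> float:
    """Pointwise mutual information from document counts:
    log( P(name, w) / (P(name) * P(w)) ) = log( n_both * n_docs / (n_name * n_word) )."""
    if n_both == 0:
        return float("-inf")  # the words never co-occur
    return math.log((n_both * n_docs) / (n_name * n_word))


# Hypothetical counts: 1000 documents, "obaama" appears in 10 of them.
for word, n_word, n_both in [("obama", 50, 9), ("the", 990, 10)]:
    raw = n_both / 10  # the unnormalized |D2| / |D1|
    print(f"{word}: raw={raw:.2f} pmi={pmi(1000, 10, n_word, n_both):.3f}")
```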


Related Questions

Are there examples of algorithms for determining the edit distance between 2 strings when
Which algorithms or data structures are used in auto-suggest features? It seems that edit-distance
So yes, I read about how edit distance can be used between strings to
In the book Algorithms for interviewers , there is such a question: How would
I want to find string similarity between two strings. en.wikipedia has examples of some
Is there an algorithm that lets you find the word-level edit distance between 2
I can find plenty formulas for finding the distance between two skew lines. I
I came across this variation of edit-distance problem: Design an algorithm which transforms a
I'm using the Levenshtein algorithm to find the similarity between two strings. This is
I found several algorithms to solve mazes. Those which are simple enough are suitable

© 2021 The Archive Base. All Rights Reserved