The Archive Base

Editorial Team
Asked: June 8, 2026

Are there any tools for identifying and merging non-exact duplicates in MySQL tables?


I have a large data set with many duplicates like:

1348,  Auto Motors, 12 Long Road, etc
48264, Auto Mtors,  12 Log Road,  etc
82743, Ato Motoers, 12 Lng Road,  etc
83821, Auto Motors, 13 Long Road, etc
92743, Auto Motors, 11 Long Road, etc

Many tables need to be merged, such as:

  • Companies
  • Addresses
  • Phone Numbers
  • Employees

There are about 100,000 rows, with 30-40 columns to match on each row (across the joined tables).

Does anyone know of a tool for sorting this out? I already have MySQL and PHP installed. I have used MongoDB and Solr before, if they would help, and I am open to installing other software if needed.


Alternatively, what kind of queries should I run if I cannot find a tool to handle this?

A simple find-all-duplicates query won't work because the matches are not exact.

Running wildcard LIKE searches would be extremely slow for all the different combinations I would need to try.

Using an Oliver or Levenshtein function in MySQL may work, but there is too much data to pull into PHP (which would also probably be extremely slow).
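To make the Levenshtein option concrete, here is a minimal sketch of the edit-distance calculation applied to a few of the sample values above. Python is used purely for illustration (it is an assumption, not part of the MySQL/PHP setup described), and the function name `levenshtein` is just a label for this sketch, not a reference to any particular MySQL or PHP function.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


# Values taken from the sample rows in the question.
print(levenshtein("Auto Motors", "Auto Mtors"))     # 1
print(levenshtein("Auto Motors", "Ato Motoers"))    # 2
print(levenshtein("12 Long Road", "12 Log Road"))   # 1
print(levenshtein("12 Long Road", "13 Long Road"))  # 1
```

Note that "12 Log Road" and "13 Long Road" are both one edit away from "12 Long Road", so edit distance alone cannot tell a typo from a genuinely different street number. Whatever threshold is chosen, the resulting matches probably need a review step before merging.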


1 Answer

  1. Editorial Team
    Added an answer on June 8, 2026 at 5:48 pm

    You have data that requires massaging. I don’t think this is something you can do entirely in SQL.

    Google Refine (now called OpenRefine) is a great tool for massaging. I would load the data into Refine first, clean it up, then import it into your relational database.
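As a rough illustration of the clean-up step outside SQL, the sketch below groups likely duplicates before anything is re-imported. It is not how Refine works internally; it is a hand-rolled approximation under stated assumptions: rows are `(id, name, address)` tuples, similarity comes from Python's standard `difflib.SequenceMatcher`, the 0.85 threshold is a guess to be tuned, and rows are only compared within a "block" (here, the first character of the normalized text) so that 100,000 rows do not require comparing every pair. The two extra rows with ids 90001 and 90002 are hypothetical and exist only to show non-matches.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations

THRESHOLD = 0.85  # assumed cut-off; tune against hand-checked data


def normalize(text):
    """Lower-case and collapse whitespace so trivial differences vanish."""
    return " ".join(text.lower().split())


def find_candidate_clusters(rows, threshold=THRESHOLD):
    """Return sorted lists of ids whose name+address text looks similar."""
    keyed = {rid: normalize(f"{name} {addr}") for rid, name, addr in rows}

    # Blocking: only rows sharing a first character are compared.
    blocks = defaultdict(list)
    for rid, text in keyed.items():
        blocks[text[:1]].append(rid)

    # Union-find, so that A~B and B~C end up in one cluster.
    parent = {rid: rid for rid in keyed}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for ids in blocks.values():
        for a, b in combinations(ids, 2):
            if SequenceMatcher(None, keyed[a], keyed[b]).ratio() >= threshold:
                parent[find(a)] = find(b)

    clusters = defaultdict(list)
    for rid in keyed:
        clusters[find(rid)].append(rid)
    return sorted(sorted(c) for c in clusters.values() if len(c) > 1)


rows = [
    (1348, "Auto Motors", "12 Long Road"),
    (48264, "Auto Mtors", "12 Log Road"),
    (82743, "Ato Motoers", "12 Lng Road"),
    (83821, "Auto Motors", "13 Long Road"),
    (92743, "Auto Motors", "11 Long Road"),
    (90001, "Acme Tools", "5 Hill Street"),  # hypothetical non-duplicate
    (90002, "Zed Bakery", "99 Short Lane"),  # hypothetical non-duplicate
]
print(find_candidate_clusters(rows))
```

The output is a list of candidate clusters for a person to confirm, not a list of rows that are safe to merge automatically. Blocking on the first character is the weakest part of the sketch, since a typo in the first letter would hide a duplicate; a phonetic key or a sorted-token key might be a better choice on real data.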
