The Archive Base

Editorial Team
Asked: June 7, 2026

There are many questions on how to find duplicates in a database, but not with the specific problem that I have.

I have a table with approximately 120,000 entries in which I need to find duplicates. To find them, I use a PHP script that is structured like this:

//get all entries from database
//loop through them
    //get entries with greater id
    //compare all of them with the original one
    //update database (delete duplicate, update information in linked tables, etc.)

I can't filter out all duplicates in the initial query, because my duplicate search must catch not only entries that are 100% identical but also entries that are about 90% similar, so I have to loop through all entries. I use similar_text() for the comparison.
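For reference, the loop-within-a-loop described above can be sketched in Python roughly like this. It is only an illustration: PHP's similar_text() has no exact Python equivalent, so difflib's SequenceMatcher.ratio() stands in for its percentage output (scores will not always agree), and the (id, text) row shape and the helper names are hypothetical.

```python
from difflib import SequenceMatcher


def similarity_percent(a: str, b: str) -> float:
    # Stand-in for similar_text($a, $b, $percent): ratio() is 2 * matches / total length.
    # The matching algorithm differs from PHP's, so results are only comparable in spirit.
    return SequenceMatcher(None, a, b).ratio() * 100


def find_duplicate_pairs(rows, threshold=90.0):
    """Naive scan: compare every entry with every entry that has a greater id.

    rows is a hypothetical list of (id, text) tuples sorted by id.
    Returns (original_id, duplicate_id) pairs; cost grows as n * (n - 1) / 2.
    """
    pairs = []
    for i, (orig_id, orig_text) in enumerate(rows):
        # Only look at entries after this one, i.e. with a greater id.
        for dup_id, dup_text in rows[i + 1:]:
            if similarity_percent(orig_text, dup_text) >= threshold:
                pairs.append((orig_id, dup_id))
    return pairs


rows = [(1, "example.com"), (2, "example.org"), (3, "example.com"),
        (4, "unrelated.net"), (5, "example.co")]
print(find_duplicate_pairs(rows))  # [(1, 3), (1, 5), (3, 5)]
```

"example.co" clears the 90% bar against "example.com" (about 95%), while "example.org" does not (about 82%).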

I think the outer loop is fine, but looping through all other entries inside it is just too much: with 120,000 entries this comes to roughly 120,000² / 2, or about 7.2 billion, comparisons.
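The comparison count is easy to check: comparing each entry only with the entries that have a greater id gives n(n - 1)/2 pairs.

```python
def pair_count(n: int) -> int:
    # Each of the n entries is compared once with every entry that comes after it.
    return n * (n - 1) // 2


print(pair_count(120_000))  # 7199940000
```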

So instead of nesting one loop inside another, there must be a better way to do this. Do you have any ideas? I thought about using in_array(), but it cannot detect something like 90% string similarity, and it also doesn't tell me at which array keys it found the duplicates; I need those to get the entries' ids so I can update the database correctly.
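One common alternative to the inner loop (a general technique, not something from the original post) is to hash each entry's value into a dictionary that maps the value to the list of ids sharing it. That finds exact duplicates in a single pass and, unlike in_array(), keeps the ids. For the 90% case, the same idea can limit the fuzzy comparisons to small "blocks" of entries that share a cheap key. The sketch below assumes, hypothetically, that near-duplicates share their first three characters after lowercasing; pairs that differ there are missed, so the blocking key would have to be chosen to fit the real data. The normalization rule and all function names are illustrative.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    # Hypothetical normalization: ignore case and surrounding whitespace.
    return text.strip().lower()


def group_exact(rows):
    """Map each normalized value to the ids that share it (one pass, no inner loop).

    The first id in each list can be kept as the original; the rest are duplicates.
    """
    groups = defaultdict(list)
    for row_id, text in rows:
        groups[normalize(text)].append(row_id)
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


def fuzzy_pairs_by_block(rows, threshold=0.9, block_len=3):
    """Run the pairwise similarity check only inside blocks that share a prefix.

    Assumption: near-duplicates agree on their first block_len characters.
    """
    blocks = defaultdict(list)
    for row_id, text in rows:
        value = normalize(text)
        blocks[value[:block_len]].append((row_id, value))
    pairs = []
    for members in blocks.values():
        for i, (id_a, text_a) in enumerate(members):
            for id_b, text_b in members[i + 1:]:
                if SequenceMatcher(None, text_a, text_b).ratio() >= threshold:
                    pairs.append((id_a, id_b))
    return pairs


rows = [(1, "example.com"), (2, "Example.com "), (3, "example.co"), (4, "unrelated.net")]
print(group_exact(rows))           # {'example.com': [1, 2]}
print(fuzzy_pairs_by_block(rows))  # [(1, 2), (1, 3), (2, 3)]
```

The quadratic work now happens only within each block, so the total cost depends on how evenly the blocking key splits the table.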

Any ideas?

Thank you very much!

Charles

UPDATE 1

The query I am using right now is the following:

SELECT a.host_id
FROM host_webs a
JOIN host_webs b ON a.host_id != b.host_id AND a.web = b.web
GROUP BY a.host_id

It correctly returns both the originals and their duplicates, but I need to exclude the originals, i.e. the first entry found for each set of matching data. How can I do that?

1 Answer

    Editorial Team
    Added an answer on June 7, 2026 at 5:03 pm

    You can JOIN the table onto itself and do it all in SQL (I know you say you don’t think you can, but I would be surprised if this is the case). All you need to do is put all the columns you use to test for duplicates into the ON clause of the JOIN.

    SELECT a.id
    FROM tablename a
    JOIN tablename b ON a.id != b.id AND a.col1 = b.col1 AND a.col2 = b.col2
    GROUP BY a.id
    

    This returns the ids of every row that shares its col1 and col2 values with at least one other row (the first "original" row of each set is included too). You can incorporate whatever string comparisons you need into this; the ON clause can be as complicated as necessary. For example:

    SELECT a.id
    FROM tablename a
    -- the outer parentheses keep a.id != b.id applied to every OR branch
    JOIN tablename b ON a.id != b.id AND (
      (a.col1 = b.col1 AND (a.col2 = b.col2 OR a.col3 = b.col3))
      OR ((a.col1 = b.col1 OR a.col2 = b.col2) AND a.col3 = b.col3)
      OR (SOUNDEX(a.col1) = SOUNDEX(b.col1) AND SOUNDEX(a.col2) = SOUNDEX(b.col2) AND SOUNDEX(a.col3) = SOUNDEX(b.col3))
    )
    GROUP BY a.id
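Whether the OR branches of such an ON clause are wrapped in one pair of parentheses matters, and it can be checked with a small in-memory SQLite table from Python. SQLite stands in for MySQL here, the three rows are made up, and the SOUNDEX() branch is left out because SQLite only provides that function in some builds. Because AND binds more tightly than OR, leaving the branches unparenthesized applies a.id != b.id to the first branch only, so a row can match itself through a later branch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (id INTEGER PRIMARY KEY, col1 TEXT, col2 TEXT, col3 TEXT)")
conn.executemany(
    "INSERT INTO tablename VALUES (?, ?, ?, ?)",
    [(1, "x", "y", "z"), (2, "x", "y", "w"), (3, "m", "n", "o")],  # rows 1 and 2 share col1 and col2
)


def ids(sql: str) -> list:
    # Sort in Python so the result does not depend on GROUP BY output order.
    return sorted(row[0] for row in conn.execute(sql))


exact = """
    SELECT a.id FROM tablename a
    JOIN tablename b ON a.id != b.id AND a.col1 = b.col1 AND a.col2 = b.col2
    GROUP BY a.id"""

# AND binds tighter than OR: a.id != b.id only guards the first branch here.
unparenthesized = """
    SELECT a.id FROM tablename a
    JOIN tablename b ON a.id != b.id AND
      (a.col1 = b.col1 AND (a.col2 = b.col2 OR a.col3 = b.col3))
      OR ((a.col1 = b.col1 OR a.col2 = b.col2) AND a.col3 = b.col3)
    GROUP BY a.id"""

# Wrapping all OR branches keeps a.id != b.id applied to every branch.
parenthesized = """
    SELECT a.id FROM tablename a
    JOIN tablename b ON a.id != b.id AND (
      (a.col1 = b.col1 AND (a.col2 = b.col2 OR a.col3 = b.col3))
      OR ((a.col1 = b.col1 OR a.col2 = b.col2) AND a.col3 = b.col3))
    GROUP BY a.id"""

print(ids(exact))            # [1, 2]
print(ids(unparenthesized))  # [1, 2, 3]  (row 3 matched itself)
print(ids(parenthesized))    # [1, 2]
```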
    

    EDIT

    Since all your query actually does is look for rows where the web column is identical, the following finds only the duplicates and not the original “good” records, assuming host_id is numeric and the “good” record is the one with the lowest host_id:

    SELECT b.host_id
    FROM host_webs a
    INNER JOIN host_webs b ON b.web = a.web AND b.host_id > a.host_id
    GROUP BY b.host_id
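To see the difference between the question's query and this one, both can be run against a few made-up host_webs rows, again using SQLite from Python as a stand-in for MySQL. The table contents are hypothetical; both queries use plain join syntax, so they run unchanged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE host_webs (host_id INTEGER PRIMARY KEY, web TEXT)")
conn.executemany(
    "INSERT INTO host_webs VALUES (?, ?)",
    [(1, "a.com"), (2, "b.com"), (3, "a.com"), (4, "a.com"), (5, "c.com")],
)

# Query from the question: every host_id that has a twin, originals included.
all_in_groups = """
    SELECT a.host_id FROM host_webs a
    JOIN host_webs b ON a.host_id != b.host_id AND a.web = b.web
    GROUP BY a.host_id"""

# Query from the answer: only rows that have a twin with a lower host_id.
duplicates_only = """
    SELECT b.host_id FROM host_webs a
    INNER JOIN host_webs b ON b.web = a.web AND b.host_id > a.host_id
    GROUP BY b.host_id"""


def ids(sql: str) -> list:
    # Sort in Python so the comparison does not depend on GROUP BY output order.
    return sorted(row[0] for row in conn.execute(sql))


print(ids(all_in_groups))    # [1, 3, 4]
print(ids(duplicates_only))  # [3, 4]
```

Row 1 is the lowest host_id for "a.com", so it drops out of the second result while rows 3 and 4 remain.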
    

    I imagine the end game here would be to remove the duplicates, so if you are feeling brave you could actually delete them in one go:

    DELETE b.*
    FROM host_webs a
    INNER JOIN host_webs b ON b.web = a.web AND b.host_id > a.host_id
    

    The GROUP BY is not necessary in the DELETE statement, because it doesn’t matter if the join tries to delete the same row more than once in a single statement.
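SQLite does not support MySQL's multi-table DELETE b.* ... JOIN syntax, so the sketch below uses a subquery form to show the end result on the same made-up host_webs rows; it matches the JOIN version as long as web is never NULL. Since the question also mentions updating linked tables, the sketch first builds a mapping from each duplicate to the original it should be merged into, taking MIN(a.host_id) as the original. MySQL typically rejects a DELETE whose subquery reads the same table (error 1093), which is one reason the JOIN form above is the usual choice there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE host_webs (host_id INTEGER PRIMARY KEY, web TEXT)")
conn.executemany(
    "INSERT INTO host_webs VALUES (?, ?)",
    [(1, "a.com"), (2, "b.com"), (3, "a.com"), (4, "a.com"), (5, "c.com")],
)

# duplicate host_id -> host_id of the original (the lowest id with the same web).
# Such a mapping could be used to repoint linked tables before deleting.
mapping = conn.execute("""
    SELECT b.host_id, MIN(a.host_id) FROM host_webs a
    INNER JOIN host_webs b ON b.web = a.web AND b.host_id > a.host_id
    GROUP BY b.host_id
    ORDER BY b.host_id""").fetchall()
print(mapping)  # [(3, 1), (4, 1)]

# SQLite-friendly equivalent of the multi-table DELETE (assumes web is never NULL):
# keep only the lowest host_id for each web value.
conn.execute("""
    DELETE FROM host_webs
    WHERE host_id NOT IN (SELECT MIN(host_id) FROM host_webs GROUP BY web)""")
conn.commit()

remaining = [row[0] for row in conn.execute("SELECT host_id FROM host_webs ORDER BY host_id")]
print(remaining)  # [1, 2, 5]
```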


Related Questions

There are many questions like this but I can't find one that seems to
There are many similar questions, but I didn't find one that gets straight to
NOTE: I know there are many questions that talked about that but I'm still
I'm looking to implement my first Android database, but I have so many questions
There are many skills a programmer could have (understanding the problem, asking good questions,
There are many questions like this, but none of them seem to answer my
I know there are many questions similar but any of them didn't help. in
There are many similar questions, however they don't answer the problem of a url
I know there are many other questions similar to this one, but none of
I have a database structure that has two one-to-many relationships. I have a website,

© 2021 The Archive Base. All Rights Reserved