I would like to find all duplicate records by name in a customer table using MySQL including those that do not match exactly.
I know I can use the query
SELECT name, GROUP_CONCAT(id) AS ids FROM customer GROUP BY name HAVING COUNT(*) > 1;
to find all rows that match exactly, but I want to find all duplicate rows matching with a LIKE clause. For instance, there might be a customer with the name “Mark’s Widgets” and another “Mark’s Widgets Inc.” I would like my query to find these as duplicates. So something along the lines of
SELECT id, name AS name1 ... WHERE name1 LIKE CONCAT("%", name2, "%") ...
I know that’s completely incorrect, but that’s the idea. Here is the table schema:
mysql> describe customer;
+-----------------------------+--------------+------+-----+------------+----------------+
| Field                       | Type         | Null | Key | Default    | Extra          |
+-----------------------------+--------------+------+-----+------------+----------------+
| id                          | int(11)      | NO   | PRI | NULL       | auto_increment |
| name                        | varchar(140) | NO   |     | NULL       |                |
...
EDIT: To clarify, I want to find all duplicates, not just duplicates of one specific customer name.
It’s quite possible to do this, but before you even begin, you need to define your rules for what is a match and what is not; without that, you can’t go anywhere.
You could, for example, ignore the first and last 3 characters of the name and match on the middle characters, or you could choose more complex logic. There is no magic method of achieving what you want; you will have to code the logic. Whatever your choice, it needs to be defined before you start and before we can really help much.
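As a rough sketch of the middle-characters rule, assuming MySQL and the customer table above: the query drops the first and last 3 characters of every name longer than 6 characters and pairs that row with any other row whose name contains the remaining middle part. The trim length of 3 and the aliases (a, b, dup_id, dup_name) are illustrative assumptions, not a recommended rule.

```sql
-- Hypothetical rule: two names match if the "middle" of one name
-- (first and last 3 characters removed) occurs anywhere in the other.
SELECT a.id   AS id,
       a.name AS name,
       b.id   AS dup_id,
       b.name AS dup_name
FROM customer AS a
JOIN customer AS b
  ON b.id <> a.id                                  -- skip pairing a row with itself
WHERE CHAR_LENGTH(a.name) > 6                      -- something must remain after trimming
  AND b.name LIKE CONCAT('%',
                         SUBSTRING(a.name, 4, CHAR_LENGTH(a.name) - 6),
                         '%')                      -- middle of a.name inside b.name
ORDER BY a.id, b.id;
```

For the question’s example, “Mark’s Widgets” (14 characters) reduces to “k’s Widg”, which occurs in “Mark’s Widgets Inc.”, so that pair is returned. Any % or _ stored inside a name is treated as a LIKE wildcard, so names containing those characters would need escaping.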
No MySQL here, so excuse any syntax errors (it’s T-SQL syntax, if any), but I’m thinking a self-join.
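A minimal MySQL version of that self-join, assuming the matching rule is simply “one name contains the other” (the LIKE CONCAT idea from the question); the aliases are illustrative:

```sql
-- Pair each customer with every other customer whose name
-- contains it, e.g. 'Mark’s Widgets' inside 'Mark’s Widgets Inc.'.
SELECT a.id   AS id,
       a.name AS name,
       b.id   AS dup_id,
       b.name AS dup_name
FROM customer AS a
JOIN customer AS b
  ON b.id <> a.id                            -- never match a row with itself
 AND b.name LIKE CONCAT('%', a.name, '%')    -- a.name is a substring of b.name
ORDER BY a.id, b.id;
```

Exact duplicates are listed twice, once in each direction. Because the pattern starts with %, an index on name would not help the lookup, so every name is compared with every other one; that is fine for a modest table but grows quadratically with the row count.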