Although this issue has been brought up in the past, I’m curious if this

Question

Asked: June 13, 2026 (2026-06-13T14:14:55+00:00)

Although this issue has been brought up in the past, I’m curious if this is still the best way to clean up duplicate entries in a large (3M and growing) table. After each bulk insert I run this line to keep things tidy, but it’s starting to take a very long time to execute.

Duplicate rows can only be determined through 3 columns. The others either auto increment, have uniqueIDs, sources, etc.

Here’s what I currently have going –

DELETE n1
FROM main n1, main n2
WHERE n1.id < n2.id
AND n1.col1 = n2.col1
AND n1.col2 = n2.col2
AND n1.col3 = n2.col3;

Any chance I could speed this up, or is this as good as it gets?

Thank you for any help/insight!

1 Answer

Editorial Team · Answer 1 · 2026-06-13T14:14:56+00:00

Add a unique index to your table on columns col1, col2 and col3, like this:

ALTER TABLE `main` ADD UNIQUE INDEX `col1_col2_col3` (`col1`, `col2`, `col3`);

This prevents duplicate rows from being inserted into the table.

For example, after you insert these values:

INSERT INTO `main` (`col1`, `col2`, `col3`) VALUES (1, 11, 111);

You can’t insert the same values again; MySQL rejects the statement with a duplicate-entry error:

INSERT INTO `main` (`col1`, `col2`, `col3`) VALUES (1, 11, 111);
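Since you load the data in bulk, that error would stop a multi-row INSERT as soon as one row collides with an existing key. Here is a rough sketch of two ways to let the load continue, assuming MySQL and the unique index from above; the sample values are made up for illustration.

```sql
-- Sketch only: assumes MySQL and the unique index `col1_col2_col3` on `main`.
-- The values below are placeholders, not real data.

-- Option 1: skip rows that collide with an existing key.
-- IGNORE turns the duplicate-key error into a warning; note that it also
-- downgrades some other errors to warnings.
INSERT IGNORE INTO `main` (`col1`, `col2`, `col3`)
VALUES (1, 11, 111),
       (2, 22, 222);

-- Option 2: on a collision, run a no-op update so the existing row is kept
-- and other kinds of errors are still reported as errors.
INSERT INTO `main` (`col1`, `col2`, `col3`)
VALUES (1, 11, 111),
       (2, 22, 222)
ON DUPLICATE KEY UPDATE `col1` = `col1`;
```

Both options keep the row that is already in the table, whereas the DELETE in the question keeps the row with the highest id. If the newest row has to win, the UPDATE part of option 2 would need to overwrite the other columns instead.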

With the correct unique indexes in place, you don’t have to clean up duplicate records afterwards.
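One caveat for a table that already contains duplicates: the ALTER TABLE fails with a duplicate-entry error until they are gone, so one last cleanup is needed before adding the index. Below is a possible sketch of that cleanup, assuming id is the auto-increment primary key and that the row with the highest id in each group should survive (the same rule as the DELETE in the question). It groups the table once instead of joining every row against every other row; whether it is actually faster on your data is something to measure, not a given.

```sql
-- Sketch only: assumes MySQL, `id` as the auto-increment primary key,
-- and "keep the highest id per (col1, col2, col3)" as the rule.
DELETE n1
FROM main AS n1
JOIN (
    -- One row per duplicated key, carrying the id that should be kept.
    SELECT col1, col2, col3, MAX(id) AS keep_id
    FROM main
    GROUP BY col1, col2, col3
    HAVING COUNT(*) > 1
) AS dup
  ON  n1.col1 = dup.col1
  AND n1.col2 = dup.col2
  AND n1.col3 = dup.col3
WHERE n1.id < dup.keep_id;
```

Like the original DELETE, this compares with = and so never matches rows where one of the three columns is NULL. A MySQL unique index also allows multiple NULLs, so such rows are not treated as duplicates by either approach.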
