Good Morning stackoverflownians,
I have a very large table with duplicates across two columns. That is, if row b has the same values in col1 and col2 as row a, I should keep only row a:
## table_1
col1 col2
1 10
1 10
1 10
1 11
1 11
1 12
2 20
2 20
2 21
2 21
# should return this tbl without duplication
col1 col2
1 10
1 11
1 12
2 20
2 21
My previous code only accounts for col1, and I don’t know how to do this on two columns:
CREATE TABLE temp LIKE db.table_1;
INSERT INTO temp SELECT * FROM table_1 WHERE 1 GROUP BY col1;
DROP TABLE table_1;
ALTER TABLE temp RENAME table_1;
So I thought about this:
CREATE TABLE temp LIKE db.table_1;
INSERT INTO temp(col1,col2)
SELECT DISTINCT col1,col2 FROM table_1;
then drop and rename..
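For reference, the whole plan, including the drop-and-rename step, could be sketched roughly as follows. This is only a sketch under assumptions: it assumes the table holds just col1 and col2 (any other columns would get their default values), and the names table_1_dedup and table_1_old are hypothetical. It uses a single RENAME TABLE statement to swap both names at once, so the original table stays around as a backup until the result has been checked.

```sql
-- Sketch only: table_1_dedup and table_1_old are hypothetical names.
-- Create an empty copy of the table structure.
CREATE TABLE table_1_dedup LIKE table_1;

-- Copy each distinct (col1, col2) pair exactly once.
INSERT INTO table_1_dedup (col1, col2)
SELECT DISTINCT col1, col2
FROM table_1;

-- Swap both names in one statement, keeping the original as a backup.
RENAME TABLE table_1 TO table_1_old,
             table_1_dedup TO table_1;

-- Drop the backup only after checking the new table.
-- DROP TABLE table_1_old;
```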
But I’m not sure it’s going to work, and MySQL tends to be unstable. If it takes too long I will have to stop the query, and that may crash the server again.. T.T
We have 200,000,000 rows and all of them have at least one duplicate.
Any suggestions for the code? 🙂
Also, how long would it take? Minutes or hours?
You are already most of the way there 🙂
You can also try this:
Use INSERT IGNORE rather than INSERT. If a record doesn’t duplicate an existing record, MySQL inserts it as usual. If the record is a duplicate, the IGNORE keyword tells MySQL to discard it silently without generating an error. Note that a record only counts as a duplicate if the new table has a PRIMARY KEY or UNIQUE index covering (col1, col2).
Read from the existing table and write to a new table using INSERT IGNORE. This way you can control the insert process depending on your resource usage.
When you use INSERT IGNORE and you do have key violations, MySQL does NOT raise an error; each ignored row is reported as a warning instead.
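A minimal sketch of that approach, under a few assumptions: the target table name table_1_dedup and the key name uniq_col1_col2 are made up for illustration, and the col1 range used to slice the copy is an arbitrary example that assumes col1 is numeric. You would repeat the INSERT IGNORE for successive ranges, sized to whatever the server handles comfortably.

```sql
-- Hypothetical target table; the unique key is what makes IGNORE drop duplicates.
CREATE TABLE table_1_dedup LIKE table_1;
ALTER TABLE table_1_dedup ADD UNIQUE KEY uniq_col1_col2 (col1, col2);

-- Copy one slice at a time; the col1 range here is an arbitrary example.
INSERT IGNORE INTO table_1_dedup (col1, col2)
SELECT col1, col2
FROM table_1
WHERE col1 >= 1 AND col1 < 1000;

-- Ignored duplicates are reported as warnings, not errors.
SHOW WARNINGS LIMIT 5;
```

Copying in slices keeps each statement short, so a slow batch can be stopped without losing the rows that were already copied.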