i have a table eng-jap which is essentially just a translation so having an

Question


Asked: June 14, 2026 (2026-06-14T07:22:02+00:00)


I have a table eng-jap which is essentially just a translation table, with an English column and a Japanese column. A script I made somehow caused every insert to be cloned, so there are now thousands of duplicate entries in this table. For example:

duplicate example A

eng                        jap
"mother washes every day"  "母は毎日洗濯する"
"mother washes every day"  "母は毎日洗濯する"

If it were just one column I could use the query:

SELECT eng, COUNT(*) c FROM `eng-jap` GROUP BY eng HAVING c > 1

But the table can legitimately have duplicates in eng or in jap, as long as they are not in both. For example:

duplicate example B

eng                        jap
"mother washes every day"  "母は毎日洗濯する"
"every day mother washes"  "母は毎日洗濯する"

This is to allow one sentence to have more than one translation. So I need to alter the query to find duplicates on the combination of both columns.

Once again, to be clear: example B is fine. I want to select all duplicates like example A so I can write a script to remove the extra copy of each duplicate. Please and thank you!


1 Answer

Editorial Team · Answer 1 · 2026-06-14T07:22:04+00:00

I think you just need to group by eng and jap:

SELECT eng, jap, COUNT(*) c FROM `eng-jap` GROUP BY eng, jap HAVING c > 1
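To see why grouping on both columns matters, here is a minimal sketch that loads examples A and B into an in-memory SQLite database. SQLite is only a stand-in for MySQL here (it accepts the same backtick quoting), and the id column is an assumption that the question does not confirm.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed schema: the question only mentions the eng and jap columns.
conn.execute("CREATE TABLE `eng-jap` (id INTEGER PRIMARY KEY, eng TEXT, jap TEXT)")
conn.executemany(
    "INSERT INTO `eng-jap` (eng, jap) VALUES (?, ?)",
    [
        ("mother washes every day", "母は毎日洗濯する"),  # example A
        ("mother washes every day", "母は毎日洗濯する"),  # example A, the clone
        ("every day mother washes", "母は毎日洗濯する"),  # example B, legitimate
    ],
)

# Grouping on one column also counts the legitimate second translation.
by_jap = conn.execute(
    "SELECT jap, COUNT(*) AS c FROM `eng-jap` GROUP BY jap HAVING c > 1"
).fetchall()

# Grouping on the pair only reports rows that match in both columns.
by_pair = conn.execute(
    "SELECT eng, jap, COUNT(*) AS c FROM `eng-jap` GROUP BY eng, jap HAVING c > 1"
).fetchall()

print(by_jap)   # one group with a count of 3
print(by_pair)  # one group with a count of 2
```

The single-column grouping lumps example B in with the clones, while the two-column grouping reports only the example A pair.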

If you want to remove the duplicates and your rows have an id, this query shows the ids that you have to keep:

select
  SUBSTRING_INDEX(GROUP_CONCAT(CAST(id AS CHAR) order by id), ',', 1) as id
from `eng-jap`
group by eng, jap

(It’s a trick that uses GROUP_CONCAT to find the first id of every eng/jap combination.) And this query shows the ids of the rows you have to delete:

select id
from
  `eng-jap`
     left join
  (select
     SUBSTRING_INDEX(GROUP_CONCAT(CAST(id AS CHAR) order by id), ',', 1) as id
     from `eng-jap`
     group by eng, jap) `eng-jap-dup`
  on `eng-jap`.id = `eng-jap-dup`.id
where `eng-jap-dup`.id is null

I rewrote this query using just a join, so it should be a little faster, but if your table is very big it will probably still be slow.
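The keep/delete split can be tried out on a tiny table. The sketch below uses SQLite as a stand-in for MySQL and swaps the GROUP_CONCAT trick for MIN(id), which picks the same "first id per eng/jap pair" when id is numeric. The id column and the alias name keepers are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed schema with a numeric id, as the answer requires.
conn.execute("CREATE TABLE `eng-jap` (id INTEGER PRIMARY KEY, eng TEXT, jap TEXT)")
conn.executemany(
    "INSERT INTO `eng-jap` (eng, jap) VALUES (?, ?)",
    [
        ("mother washes every day", "母は毎日洗濯する"),  # id 1
        ("mother washes every day", "母は毎日洗濯する"),  # id 2, clone of id 1
        ("every day mother washes", "母は毎日洗濯する"),  # id 3, legitimate
    ],
)

# Ids to keep: the lowest id of every eng/jap combination.
keep_ids = sorted(
    row[0]
    for row in conn.execute("SELECT MIN(id) FROM `eng-jap` GROUP BY eng, jap")
)

# Ids to delete: rows with no match in the keep set (left join + IS NULL).
delete_ids = sorted(
    row[0]
    for row in conn.execute(
        """
        SELECT t.id
        FROM `eng-jap` AS t
        LEFT JOIN (SELECT MIN(id) AS id FROM `eng-jap` GROUP BY eng, jap) AS keepers
          ON t.id = keepers.id
        WHERE keepers.id IS NULL
        """
    )
)

# SQLite accepts this form directly. MySQL raises error 1093 when a DELETE
# subquery reads the target table, so there the subquery would need to be
# wrapped in a derived table first.
conn.execute(
    "DELETE FROM `eng-jap` "
    "WHERE id NOT IN (SELECT MIN(id) FROM `eng-jap` GROUP BY eng, jap)"
)
remaining = conn.execute("SELECT id, eng FROM `eng-jap` ORDER BY id").fetchall()
print(keep_ids, delete_ids, remaining)
```

Running the two SELECTs before the DELETE is a cheap way to check which rows would be removed.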

If it is still too slow to be usable, I would suggest adding two more columns to your table:

eng-hash, where you can save MD5(eng)
jap-hash, where you can save MD5(jap)

then update all of your records like this:

update `eng-jap` set `eng-jap`.`eng-hash` = MD5(eng), `eng-jap`.`jap-hash` = MD5(jap)

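Then you can add a unique index on both hash columns with ALTER IGNORE, which skips the duplicate-key errors and lets MySQL eliminate the duplicates for you (note that ALTER IGNORE was removed in MySQL 5.7.4, so this only works on older versions):

alter ignore table `eng-jap` add unique index (`eng-hash`, `jap-hash`);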

(if you get an error while creating index, see this question: MySQL: ALTER IGNORE TABLE gives "Integrity constraint violation")
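Where ALTER IGNORE is not available, the same idea can be approximated by copying the rows into a table that already has the unique index and skipping the conflicts. The sketch below does this in SQLite as a stand-in for MySQL: the MD5 function is registered by hand through hashlib, INSERT OR IGNORE plays the role that INSERT IGNORE would play in MySQL, and the table name eng-jap-clean is hypothetical.

```python
import hashlib
import sqlite3


def md5_hex(text):
    """Hex MD5 digest of a string, standing in for MySQL's MD5()."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


conn = sqlite3.connect(":memory:")
conn.create_function("MD5", 1, md5_hex)  # SQLite has no built-in MD5

columns = "id INTEGER PRIMARY KEY, eng TEXT, jap TEXT, `eng-hash` TEXT, `jap-hash` TEXT"
conn.execute(f"CREATE TABLE `eng-jap` ({columns})")
conn.executemany(
    "INSERT INTO `eng-jap` (eng, jap) VALUES (?, ?)",
    [
        ("mother washes every day", "母は毎日洗濯する"),  # id 1
        ("mother washes every day", "母は毎日洗濯する"),  # id 2, clone of id 1
        ("every day mother washes", "母は毎日洗濯する"),  # id 3, legitimate
    ],
)

# Fill the hash columns, as in the answer's UPDATE statement.
conn.execute("UPDATE `eng-jap` SET `eng-hash` = MD5(eng), `jap-hash` = MD5(jap)")

# Hypothetical clean table: same columns plus the unique pair of hashes.
conn.execute(
    f"CREATE TABLE `eng-jap-clean` ({columns}, UNIQUE (`eng-hash`, `jap-hash`))"
)

# Rows that would violate the unique index are skipped, not reported.
conn.execute(
    "INSERT OR IGNORE INTO `eng-jap-clean` "
    "SELECT id, eng, jap, `eng-hash`, `jap-hash` FROM `eng-jap` ORDER BY id"
)
clean = conn.execute("SELECT id, eng FROM `eng-jap-clean` ORDER BY id").fetchall()
print(clean)
```

Once the clean table looks right, it could be swapped in for the original. The unique index also stops the faulty script from inserting new clones.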
