I have the query below to find duplicates in a table that contains more than 1,000,000 rows with 10 columns. When I execute it, it runs for more than an hour without ever completing.
When I try the same query on a similar table with just 100 records, it works fine.
(All columns are of type nchar.)
How can I make this work on more than 1,000,000 rows?
select * from table1 as L
where (select count(*) from table1
where L.date + L.time + L.color + L.supplier = table1.date +
table1.time + table1.color + table1.supplier and table1.variety = 'dark'
and date between '01062012' and '30062012') > 1
DON’T use
L.date + L.time + L.color + L.supplier = table1.date + table1.time + table1.color + table1.supplier
Doing so will MURDER any ability to use indexes in the join. Compare each column separately instead.
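As a rough sketch of what comparing the columns separately could look like, here is the original query with the concatenation replaced by one equality test per column. The alias R is introduced here for illustration only. Because every column is nchar (fixed width), this should match the same rows as the concatenated comparison, but that has not been checked against your data.

```sql
-- Sketch only: same logic as the original query, but each column is
-- compared on its own so an index on these columns can be used.
-- R is a hypothetical alias for the inner copy of table1.
SELECT L.*
FROM table1 AS L
WHERE (SELECT COUNT(*)
       FROM table1 AS R
       WHERE R.date     = L.date
         AND R.time     = L.time
         AND R.color    = L.color
         AND R.supplier = L.supplier
         AND R.variety  = 'dark'
         AND R.date BETWEEN '01062012' AND '30062012') > 1;
```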
Also, ensure that your table has an index covering all the join fields (variety, color, supplier, date).
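A minimal sketch of such an index, assuming SQL Server syntax. The index name IX_table1_dup_check is made up, and time is added to the key on the assumption that it stays part of the duplicate comparison.

```sql
-- Hypothetical index name; column order puts the equality filter on
-- variety first, followed by the columns compared in the duplicate check.
CREATE INDEX IX_table1_dup_check
    ON table1 (variety, color, supplier, date, time);
```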
There are other options for finding duplicates, such as using ROW_NUMBER(), but we would need to know more about your table structure (unique id field, etc.) and what does (and does not) constitute a duplicate.
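For illustration only, a ROW_NUMBER() version might look like the sketch below. It assumes that two rows count as duplicates when date, time, color and supplier all match, and that only 'dark' rows in the given date range matter; both are guesses about your requirements. The names numbered and rn are hypothetical.

```sql
-- Sketch under the assumptions stated above: number the rows inside each
-- group of identical (date, time, color, supplier) values, then keep
-- every row after the first one in its group.
WITH numbered AS (
    SELECT t.*,
           ROW_NUMBER() OVER (
               PARTITION BY t.date, t.time, t.color, t.supplier
               ORDER BY t.date
           ) AS rn
    FROM table1 AS t
    WHERE t.variety = 'dark'
      AND t.date BETWEEN '01062012' AND '30062012'
)
SELECT *
FROM numbered
WHERE rn > 1;
```

Unlike the original query, this returns only the extra copies (rn > 1), not the first row of each duplicate group, and it reads the table once instead of running a subquery per row.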