I have data in the following format:
DATE DATA1 DATA2
-------------------------------------------------
20121010 ABC DEF
20121010 DEF ABC
20121010 HIJ KLM
20121010 KLM HIJ
20121212 ABC DEF
20121212 DEF ABC
20121212 HIJ KLM
20121212 KLM HIJ
What I want to do is select rows 1 and 3. I don’t care about rows 2 and 4 because they are essentially “duplicates” in my eyes.
Seems simple but I’m just trying to put the query together to accomplish this.
You can use the row_number() function for this, assuming you are using SQL Server 2005 or higher. Used inside its over clause, the expression order by date should produce an arbitrary ordering in any database that supports row_number. In SQL Server, you can also use order by (select NULL).
Or, I realize that your question may be about eliminating duplicates, regardless of order. For that, you can do:
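The original queries did not survive in this copy of the answer, so what follows is only a sketch of how the two ideas above might look. The table name t and the bracketed column name [date] are assumptions; only the column names DATE, DATA1 and DATA2 come from the question.

```sql
-- Sketch 1: number the rows within each date and keep rows 1 and 3.
-- Assumption: the table is named t. Because the window is ordered by the
-- partitioning column itself, the numbering within a date is arbitrary and
-- not guaranteed to follow the order in which the rows are displayed.
select [date], data1, data2
from (select t.*,
             row_number() over (partition by [date] order by [date]) as seqnum
      from t
     ) as s
where seqnum in (1, 3);

-- Sketch 2: ignore the order of the two values entirely.
-- The smaller value is always emitted as data1 and the larger as data2,
-- so (ABC, DEF) and (DEF, ABC) collapse into a single row under distinct.
select distinct [date],
       case when data1 < data2 then data1 else data2 end as data1,
       case when data1 < data2 then data2 else data1 end as data2
from t;
```

In SQL Server, order by (select NULL) could replace order by [date] in the first query with the same arbitrary-ordering caveat.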
This might, however, swap the two values when only one row of a pair appears.
The more complicated solution to maintain the original ordering of the columns and eliminate the additional rows combines the two approaches:
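The combined query is also missing from this copy, so here is one possible way to write it, offered as a sketch rather than the original code. It assumes the same hypothetical table t, partitions on the date plus the order-independent pair of values, and keeps one row per group without rewriting the columns. Ordering by data1 is a choice made here so that the kept row is deterministic.

```sql
-- Sketch: keep one original row per (date, unordered pair of values).
-- Assumption: the table is named t with columns [date], data1, data2.
select [date], data1, data2
from (select t.*,
             row_number() over (
                 -- rows with the same date and the same two values,
                 -- in either order, fall into the same partition
                 partition by [date],
                              case when data1 < data2 then data1 else data2 end,
                              case when data1 < data2 then data2 else data1 end
                 -- keep the row whose data1 sorts first within the pair
                 order by data1
             ) as seqnum
      from t
     ) as s
where seqnum = 1;
```

With the sample data in the question, this should keep the ABC DEF and HIJ KLM rows for each date, and a pair that appears only once is returned with its columns unchanged.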