I have a partitioned table in SQL SERVER, there is a clustered index on ID, and the table is partitioned by period_key. The ID is unique inside partition, but not unique cross partitions. What I need is to find all the unique ID. The simplest way is just use
select unique ID from tab
But that need to sort the DB in database which need quit a lot of temp disk, so lots of disk IO is required. Since the system is already IO bounded, I am thinking about cut the disk IO. Since we can read each partition in order by using the cluster index, suppose we have 10 partition, we can read one row from each partition, then compare them, and output the record with the min ID, say from partition X, and then read the next row from the partition X. And again compare these 10 rows, output the record with the min ID, etc. Just like external sort.
I don’t have experience in C#, but know java. Could anyone give me some idea how to implement it in c#?
OK, if the requirement is to bypass the sort on the DB server side, and rather work out if an ID is unique or not on the client side, you can do something like this – select all ID values (no distinct in the query):
Then loop through all the values, adding them to a
List. When that’s done, you can ask the list to give back a version of itself with the duplicates removed. Here’s a simplistic example:Disclaimer – I wrote that off the top of my head, so it might contain errors. This post contains a faster implementation based on
HashSet.