Simple situation: I have a list of lists, almost table-like, and I am trying to find out whether any of the lists are duplicated.
Example:
List<List<int>> list = new List<List<int>>()
{
    new List<int>() { 0, 1, 2, 3, 4, 5, 6 },
    new List<int>() { 0, 1, 2, 3, 4, 5, 6 },
    new List<int>() { 0, 1, 4, 2, 4, 5, 6 },
    new List<int>() { 0, 3, 2, 5, 1, 6, 4 }
};
I would like to know that there are 4 total items, 2 of which are duplicates. I was thinking about doing something like a SQL checksum but I didn’t know if there was a better/easier way.
I care about performance, and I care about ordering.
Additional Information That May Help
- Things inserted into this list will never be removed
- Not bound to any specific collection.
- Don't care about the function signature
- The type is not restricted to int
Let’s try to get the best performance. If n is the number of lists and m is the length of each list, then we can get O(nm + n log n + n), plus some extra work in the (hopefully rare) case where different lists produce equal hash codes.
Major steps:
* This is an important step. For simplicity, you can calculate the hash as … ^ (list[i] << i) ^ (list[i + 1] << (i + 1))
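To make the idea concrete, here is a minimal sketch of how the hash step could fit into a full duplicate check. It is my reading of the O(nm + n log n + n) bound (hash every list, sort by hash, scan neighbours), not necessarily the exact steps intended above. The names `DuplicateLists`, `SequenceHash` and `FindDuplicateGroups` are made up for illustration, and lists with equal hashes are re-checked with `SequenceEqual` so collisions cannot produce false positives.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DuplicateLists
{
    // Order-sensitive hash: XOR of each element's hash code shifted by its index.
    // C# only uses the low 5 bits of an int shift count, so i >= 32 simply wraps.
    public static int SequenceHash<T>(List<T> list)
    {
        int hash = 0;
        for (int i = 0; i < list.Count; i++)
        {
            T item = list[i];
            int h = item == null ? 0 : item.GetHashCode();
            hash ^= h << i;
        }
        return hash;
    }

    // Returns groups of indices whose lists are element-wise equal (order matters).
    // Only groups with more than one member are returned.
    public static List<List<int>> FindDuplicateGroups<T>(List<List<T>> lists)
    {
        int n = lists.Count;
        var hashes = new int[n];
        var order = new int[n];
        for (int i = 0; i < n; i++)
        {
            hashes[i] = SequenceHash(lists[i]); // O(m) per list, O(nm) in total
            order[i] = i;
        }

        // O(n log n): sorts the hashes and carries the original indices along.
        Array.Sort(hashes, order);

        var result = new List<List<int>>();
        int start = 0;
        while (start < n)
        {
            // Find the run [start, end) of equal hash values.
            int end = start + 1;
            while (end < n && hashes[end] == hashes[start]) end++;

            // Inside a run, confirm real equality to rule out hash collisions.
            var buckets = new List<List<int>>();
            for (int k = start; k < end; k++)
            {
                int idx = order[k];
                var bucket = buckets.Find(b => lists[b[0]].SequenceEqual(lists[idx]));
                if (bucket == null)
                {
                    bucket = new List<int>();
                    buckets.Add(bucket);
                }
                bucket.Add(idx);
            }
            result.AddRange(buckets.Where(b => b.Count > 1));
            start = end;
        }
        return result;
    }
}
```

Called on the example list from the question, `DuplicateLists.FindDuplicateGroups(list)` should report a single group containing indices 0 and 1. Note that `Array.Sort` is not stable, so the order of indices inside a group is not guaranteed.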
Edit for those who think that PLINQ, rather than a good algorithm, is what speeds things up: PLINQ can also be added here, because all the steps are easily parallelizable.
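As a rough illustration of the parallel variant, the sketch below hashes the lists with PLINQ and then groups by hash instead of sorting. This swaps the sort step for a `GroupBy`, which is my own simplification; the class and method names are hypothetical, and whether it is actually faster depends on n, m and the number of cores, so measure before relying on it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ParallelDuplicateLists
{
    // Same order-sensitive hash as described above: XOR of (element hash << index).
    static int SequenceHash<T>(List<T> list)
    {
        int hash = 0;
        for (int i = 0; i < list.Count; i++)
        {
            T item = list[i];
            hash ^= (item == null ? 0 : item.GetHashCode()) << i;
        }
        return hash;
    }

    // Splits indices that share a hash into groups of truly equal lists.
    static IEnumerable<List<int>> SplitByEquality<T>(List<List<T>> lists, List<int> indices)
    {
        indices.Sort(); // deterministic order inside each group
        var buckets = new List<List<int>>();
        foreach (int idx in indices)
        {
            var bucket = buckets.Find(b => lists[b[0]].SequenceEqual(lists[idx]));
            if (bucket == null)
            {
                bucket = new List<int>();
                buckets.Add(bucket);
            }
            bucket.Add(idx);
        }
        return buckets;
    }

    // The input lists are only read, never modified, so sharing them across threads is safe.
    public static List<List<int>> FindDuplicateGroups<T>(List<List<T>> lists)
    {
        return ParallelEnumerable.Range(0, lists.Count)
            .Select(i => (Index: i, Hash: SequenceHash(lists[i])))  // hashing runs in parallel
            .GroupBy(x => x.Hash)                                   // candidates share a hash
            .SelectMany(g => SplitByEquality(lists, g.Select(x => x.Index).ToList()))
            .Where(group => group.Count > 1)
            .ToList();
    }
}
```

The order of the returned groups is not guaranteed, because the query is not marked `AsOrdered()`.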
My code: