I have to compare entries from a CSV file that might contain more than 100,000 entries, find pairs, and store them in another file.
The comparison has to check values in two or more columns, for instance:
Dogs 5
Cats 7
Mice 5
Dogs 3
Dogs 5
In this example I have to pick up the pair {Dogs, 5} and ignore the rest.
What approach would you suggest?
Thanks as usual
If your schema is really this simple, it could be accomplished in a minimal amount of code using Tuple and HashSet<T>.
The basic strategy in any case is to create a data structure to track what you have seen and use that to determine what to output. A dictionary tracking counts could be used as well. However, as a memory-versus-code trade-off, I’ve chosen to use two sets instead of one dictionary:
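A minimal sketch of the two-set idea, assuming two columns (a name and an integer) separated by whitespace or a comma as in the sample rows. The names PairFinder and FindDuplicatePairs are hypothetical and chosen only for illustration; treat this as one possible shape under those assumptions rather than a definitive implementation:

```csharp
using System;
using System.Collections.Generic;

public static class PairFinder
{
    // Hypothetical helper: yields each (name, value) pair that occurs more than
    // once, exactly one time, at the moment its first repeat is encountered.
    public static IEnumerable<Tuple<string, int>> FindDuplicatePairs(IEnumerable<string> lines)
    {
        var seen = new HashSet<Tuple<string, int>>();     // every pair encountered so far
        var reported = new HashSet<Tuple<string, int>>(); // pairs already emitted

        foreach (var line in lines)
        {
            // Assumption: columns are separated by spaces, tabs, or commas.
            var parts = line.Split(new[] { ' ', '\t', ',' },
                                   StringSplitOptions.RemoveEmptyEntries);
            int value;
            if (parts.Length < 2 || !int.TryParse(parts[1], out value))
                continue; // skip rows that do not match the assumed schema

            // Tuple compares by value, so equal pairs hash to the same set entry.
            var key = Tuple.Create(parts[0], value);

            // HashSet<T>.Add returns false when the item is already present.
            if (!seen.Add(key) && reported.Add(key))
                yield return key;
        }
    }
}
```

For the sample rows in the question, this yields only the pair (Dogs, 5). Passing File.ReadLines(path) as the input should let the file be streamed line by line rather than loaded whole, and comparing more columns would mean adding more items to the tuple.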