I have to compare entries from a csv file that might contain more than

Question

Asked: June 3, 2026 (2026-06-03T08:24:02+00:00)

I have to compare entries from a CSV file that might contain more than 100,000 entries, find the pairs, and store them in another file.
The comparison has to check values in two or more columns, for instance:

Dogs 5

Cats 7

Mice 5

Dogs 3

Dogs 5

In this example I have to pick up the pair {Dogs, 5} and ignore the rest.
What approach would you suggest?

Thanks as usual

1 Answer

Editorial Team · Answer 1 · 2026-06-03T08:24:04+00:00

If your schema is really this simple, it can be handled in a minimal amount of code using Tuple and HashSet<T>.

The basic strategy in any case is to create a data structure that tracks what you have already seen and use it to decide what to output. A dictionary tracking counts could be used as well. However, as a trade-off between memory and code, I’ve chosen to use two sets instead of one dictionary:

using System;
using System.Collections.Generic;
using System.IO;

// 1. Data structure to track items we've seen
var found = new HashSet<Tuple<string, int>>();

// 2. Data structure to track items we should output
var output = new HashSet<Tuple<string, int>>();

// 3. Loop over the input data, storing it into `found`
// (`path` is assumed to hold the location of the input CSV file)
using (var input = File.OpenText(path))
{
    string line;
    while (null != (line = input.ReadLine()))
    {
        // 4. Do your CSV parsing
        var parts = line.Split(','); // <- need better CSV parsing
        var item = Tuple.Create(parts[0], Int32.Parse(parts[1]));

        // 5. Track items we've found and those we should output
        // NB: HashSet.Add returns `false` if it already exists,
        // so we use that as our criteria to mark the item for output
        if (!found.Add(item)) output.Add(item);
    }
}

// 6. Output the items
// NB: you could put this in the main loop and borrow the same strategy
// we used for `found` to determine when to output an item so that only
// one pass is needed to read and write the data.
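
The single-pass idea from step 6 could be sketched roughly as follows. This is only an illustration under stated assumptions: the `PairFinder` class and the `WriteRepeatedPairs` method are hypothetical names, the input is assumed to have exactly two comma-separated columns with no quoted fields, and the second set is renamed `written` here because it now tracks pairs that have already been emitted.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class PairFinder
{
    // Hypothetical helper: reads "name,number" lines from `input` and writes
    // each pair that occurs more than once to `output`, exactly one time.
    // Returns the number of distinct repeated pairs written.
    public static int WriteRepeatedPairs(TextReader input, TextWriter output)
    {
        var found = new HashSet<Tuple<string, int>>();   // pairs seen so far
        var written = new HashSet<Tuple<string, int>>(); // pairs already emitted
        int count = 0;

        string line;
        while ((line = input.ReadLine()) != null)
        {
            if (string.IsNullOrWhiteSpace(line)) continue; // skip blank lines

            // Naive parsing: assumes two columns and no quoted commas.
            var parts = line.Split(',');
            var item = Tuple.Create(parts[0].Trim(), int.Parse(parts[1].Trim()));

            // `found.Add` returns false on a repeat; `written.Add` returns true
            // only the first time the pair is marked, so it is written once.
            if (!found.Add(item) && written.Add(item))
            {
                output.WriteLine(item.Item1 + "," + item.Item2);
                count++;
            }
        }
        return count;
    }
}
```

To use it with files, one option is to pass the reader from `File.OpenText` and the writer from `File.CreateText`, each wrapped in a `using` statement. Taking `TextReader` and `TextWriter` rather than paths also lets the method be exercised with `StringReader` and `StringWriter`.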
