The Archive Base

Editorial Team
Asked: June 3, 2026

I have to compare entries from a CSV file that might contain more than 100,000 entries, find the pairs, and store them in another file.
The comparison has to check values in two or more columns, for instance:

    Dogs 5
    Cats 7
    Mice 5
    Dogs 3
    Dogs 5

In this example I have to pick up the pair {Dogs, 5} and ignore the rest.
What approach would you suggest?

Thanks as usual

1 Answer

  1. Editorial Team
    Added an answer on June 3, 2026 at 8:24 am

    If your schema is really this simple, it can be done in a minimal amount of code using Tuple and HashSet<T>.

    The basic strategy in any case is to create a data structure that tracks what you have seen and use it to decide what to output. A dictionary tracking counts would work as well. However, as a memory-versus-code trade-off, I've chosen to use two sets instead of one dictionary:

    using System;
    using System.Collections.Generic;
    using System.IO;

    // 1. Data structure to track items we've seen
    var found = new HashSet<Tuple<string, int>>();

    // 2. Data structure to track items we should output
    var output = new HashSet<Tuple<string, int>>();

    // 3. Loop over the input data, storing it into `found`
    // (`path` is assumed to hold the location of the input file)
    using (var input = File.OpenText(path))
    {
        string line;
        while ((line = input.ReadLine()) != null)
        {
            // 4. Do your CSV parsing
            var parts = line.Split(','); // <- need better CSV parsing
            var item = Tuple.Create(parts[0], Int32.Parse(parts[1]));

            // 5. Track items we've found and those we should output
            // NB: HashSet.Add returns `false` if the item already exists,
            // so we use that as our criterion to mark the item for output
            if (!found.Add(item)) output.Add(item);
        }
    }

    // 6. Output the items (to the console here; use a StreamWriter instead
    // to store them in another file)
    // NB: you could put this in the main loop and borrow the same strategy
    // we used for `found` to determine when to output an item so that only
    // one pass is needed to read and write the data.
    foreach (var item in output)
        Console.WriteLine("{0},{1}", item.Item1, item.Item2);
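
The two variations mentioned above (a dictionary of counts, and writing the output inside the main loop) could be sketched roughly as follows. This is only an illustration under assumptions: the class name `PairFinder`, its method names, and the naive comma split are hypothetical and not part of the original answer, and real CSV input would need a proper parser.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper class; all names here are illustrative only.
public static class PairFinder
{
    // Naive "name,number" parsing; assumes no quoted or embedded commas.
    private static Tuple<string, int> ParseLine(string line)
    {
        var parts = line.Split(',');
        return Tuple.Create(parts[0].Trim(), int.Parse(parts[1].Trim()));
    }

    // Variation 1: a single dictionary that counts occurrences of each item.
    public static List<Tuple<string, int>> FindWithCounts(IEnumerable<string> lines)
    {
        var counts = new Dictionary<Tuple<string, int>, int>();
        foreach (var line in lines)
        {
            if (string.IsNullOrWhiteSpace(line)) continue; // skip blank lines
            var item = ParseLine(line);
            int seen;
            counts.TryGetValue(item, out seen); // seen is 0 when the key is missing
            counts[item] = seen + 1;
        }
        // Keep only the items that occurred more than once.
        return counts.Where(kv => kv.Value > 1).Select(kv => kv.Key).ToList();
    }

    // Variation 2: two sets, reading and writing in a single pass.
    public static void WriteDuplicates(TextReader input, TextWriter output)
    {
        var found = new HashSet<Tuple<string, int>>();
        var written = new HashSet<Tuple<string, int>>();
        string line;
        while ((line = input.ReadLine()) != null)
        {
            if (string.IsNullOrWhiteSpace(line)) continue; // skip blank lines
            var item = ParseLine(line);
            // found.Add fails only for a repeated item; written.Add succeeds
            // only the first time that repeat is seen, so each duplicate is
            // written exactly once.
            if (!found.Add(item) && written.Add(item))
                output.WriteLine("{0},{1}", item.Item1, item.Item2);
        }
    }
}
```

To store the pairs in another file, `WriteDuplicates` could be called with `File.OpenText(...)` as the reader and a `StreamWriter` as the writer, both wrapped in `using` blocks. The dictionary version holds one collection but cannot emit anything until the whole file has been read, whereas the two-set version can write each pair as soon as its second occurrence appears.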
    