The Archive Base

Editorial Team
Asked: May 20, 2026

This turned out to be more difficult than I thought. Basically, each day a system dumps a snapshot of a customer master list into a CSV file. It contains about 120,000 records and 60 fields, roughly 25 MB. I’d like to report on values that change between one snapshot and another. It isn’t a plain file diff, as records must be matched on the leftmost column, which contains the customer’s unique number. Lines could be inserted, removed, and so on. All fields are strings, including the reference number.

I’ve written a solution with LINQ, but it dies with larger datasets. For 10,000 records it takes 17 seconds. For 120,000 it takes nearly 2 hours to compare the two files. Right now it uses the excellent and free ‘filehelpers’ http://www.filehelpers.com/ to load the data, which takes only a few seconds. Detecting which records have changed is more problematic. The query below is the one that takes 2 hours:

    var changednames = from f in fffiltered
                       from s in sffiltered
                       where f.CustomerRef == s.CustomerRef &&
                       f.Customer_Name != s.Customer_Name
                       select new { f, s };

What approach would you recommend? I’d like to immediately ‘prune’ the list to those with a change of some sort, then apply my more specific comparisons to that small subset. Some of my thoughts were:

a) Use dictionaries or HashSets, though early tests don’t really show improvements.

b) Compartmentalise the operations: use the first character in the customer reference field and match only against records with the same one. This probably involves creating many separate collections, though, and seems pretty inelegant.

c) Move away from a typed data arrangement and do it with arrays. Again, benefit uncertain.

Any thoughts?

Thanks!

1 Answer

  1. Editorial Team
    Added an answer on May 20, 2026 at 12:36 pm

    For the purposes of the discussion below, I’ll assume that you have some way of reading the CSV files into a class. I’ll call that class MyRecord.

    Load the files into separate lists, call them NewList and OldList:

    List<MyRecord> NewList = LoadFile("newFilename");
    List<MyRecord> OldList = LoadFile("oldFilename");
    

    There’s perhaps a more elegant way to do this with LINQ, but the idea is to do a straight merge. First you have to sort the two lists. Either your MyRecord class implements IComparable, or you supply your own comparison delegate:

    NewList.Sort(/* delegate here */);
    OldList.Sort(/* delegate here */);
    

    You can skip the delegate if MyRecord implements IComparable.
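
    As a minimal sketch of what that could look like, the class below orders records by customer reference only. The property names CustomerRef and Customer_Name are taken from the query in the question; the ordinal comparison and the SortDemo helper are assumptions made for illustration, not something the original code is known to use.

    ```csharp
    using System;
    using System.Collections.Generic;

    // Hypothetical record type; only two of the roughly 60 fields are shown.
    public class MyRecord : IComparable<MyRecord>
    {
        public string CustomerRef { get; set; }
        public string Customer_Name { get; set; }

        // Order by customer reference only. Ordinal comparison ignores
        // culture rules, so the order is the same on every machine.
        public int CompareTo(MyRecord other)
        {
            if (other is null) return 1;
            return string.CompareOrdinal(CustomerRef, other.CustomerRef);
        }
    }

    public static class SortDemo
    {
        // Returns a sorted copy so the caller's list is left untouched.
        public static List<MyRecord> SortedByRef(List<MyRecord> records)
        {
            var copy = new List<MyRecord>(records);
            copy.Sort(); // uses MyRecord.CompareTo
            return copy;
        }
    }
    ```

    Whatever ordering is chosen, the sort and the later merge have to use the same one; otherwise the merge will report matching records as added or deleted.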

    Now it’s a straight merge.

    int ixNew = 0;
    int ixOld = 0;
    while (ixNew < NewList.Count && ixOld < OldList.Count)
    {
        // Again with the comparison delegate.
        // I'll assume that MyRecord implements IComparable
        int cmpRslt = OldList[ixOld].CompareTo(NewList[ixNew]);
        if (cmpRslt == 0)
        {
            // records have the same customer id.
            // compare for changes.
            ++ixNew;
            ++ixOld;
        }
        else if (cmpRslt < 0)
        {
            // this old record is not in the new file.  It's been deleted.
            ++ixOld;
        }
        else
        {
            // this new record is not in the old file.  It was added.
            ++ixNew;
        }
    }
    
    // At this point, one of the lists might still have items.
    while (ixNew < NewList.Count)
    {
        // NewList[ixNew] is an added record
        ++ixNew;
    }
    
    while (ixOld < OldList.Count)
    {
        // OldList[ixOld] is a deleted record
        ++ixOld;
    }
    

    With just 120,000 records, that should execute very quickly. I would be very surprised if doing the merge took as long as loading the data from disk.
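
    For the delegate variant mentioned in the comments above, the same merge could be wrapped in a reusable method along these lines. This is only a sketch: the SnapshotDiff class, the Merge method, and its callback parameters are hypothetical names introduced here, and both lists are assumed to be sorted already with the same key comparison.

    ```csharp
    using System;
    using System.Collections.Generic;

    public static class SnapshotDiff
    {
        // Walks two lists that are already sorted by the same key comparison
        // and reports matched, removed, and added items through callbacks.
        public static void Merge<T>(
            IList<T> oldSorted,
            IList<T> newSorted,
            Comparison<T> compareKeys,
            Action<T, T> onMatch,
            Action<T> onRemoved,
            Action<T> onAdded)
        {
            int ixOld = 0, ixNew = 0;
            while (ixOld < oldSorted.Count && ixNew < newSorted.Count)
            {
                int cmp = compareKeys(oldSorted[ixOld], newSorted[ixNew]);
                if (cmp == 0)
                    onMatch(oldSorted[ixOld++], newSorted[ixNew++]); // same key in both
                else if (cmp < 0)
                    onRemoved(oldSorted[ixOld++]);                   // only in old list
                else
                    onAdded(newSorted[ixNew++]);                     // only in new list
            }

            // Whatever is left in either list has no counterpart in the other.
            while (ixOld < oldSorted.Count) onRemoved(oldSorted[ixOld++]);
            while (ixNew < newSorted.Count) onAdded(newSorted[ixNew++]);
        }
    }
    ```

    Each list is walked once, so the merge itself is linear in the number of records; the sorting step dominates the cost.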

    EDIT: A LINQ solution

    I pondered how one would do this with LINQ. I can’t do exactly the same thing as the merge above, but I can get the added, removed, and changed items in separate collections.
    For this to work, MyRecord will have to implement IEquatable<MyRecord> and also override GetHashCode.

    var AddedItems = NewList.Except(OldList);
    var RemovedItems = OldList.Except(NewList);
    
    var OldListLookup = OldList.ToLookup(t => t.Id);
    var ItemsInBothLists =
        from newThing in NewList
        let oldThing = OldListLookup[newThing.Id].FirstOrDefault()
        where oldThing != null
        select new { oldThing = oldThing, newThing = newThing };
    

    In the above, I assume that MyRecord has an Id property that is unique.
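
    One way the equality members might be written is sketched below, assuming that two records count as equal when their Id values match, so that Except reports records added or removed by key. The Name property and the ExceptDemo helper are hypothetical and exist only to make the example self-contained.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Linq;

    // Hypothetical record type with equality based on Id alone.
    public class MyRecord : IEquatable<MyRecord>
    {
        public string Id { get; set; }
        public string Name { get; set; }

        public bool Equals(MyRecord other) =>
            other != null && string.Equals(Id, other.Id, StringComparison.Ordinal);

        public override bool Equals(object obj) => Equals(obj as MyRecord);

        // Must agree with Equals: records with equal Ids get equal hash codes.
        public override int GetHashCode() => Id == null ? 0 : Id.GetHashCode();
    }

    public static class ExceptDemo
    {
        // Ids present in newList but not in oldList.
        public static List<string> AddedIds(List<MyRecord> oldList, List<MyRecord> newList) =>
            newList.Except(oldList).Select(r => r.Id).ToList();

        // Ids present in oldList but not in newList.
        public static List<string> RemovedIds(List<MyRecord> oldList, List<MyRecord> newList) =>
            oldList.Except(newList).Select(r => r.Id).ToList();
    }
    ```

    If Equals compared every field instead, Except would also return records whose contents changed, which blurs the line between added, removed, and changed items.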

    If you want just the changed items instead of all the items that are in both lists:

    var ChangedItems =
        from newThing in NewList
        let oldThing = OldListLookup[newThing.Id].FirstOrDefault()
        where oldThing != null && CompareItems(oldThing, newThing) != 0
        select new { oldThing = oldThing, newThing = newThing };
    

    The assumption is that the CompareItems method will do a deep comparison of the two items and return 0 if they compare equal or non-zero if something has changed.
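
    A rough sketch of such a method follows. Only two illustrative fields are compared; the Name and City properties and the RecordComparer class are hypothetical, and a real version would check every field that matters for the report.

    ```csharp
    using System;

    // Hypothetical record type with a key and two data fields.
    public class MyRecord
    {
        public string Id { get; set; }
        public string Name { get; set; }
        public string City { get; set; }
    }

    public static class RecordComparer
    {
        // Returns 0 when every compared field matches, non-zero otherwise.
        // The key (Id) is not compared because the caller has already matched on it.
        public static int CompareItems(MyRecord oldItem, MyRecord newItem)
        {
            int result = string.CompareOrdinal(oldItem.Name, newItem.Name);
            if (result != 0) return result;

            return string.CompareOrdinal(oldItem.City, newItem.City);
        }
    }
    ```

    Since all fields in the snapshot are strings, an ordinal comparison per field is probably enough; stopping at the first difference keeps the check cheap for records that did change.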
