The Archive Base

The Archive Base Latest Questions

Editorial Team
Asked: June 12, 2026 (2026-06-12T11:23:15+00:00)

I have written following algorithm into C# code to list down the files inside

I have implemented the following algorithm in C# to walk a folder recursively and list the files that have duplicates.

  1. Iterate through the files in the directory and its
    subdirectories.
  2. Store each file's name and path in a list.
  3. If the current file matches any file already in the list,
    mark both files as duplicates.
  4. Fetch all files from the list that were marked as duplicates.
  5. Group them by name and return them.

The implementation is very slow on a folder containing 50,000 files and 12,000 subdirectories, since reading from disk is inherently time-consuming. Even PLINQ (AsParallel()) doesn't help much.

Implementation:

class FileTuple
{
    public string FileName { set; get; }
    public string ContainingFolder { set; get; }
    public bool HasDuplicate { set; get; }
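    // Two tuples are considered equal when their file names match.
    public override bool Equals(object obj)
    {
        // Guard against null or a different type instead of dereferencing blindly.
        var other = obj as FileTuple;
        return other != null && this.FileName == other.FileName;
    }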
}
  1. The FileTuple class keeps track of each file's name and containing
    directory; the HasDuplicate flag records its duplicate status.
  2. I have overridden the Equals method to compare only file names when
    searching the collection of FileTuples.

The following method finds the duplicate files and returns them as a list.

    private List<FileTuple> FindDuplicates()
    {
        List<FileTuple> fileTuples = new List<FileTuple>();
        //Read all files from the given path
        List<string> enumeratedFiles = Directory.EnumerateFiles(txtFolderPath.Text, "*.*", SearchOption.AllDirectories).Where(str => str.Contains(".exe") || str.Contains(".zip")).AsParallel().ToList();
        foreach (string filePath in enumeratedFiles)
        {
            var name = Path.GetFileName(filePath);
            var folder = Path.GetDirectoryName(filePath);
            var currentFile = new FileTuple { FileName = name, ContainingFolder = folder, HasDuplicate = false, };

            int foundIndex = fileTuples.IndexOf(currentFile);
            //mark both files as duplicate, if found in list
            //assuming only two duplicate files
            if (foundIndex != -1)
            {
                currentFile.HasDuplicate = true;                    
                fileTuples[foundIndex].HasDuplicate = true;
            }
            //keep track of the files visited so far
            fileTuples.Add(currentFile);
        }

        List<FileTuple> duplicateFiles = fileTuples.Where(fileTuple => fileTuple.HasDuplicate).Select(fileTuple => fileTuple).OrderBy(fileTuple => fileTuple.FileName).AsParallel().ToList();
        return duplicateFiles;
    }

Can you please suggest a way to improve the performance?

Thank you for your help.

1 Answer

  1. Editorial Team
    Added an answer on June 12, 2026 at 11:23 am

    Can you please suggest a way to improve the performance.

    Well, one obvious improvement would be to use a Dictionary<FileTuple, FileTuple> as well as a List<FileTuple>. That way you wouldn’t have an O(N) IndexOf operation on each check, which makes the loop as a whole O(N²). Note that you’ll also need to override GetHashCode() – you should already have a warning about this.
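
    As a rough sketch of that idea (not the asker's actual code): the class below takes the file paths as a parameter so it can be tried without touching the disk. The names DictionaryDuplicateFinder and firstSeen are made up for illustration, and GetHashCode is assumed to hash only the file name so that it agrees with Equals.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

class FileTuple
{
    public string FileName { get; set; }
    public string ContainingFolder { get; set; }
    public bool HasDuplicate { get; set; }

    // Equality is based on the file name only.
    public override bool Equals(object obj)
    {
        var other = obj as FileTuple;
        return other != null && FileName == other.FileName;
    }

    // Must agree with Equals: equal names produce equal hash codes.
    public override int GetHashCode()
    {
        return FileName == null ? 0 : FileName.GetHashCode();
    }
}

// Hypothetical helper class, named here for illustration only.
static class DictionaryDuplicateFinder
{
    public static List<FileTuple> FindDuplicates(IEnumerable<string> filePaths)
    {
        var all = new List<FileTuple>();
        // First tuple seen for each name; lookups are O(1) on average,
        // replacing the O(N) List.IndexOf call.
        var firstSeen = new Dictionary<FileTuple, FileTuple>();

        foreach (string filePath in filePaths)
        {
            var current = new FileTuple
            {
                FileName = Path.GetFileName(filePath),
                ContainingFolder = Path.GetDirectoryName(filePath)
            };

            FileTuple first;
            if (firstSeen.TryGetValue(current, out first))
            {
                // Same name seen before: flag both entries.
                current.HasDuplicate = true;
                first.HasDuplicate = true;
            }
            else
            {
                firstSeen.Add(current, current);
            }
            all.Add(current);
        }

        return all.Where(t => t.HasDuplicate)
                  .OrderBy(t => t.FileName)
                  .ToList();
    }
}
```

    In the original method, the argument would be the sequence returned by Directory.EnumerateFiles.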

    I doubt that it’ll make very much difference though – I’d expect this to be mostly IO-bound.

    Additionally, I doubt that the filtering and ordering at the end is going to be a significant bottleneck, so using the AsParallel in the final step isn’t likely to do much. Of course, you should measure all of this.
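
    One way to do that measuring, as a minimal sketch: time the directory walk and the in-memory grouping separately with Stopwatch. The class name DuplicateTiming and the helper Measure are invented for this example, and a single run like this is only a rough indication, since the OS file cache makes a second run much faster than a cold one.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.Linq;

// Hypothetical helper class, named here for illustration only.
static class DuplicateTiming
{
    // Runs an action once and returns the elapsed wall-clock time.
    public static TimeSpan Measure(Action action)
    {
        var stopwatch = Stopwatch.StartNew();
        action();
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }

    // Times the disk walk and the grouping step separately.
    public static void Report(string root)
    {
        List<string> files = null;
        TimeSpan enumerateTime = Measure(() =>
            files = Directory.EnumerateFiles(root, "*.*", SearchOption.AllDirectories).ToList());

        int duplicatedNames = 0;
        TimeSpan groupTime = Measure(() =>
            duplicatedNames = files.GroupBy(f => Path.GetFileName(f))
                                   .Count(g => g.Count() > 1));

        Console.WriteLine("Enumerated {0} files in {1} ms",
                          files.Count, enumerateTime.TotalMilliseconds);
        Console.WriteLine("Found {0} duplicated names in {1} ms",
                          duplicatedNames, groupTime.TotalMilliseconds);
    }
}
```

    If the first number dominates, the work is IO-bound and changing the in-memory data structure will not help much.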

    Finally, the whole method can be made rather simpler, without even needing the HasDuplicate flag or any overriding of Equals / GetHashCode:

    private List<FileTuple> FindDuplicates()
    {
        return Directory.EnumerateFiles(txtFolderPath.Text, "*.*", 
                                        SearchOption.AllDirectories)
                        .Where(str => str.Contains(".exe") ||
                               str.Contains(".zip"))
                        .Select(str => new FileTuple {
                                   FileName = Path.GetFileName(str),
                                   ContainingFolder = Path.GetDirectoryName(str)
                                })
                        .GroupBy(tuple => tuple.FileName)
                        .Where(g => g.Count() > 1) // Only keep duplicates
                        .OrderBy(g => g.Key)       // Order by filename
                        .SelectMany(g => g)        // Flatten groups
                        .ToList();                     
    }
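
    A side note on the filter used in both versions: Contains(".exe") tests the whole path, so it would also match a name such as app.exe.config or any file under a folder whose name contains ".exe". Assuming the intent is to filter by extension (the question does not say so explicitly), a check along these lines may be closer. ExtensionFilter and HasWantedExtension are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper class, named here for illustration only.
static class ExtensionFilter
{
    // Extensions of interest, compared case-insensitively.
    private static readonly HashSet<string> Wanted =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { ".exe", ".zip" };

    // True when the path's final extension is one of the wanted ones.
    public static bool HasWantedExtension(string path)
    {
        return Wanted.Contains(Path.GetExtension(path));
    }
}
```

    It could replace the lambda passed to Where, for example .Where(ExtensionFilter.HasWantedExtension).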
    
© 2021 The Archive Base. All Rights Reserved