I have written following algorithm into C# code to list down the files inside

Question

Editorial Team

Asked: June 12, 2026 (2026-06-12T11:23:15+00:00)

I have implemented the following algorithm in C# to list the files inside a folder recursively and find those that share a name.

Begin iterating through the list of files in the directory and its subdirectories.
Store each file's name and path in a list.
If the current file matches any other file already in the list, mark both files as duplicates.
Fetch all files from the list that were marked as duplicates.
Group them by name and return them.

The implementation is very slow on a folder containing 50,000 files and 12,000 subdirectories, since reading from disk is inherently time-consuming. Even PLINQ's AsParallel() doesn't help much.

Implementation:

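class FileTuple
{
    public string FileName { set; get; }
    public string ContainingFolder { set; get; }
    public bool HasDuplicate { set; get; }

    // Two tuples are considered equal when their file names match;
    // the containing folder is deliberately ignored.
    public override bool Equals(object obj)
    {
        // "as" yields null for null or non-FileTuple arguments, so check
        // for that instead of dereferencing the result directly.
        var other = obj as FileTuple;
        return other != null && this.FileName == other.FileName;
    }
}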

The FileTuple class keeps track of each file's name and containing directory; the HasDuplicate flag records its duplicate status.
I have overridden the Equals method so that tuples in the fileTuples collection are compared by file name only.

The following method finds the duplicate files and returns them as a list.

    private List<FileTuple> FindDuplicates()
    {
        List<FileTuple> fileTuples = new List<FileTuple>();
        //Read all files from the given path
        List<string> enumeratedFiles = Directory.EnumerateFiles(txtFolderPath.Text, "*.*", SearchOption.AllDirectories).Where(str => str.Contains(".exe") || str.Contains(".zip")).AsParallel().ToList();
        foreach (string filePath in enumeratedFiles)
        {
            var name = Path.GetFileName(filePath);
            var folder = Path.GetDirectoryName(filePath);
            var currentFile = new FileTuple { FileName = name, ContainingFolder = folder, HasDuplicate = false, };

            int foundIndex = fileTuples.IndexOf(currentFile);
            // mark both files as duplicates if a match is already in the list
            // (assuming only two duplicate files)
            if (foundIndex != -1)
            {
                currentFile.HasDuplicate = true;                    
                fileTuples[foundIndex].HasDuplicate = true;
            }
            // keep track of the files visited so far
            fileTuples.Add(currentFile);
        }

        List<FileTuple> duplicateFiles = fileTuples.Where(fileTuple => fileTuple.HasDuplicate).Select(fileTuple => fileTuple).OrderBy(fileTuple => fileTuple.FileName).AsParallel().ToList();
        return duplicateFiles;
    }

Can you please suggest a way to improve the performance?

Thank you for your help.

1 Answer

Editorial Team · Answer 1 · 2026-06-12T11:23:16+00:00

Can you please suggest a way to improve the performance.

Well, one obvious improvement would be to use a Dictionary<FileTuple, FileTuple> alongside the List<FileTuple>. That way you wouldn't pay for an O(N) IndexOf operation on each check. Note that you'll also need to override GetHashCode(); you should already have a compiler warning about this.
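
As a rough illustration of the dictionary idea, here is a minimal sketch. The DuplicateFinder class and MarkDuplicates method are hypothetical names introduced for this example, and the method takes already-enumerated paths (an assumption, so the lookup logic can be tried without touching the disk). GetHashCode is overridden to agree with the name-only Equals.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

class FileTuple
{
    public string FileName { get; set; }
    public string ContainingFolder { get; set; }
    public bool HasDuplicate { get; set; }

    // Equality is by file name only, as in the question.
    public override bool Equals(object obj)
    {
        var other = obj as FileTuple;
        return other != null && FileName == other.FileName;
    }

    // Must agree with Equals: equal names give equal hash codes.
    public override int GetHashCode()
    {
        return FileName == null ? 0 : FileName.GetHashCode();
    }
}

// Hypothetical helper class, not part of the original code.
static class DuplicateFinder
{
    public static List<FileTuple> MarkDuplicates(IEnumerable<string> filePaths)
    {
        var all = new List<FileTuple>();
        // Maps each distinct name (via Equals/GetHashCode) to the first tuple seen.
        var firstSeen = new Dictionary<FileTuple, FileTuple>();

        foreach (string filePath in filePaths)
        {
            var current = new FileTuple
            {
                FileName = Path.GetFileName(filePath),
                ContainingFolder = Path.GetDirectoryName(filePath)
            };

            FileTuple earlier;
            if (firstSeen.TryGetValue(current, out earlier))
            {
                // Hash lookup replaces the O(N) IndexOf scan.
                current.HasDuplicate = true;
                earlier.HasDuplicate = true;
            }
            else
            {
                firstSeen.Add(current, current);
            }
            all.Add(current);
        }

        return all.Where(t => t.HasDuplicate)
                  .OrderBy(t => t.FileName)
                  .ToList();
    }
}
```

In the original method this would be called with the result of Directory.EnumerateFiles; whether it helps noticeably depends on how much of the time is spent on disk access.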

I doubt that it’ll make very much difference though – I’d expect this to be mostly IO-bound.

Additionally, I doubt that the filtering and ordering at the end is going to be a significant bottleneck, so using the AsParallel in the final step isn’t likely to do much. Of course, you should measure all of this.
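
One way to measure this is to time the directory walk and the in-memory grouping separately with System.Diagnostics.Stopwatch. The sketch below is only an illustration: DuplicateTiming and Measure are hypothetical names, and it groups raw paths by file name rather than using FileTuple.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.Linq;

// Hypothetical helper class, not part of the original code.
static class DuplicateTiming
{
    // Times the disk enumeration and the grouping step separately,
    // prints both, and returns the paths whose file name occurs more than once.
    public static List<string> Measure(string rootFolder)
    {
        var stopwatch = Stopwatch.StartNew();
        List<string> paths = Directory
            .EnumerateFiles(rootFolder, "*.*", SearchOption.AllDirectories)
            .ToList();                       // ToList forces the full directory walk
        TimeSpan enumeration = stopwatch.Elapsed;

        stopwatch.Restart();
        List<string> duplicates = paths
            .GroupBy(p => Path.GetFileName(p))
            .Where(g => g.Count() > 1)
            .SelectMany(g => g)
            .ToList();
        TimeSpan grouping = stopwatch.Elapsed;

        Console.WriteLine("Enumerated {0} files in {1} ms",
                          paths.Count, enumeration.TotalMilliseconds);
        Console.WriteLine("Found {0} duplicate paths in {1} ms",
                          duplicates.Count, grouping.TotalMilliseconds);
        return duplicates;
    }
}
```

If the first number dominates, the work is IO-bound and changes to the in-memory data structures are unlikely to matter much. Note that a second run may be faster simply because the operating system has cached the directory entries.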

Finally, the whole method can be made rather simpler, without even needing the HasDuplicate flag or any overriding of Equals / GetHashCode:

private List<FileTuple> FindDuplicates()
{
    return Directory.EnumerateFiles(txtFolderPath.Text, "*.*", 
                                    SearchOption.AllDirectories)
                    .Where(str => str.Contains(".exe") || 
                           str.Contains(".zip"))
                    .Select(str => new FileTuple { 
                               FileName = Path.GetFileName(str),
                               ContainingFolder = Path.GetDirectoryName(str)
                            })
                    .GroupBy(tuple => tuple.FileName)
                    .Where(g => g.Count() > 1) // Only keep duplicates
                    .OrderBy(g => g.Key)       // Order by filename
                    .SelectMany(g => g)        // Flatten groups
                    .ToList();                     
}
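
A side note on the filter: str.Contains(".exe") matches anywhere in the full path, so it also accepts names such as setup.exe.config or files inside a folder called tools.exe. Assuming the intent is to match only files whose extension is .exe or .zip (the question does not say so explicitly), a predicate along these lines could replace it. ExtensionFilter and IsExeOrZip are hypothetical names.

```csharp
using System;
using System.IO;

// Hypothetical helper class, not part of the original code.
static class ExtensionFilter
{
    // True only when the path's final extension is .exe or .zip (case-insensitive).
    // Expects a non-null path.
    public static bool IsExeOrZip(string path)
    {
        string extension = Path.GetExtension(path);
        return extension.Equals(".exe", StringComparison.OrdinalIgnoreCase)
            || extension.Equals(".zip", StringComparison.OrdinalIgnoreCase);
    }
}
```

It would be used as .Where(ExtensionFilter.IsExeOrZip) in place of the Contains-based lambda.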