The Archive Base

Asked by Editorial Team on June 8, 2026

I want to find duplicate files on the file system in C++. Is there an algorithm that does this as fast as possible? And do I need to write a multi-threaded application, or can I just use one thread?

1 Answer

    Answered by Editorial Team on June 8, 2026 at 6:48 pm

    I agree with Kerrek SB that there are better tools for this than C++. However, assuming you really need to do this in C++, here are some suggestions and things to consider in your implementation:

    1. Use boost::filesystem (or, since C++17, std::filesystem, which is based on it) for portable filesystem traversal.

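    2. The "hash every file" suggestion is very reasonable, but it may be more efficient to first build a multimap keyed on file size, and then hash only the files whose size matches at least one other file. Files of different sizes cannot be duplicates, so a file with a unique size never needs to be read at all.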

    3. Decide how you want to treat empty files and symbolic links/shortcuts.

    4. Decide how you want to treat special files; e.g., on Unix you have directories, FIFOs, sockets, etc.

    5. Account for the fact that files or the directory structure may change, disappear, or move while your algorithm is running.

    6. Account for the fact that some files or directories may be inaccessible or broken (e.g., recursive directory links).

    7. Make the number of threads configurable, as the amount of parallelization that makes sense depends on the underlying disk hardware and configuration. It will be different on a simple hard drive than on an expensive SAN. Don’t make assumptions, though; test it out. For instance, Linux is very good about caching files, so many of your reads will come from memory and thus not block on I/O.
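A minimal single-threaded sketch of suggestions 1 to 6 is given here. It assumes a C++17 compiler and uses std::filesystem in place of boost::filesystem. The function names `find_duplicates`, `fnv1a_file` and `files_equal` are hypothetical, and 64-bit FNV-1a is only an illustrative choice of hash, not something the answer prescribes. Files are first bucketed by size in a `std::multimap`, and only buckets with two or more entries are hashed. Symlinks, special files and empty files are skipped, directory symlinks are not followed (the default for `recursive_directory_iterator`), and files that vanish mid-scan are dropped rather than treated as errors.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <iterator>
#include <map>
#include <optional>
#include <system_error>
#include <utility>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper: 64-bit FNV-1a over a file's bytes (illustrative, non-cryptographic).
// Returns std::nullopt if the file vanished or could not be read.
std::optional<std::uint64_t> fnv1a_file(const fs::path& p) {
    std::ifstream in(p, std::ios::binary);
    if (!in) return std::nullopt;
    std::uint64_t h = 14695981039346656037ULL;  // FNV-1a 64-bit offset basis
    std::vector<char> buf(1 << 16);
    while (in) {
        in.read(buf.data(), static_cast<std::streamsize>(buf.size()));
        const auto n = static_cast<std::size_t>(in.gcount());  // may be a short final read
        for (std::size_t i = 0; i < n; ++i) {
            h ^= static_cast<unsigned char>(buf[i]);
            h *= 1099511628211ULL;  // FNV-1a 64-bit prime
        }
    }
    if (in.bad()) return std::nullopt;  // I/O error part-way through
    return h;
}

// Hypothetical helper: byte-by-byte comparison to rule out hash collisions.
bool files_equal(const fs::path& a, const fs::path& b) {
    std::ifstream fa(a, std::ios::binary), fb(b, std::ios::binary);
    if (!fa || !fb) return false;
    return std::equal(std::istreambuf_iterator<char>(fa), std::istreambuf_iterator<char>(),
                      std::istreambuf_iterator<char>(fb), std::istreambuf_iterator<char>());
}

// Hypothetical entry point: returns groups of two or more non-empty regular
// files under `root` whose sizes and FNV-1a hashes both match.
std::vector<std::vector<fs::path>> find_duplicates(const fs::path& root) {
    // Pass 1: bucket regular, non-empty files by size; no file contents are read yet.
    std::multimap<std::uintmax_t, fs::path> by_size;
    std::error_code ec;
    // Directory symlinks are not followed by default, which avoids link cycles;
    // directories we may not read are skipped instead of aborting the walk.
    fs::recursive_directory_iterator it(root, fs::directory_options::skip_permission_denied, ec);
    for (; !ec && it != fs::recursive_directory_iterator(); it.increment(ec)) {
        std::error_code fec;
        // Skip symlinks and special files (directories, FIFOs, sockets, devices).
        if (it->is_symlink(fec) || !it->is_regular_file(fec)) continue;
        const std::uintmax_t size = it->file_size(fec);
        if (fec || size == 0) continue;  // vanished meanwhile, or empty
        by_size.emplace(size, it->path());
    }

    // Pass 2: hash only files whose size occurs more than once.
    std::vector<std::vector<fs::path>> groups;
    for (auto lo = by_size.begin(); lo != by_size.end();) {
        const auto hi = by_size.upper_bound(lo->first);
        if (std::next(lo) != hi) {  // at least two files of this size
            std::map<std::uint64_t, std::vector<fs::path>> by_hash;
            for (auto i = lo; i != hi; ++i)
                if (const auto h = fnv1a_file(i->second)) by_hash[*h].push_back(i->second);
            for (auto& entry : by_hash)
                if (entry.second.size() > 1) groups.push_back(std::move(entry.second));
        }
        lo = hi;
    }
    return groups;
}
```

Calling `find_duplicates(root)` returns one inner vector per set of files with matching size and hash. Since FNV-1a is not collision-resistant, it is prudent to confirm a match with `files_equal` (or a cryptographic hash) before deleting anything. This sketch also stops walking the tree at the first traversal error rather than retrying.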
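For suggestion 7, one way to make the degree of parallelism configurable is to hand a list of candidate files (for example, the files that share a size with at least one other file, per suggestion 2) to a caller-chosen number of worker threads that claim files through a shared atomic counter. This is a sketch under assumptions: `hash_files_parallel` and `fnv1a_file` are hypothetical names, FNV-1a is only an illustrative hash, and it requires C++17 plus a thread-enabled build (for example, `-pthread` with GCC).

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <optional>
#include <thread>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper: 64-bit FNV-1a over a file's bytes (illustrative, non-cryptographic).
// Returns std::nullopt if the file vanished or could not be read.
std::optional<std::uint64_t> fnv1a_file(const fs::path& p) {
    std::ifstream in(p, std::ios::binary);
    if (!in) return std::nullopt;
    std::uint64_t h = 14695981039346656037ULL;  // FNV-1a 64-bit offset basis
    std::vector<char> buf(1 << 16);
    while (in) {
        in.read(buf.data(), static_cast<std::streamsize>(buf.size()));
        const auto n = static_cast<std::size_t>(in.gcount());
        for (std::size_t i = 0; i < n; ++i) {
            h ^= static_cast<unsigned char>(buf[i]);
            h *= 1099511628211ULL;  // FNV-1a 64-bit prime
        }
    }
    if (in.bad()) return std::nullopt;
    return h;
}

// Hypothetical helper: hashes `files` using `num_threads` workers in total
// (the calling thread counts as one). Result i belongs to files[i]; each slot
// is written by exactly one worker, so no locking is needed.
std::vector<std::optional<std::uint64_t>>
hash_files_parallel(const std::vector<fs::path>& files, unsigned num_threads) {
    std::vector<std::optional<std::uint64_t>> results(files.size());
    std::atomic<std::size_t> next{0};  // index of the next unclaimed file
    auto worker = [&] {
        for (;;) {
            const std::size_t i = next.fetch_add(1);
            if (i >= files.size()) return;
            results[i] = fnv1a_file(files[i]);
        }
    };
    num_threads = std::max(1u, num_threads);  // treat 0 as "caller thread only"
    std::vector<std::thread> pool;
    for (unsigned t = 1; t < num_threads; ++t) pool.emplace_back(worker);
    worker();  // the calling thread pulls work too
    for (std::thread& th : pool) th.join();
    return results;
}
```

Because the thread count is just a parameter, it is easy to follow the answer's advice and time the same candidate list with 1, 2, 4 or more threads on the actual hardware instead of guessing.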

Related Questions

I want to find out if it is possible to create Android API's that
Possible Duplicate: How do I pass parameters to the File::Find subroutine that processes each
Possible Duplicate: Find and Replace more than 1 word? I want to use jQuery
Possible Duplicate: cannot find interface declaration for ‘AbstractPickerView’,superclass of ‘AttackLayer’ There are 3 header
Possible Duplicate: Create a CSV File for a user in PHP Even though title
I want to find files containing the word navbar anywhere in files. I can
I want to find a way to copy one file to multiple locations simultaneously
I want to find the first element that has appeared in previous positions in
Possible Duplicate: Resharper string.format shortcut In our large code base, unfortunately there are a
Possible Duplicate: How to download a text file or some objects from webpage using

© 2021 The Archive Base. All Rights Reserved