The Archive Base

Editorial Team
Asked: May 14, 2026 (2026-05-14T14:39:56+00:00)

I’m currently writing a program that needs to compare each file in an ArrayList

I’m currently writing a program that needs to compare each file in an ArrayList of variable size. Right now, the way I’m doing this is through a nested code loop:

         if (tempList.size() > 1) {
             for (int i = 0; i < tempList.size(); i++) {
                 // Nested loops.  I should feel dirty?
                 for (int j = i + 1; j < tempList.size(); j++) {
                     // *Gets sorted.
                     System.out.println(checkBytes(tempList.get(i), tempList.get(j)));
                 }
             }
         }

I’ve read a few differing opinions on the necessity of nested loops, and I was wondering if anyone had a more efficient alternative.

At a glance, each comparison is going to need to be done, either way, so the performance should be fairly steady, but I’m moderately convinced there’s a cleaner way to do this. Any pointers?

EDIT:: This is only a part of the function, for clarity. The files have been compared and put into buckets based on length – after going through the map of the set, and finding a bucket which is greater than one in length, it runs this. So – these are all files of the same size. I will be doing a checksum comparison before I get to bytes as well, but right now I’m just trying to clean up the loop.

Also, holy cow this site responds fast. Thanks, guys.

EDIT2:: Sorry, for further clarification: The file handling part I’ve got a decent grasp on, I think – first, I compare and sort by length, then by checksum, then by bytes – the issue I have is how to properly deal with needing to compare all files in the ArrayList efficiently, assuming they all need to be compared. If a nested loop is sufficient for this, that’s cool, I just wanted to check that this was a suitable method, convention-wise.

1 Answer

  1. Editorial Team
    Added an answer on May 14, 2026 at 2:39 pm

    My answer to your EDIT2 question is in two parts.

    The first part is that if you have a small number of files, then your nested loop approach should be fine. It performs N(N-1)/2 comparisons, so its cost is O(N**2), whereas the optimal solution is O(N). However, if N is small enough it won’t make much difference which approach you use. You only need to consider an alternative solution if you are sure that N can be large.
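To make the quadratic growth concrete, here is a minimal sketch that runs the same i < j loop over any list and counts how many comparisons it performs. The class name PairwiseDemo and the method countComparisons are illustrative only and do not come from the question.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the original program.
class PairwiseDemo {
    // Visits every unordered pair (i < j) exactly once and returns the number of pairs visited.
    static <T> int countComparisons(List<T> items) {
        int comparisons = 0;
        for (int i = 0; i < items.size(); i++) {
            for (int j = i + 1; j < items.size(); j++) {
                // A real program would call checkBytes(items.get(i), items.get(j)) here.
                comparisons++;
            }
        }
        return comparisons;
    }

    public static void main(String[] args) {
        List<String> files = Arrays.asList("a", "b", "c", "d");
        // 4 items give 4 * 3 / 2 = 6 comparisons.
        System.out.println(countComparisons(files));
    }
}
```

Doubling the list size roughly quadruples the count, which is why the loop is fine for small buckets but becomes costly for large ones.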

    The second part spells out an algorithm that exploits file hashes to get an O(N) solution for detecting duplicates. This is what the previous answers alluded to.

    1. Create a FileHash class to represent file hash values. This needs to define equals(Object) and hashCode() methods that implement byte-wise equality of the file hashes.

    2. Create a HashMap<FileHash, List<File>> map instance.

    3. For each File in your input ArrayList:

      1. Calculate the hash for the file, and create a FileHash object for it.
      2. Look up the FileHash in the map.
      3. If you found an entry, perform a byte-wise comparison of your current file with each of the files in the list you got from the map. If you find a duplicate file in the list, BINGO! Otherwise add the current file to the list.
      4. If you didn’t find an entry, create a new map entry with the FileHash as the key, and the current file as the first element of the value list.

    (Note that the map above is really a multi-map, and that there are 3rd party implementations available; e.g. in Apache commons collections and Google collections. I’ve presented the algorithm in the form above for the sake of simplicity.)
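The steps above can be sketched roughly as follows. This is one possible reading, not a definitive implementation: it assumes SHA-256 as the hash function and files small enough to read fully into memory. The names DuplicateFinder, hashOf, sameBytes and findDuplicates are hypothetical, and sameBytes stands in for the question's checkBytes.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Step 1: wraps the digest bytes so they can serve as a HashMap key.
final class FileHash {
    private final byte[] digest;

    FileHash(byte[] digest) {
        this.digest = digest.clone();
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof FileHash && Arrays.equals(digest, ((FileHash) other).digest);
    }

    @Override
    public int hashCode() {
        return Arrays.hashCode(digest);
    }
}

// Hypothetical driver class for the algorithm.
class DuplicateFinder {
    // Assumption: SHA-256, with the whole file read into memory.
    static FileHash hashOf(File file) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        return new FileHash(md.digest(Files.readAllBytes(file.toPath())));
    }

    // Stand-in for checkBytes: a plain byte-wise comparison.
    static boolean sameBytes(File a, File b) throws IOException {
        return Arrays.equals(Files.readAllBytes(a.toPath()), Files.readAllBytes(b.toPath()));
    }

    // Returns {earlierFile, duplicateFile} pairs in input order.
    static List<File[]> findDuplicates(List<File> files) throws IOException, NoSuchAlgorithmException {
        Map<FileHash, List<File>> map = new HashMap<>();      // step 2
        List<File[]> duplicates = new ArrayList<>();
        for (File current : files) {                          // step 3
            FileHash key = hashOf(current);                   // step 3.1
            List<File> candidates = map.get(key);             // step 3.2
            if (candidates == null) {                         // step 3.4: first file with this hash
                candidates = new ArrayList<>();
                candidates.add(current);
                map.put(key, candidates);
                continue;
            }
            File match = null;                                // step 3.3: confirm byte by byte
            for (File candidate : candidates) {
                if (sameBytes(candidate, current)) {
                    match = candidate;
                    break;
                }
            }
            if (match != null) {
                duplicates.add(new File[] {match, current});
            } else {
                candidates.add(current);
            }
        }
        return duplicates;
    }

    public static void main(String[] args) throws Exception {
        List<File> files = new ArrayList<>();
        for (String arg : args) {
            files.add(new File(arg));
        }
        for (File[] pair : findDuplicates(files)) {
            System.out.println(pair[1] + " duplicates " + pair[0]);
        }
    }
}
```

Each file is hashed once and looked up once, so the expected cost is linear in the number of files as long as the lists stored in the map stay short.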

    Some performance issues:

    • If you use a good cryptographic hash function to generate your file hashes, then the chances of finding an entry in 3.3 that has more than one element in the list are vanishingly small, and the chances that the byte-wise comparison of the files will not say the files are equal is also vanishingly small. However, the cost of calculating the crypto hash will be greater than the cost of calculating a lower quality hash.

    • If you do use a lower quality hash, you can mitigate the potential cost of comparing more files by looking at the file sizes before you do the byte-wise comparison. If you do that you can make the map type HashMap<FileHash, List<FileTuple>> where FileTuple is a class that holds both a File and its length.

    • You could potentially decrease the cost of hashing by using a hash of just (say) the first block of each file. But that increases the probability that two files may have the same hash but still be different; e.g. in the 2nd block. Whether this is significant depends on the nature of the files. (But for example if you just checksummed the first 256 bytes of a collection of source code files, you could get a huge number of collisions … due to the presence of identical copyright headers!)
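As a rough illustration of the last point, the sketch below checksums only the first block of a file. It assumes CRC32 as the lower-quality hash; the class name PrefixChecksum, the method ofFirstBlock and the blockSize parameter are all hypothetical.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.CRC32;

// Hypothetical helper: a cheap checksum over only the start of a file.
class PrefixChecksum {
    // Returns the CRC32 of at most the first blockSize bytes of the file.
    static long ofFirstBlock(File file, int blockSize) throws IOException {
        byte[] buffer = new byte[blockSize];
        int filled = 0;
        try (InputStream in = new FileInputStream(file)) {
            int n;
            // read() may return fewer bytes than requested, so loop until the block is full or EOF.
            while (filled < blockSize && (n = in.read(buffer, filled, blockSize - filled)) != -1) {
                filled += n;
            }
        }
        CRC32 crc = new CRC32();
        crc.update(buffer, 0, filled);
        return crc.getValue();
    }

    public static void main(String[] args) throws IOException {
        if (args.length > 0) {
            System.out.println(ofFirstBlock(new File(args[0]), 256));
        }
    }
}
```

Two files that share their first blockSize bytes get the same checksum even if they differ later, so a match here only nominates candidates; the byte-wise comparison still has to confirm them.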
