The Archive Base


The Archive Base Latest Questions

Editorial Team
Asked: May 29, 2026


I wrote the following function to calculate the sizes of URLs.
The input is a pre-built map from file type to an object that holds a Set of URL strings.
I ran it on a set of 3,000 URLs. As I raised the number of threads up to 20, each run got faster; beyond 20 threads, performance started to decrease.

My initial goal was to run it on a set of 500,000 URLs, so I thought I would run it with 200 threads in the thread pool.
The result I got was:

FileType: pdf. CountTotalFiles: 394231 .CountCalcedSize: 6. Size: 14 MB. Time took to calculate: 1010
FileType: pdf. CountTotalFiles: 394231 .CountCalcedSize: 2863. Size: 3 GB. Time took to calculate: 61004
FileType: pdf. CountTotalFiles: 394231 .CountCalcedSize: 3481. Size: 3 GB. Time took to calculate: 121002
FileType: pdf. CountTotalFiles: 394231 .CountCalcedSize: 3691. Size: 3 GB. Time took to calculate: 181004
FileType: pdf. CountTotalFiles: 394231 .CountCalcedSize: 3706. Size: 3 GB. Time took to calculate: 241004
FileType: pdf. CountTotalFiles: 394231 .CountCalcedSize: 3838. Size: 4 GB. Time took to calculate: 301004
FileType: pdf. CountTotalFiles: 394231 .CountCalcedSize: 4596. Size: 4 GB. Time took to calculate: 361004
FileType: pdf. CountTotalFiles: 394231 .CountCalcedSize: 5059. Size: 5 GB. Time took to calculate: 421008

This was quite disappointing: very soon, after roughly 3,000 URLs, the performance drops and only about 100-150 URLs are processed per minute. As the log shows, more than 2,800 URLs were processed in the first minute.

Am I doing something wrong in the way I use the thread pool?
Or is there another bottleneck here?

    void getSize(Map<String, Files> map) throws IOException {
        final BufferedWriter bw = new BufferedWriter(new FileWriter("bad_files.txt"));

        Set<Entry<String, Files>> entrySet = map.entrySet();
        for (Entry<String, Files> entry : entrySet) {
            final List<Long> sizeList = new ArrayList<Long>();
            Files filesObject = entry.getValue();
            HashSet<String> urlsSet = filesObject.urlsSet;
            ExecutorService pool = Executors.newFixedThreadPool(200);
            final long startTime = System.currentTimeMillis();

            final String k = entry.getKey();
            final Files value = entry.getValue();
            Timer t = new Timer();
            t.schedule(new TimerTask() {

                @Override
                public void run() {
                    long size = 0;
                    for (Long s : sizeList) {
                        size += s;
                    }
                    System.out.println("FileType: " + k + ". CountTotalFiles: " + value.urlsSet.size() + " .CountCalcedSize: " + sizeList.size() + ". Size: " + FileUtils.byteCountToDisplaySize(size) + ". Time took to calculate: " + (System.currentTimeMillis() - startTime));
                }
            }, 1000, 60000);

            for (final String urlStr : urlsSet) {

                Runnable call = new Runnable() {

                    @Override
                    public void run() {
                        HttpURLConnection urlCon = null;
                        try {
                            URL url = new URL(urlStr);
                            urlCon = (HttpURLConnection) url.openConnection();

                            // if (url.getProtocol().equals("https")) {
                            //     setSSLContext((HttpsURLConnection) urlCon);
                            // }

                            if (urlCon.getResponseCode() != HttpURLConnection.HTTP_OK) {
                                bw.append("Response: " + urlCon.getResponseCode() + "  " + urlStr + "\n");
                            } else {
                                sizeList.add(Long.valueOf(urlCon.getContentLength()));
                            }
                            // urlCon.disconnect();
                        } catch (Exception e) {
                            try {
                                bw.append(e.getMessage() + "  " + urlStr + "\n");
                                // urlCon.disconnect();
                            } catch (IOException e1) {
                                e1.printStackTrace();
                            }
                        }
                    }
                };

                pool.submit(call);
            }
            pool.shutdown();
            try {
                pool.awaitTermination(100, TimeUnit.DAYS);
            } catch (InterruptedException e) {
                // TODO Auto-generated catch block
                e.printStackTrace();
            }
            long size = 0;
            for (Long s : sizeList) {
                size += s;
            }

            System.out.println("FileType: " + entry.getKey() + ". CountTotalFiles: " + entry.getValue().urlsSet.size() + " .CountCalcedSize: " + sizeList.size() + ". Size: " + FileUtils.byteCountToDisplaySize(size) + ". Time took to calculate: " + (System.currentTimeMillis() - startTime));
        }

        bw.flush();
        bw.close();
    }

1 Answer

  1. Editorial Team
    Added an answer on May 29, 2026 at 5:47 am

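    If you are only interested in the headers, call urlCon.setRequestMethod("HEAD") after url.openConnection() and before getResponseCode(). openConnection() only creates the connection object; the request is sent on the first call that needs a response, so the method has to be set before that point. With HEAD the server returns the headers (including Content-Length) without the body, which should improve performance.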
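A minimal sketch of the HEAD-based approach, not a drop-in replacement for the function in the question. The class name `HeadSizeSketch`, the helper names `contentLength` and `totalSize`, and the 10-second timeouts are assumptions made for illustration. Two changes go beyond the advice above and are also assumptions of this sketch: each connection is disconnected in a `finally` block, and the sizes are summed in a `LongAdder` because `ArrayList` is not safe to modify from several threads at once.

```java
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Arrays;
import java.util.Collection;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

class HeadSizeSketch {

    /** Returns the Content-Length reported for urlStr, or -1 if it is unknown or any error occurs. */
    static long contentLength(String urlStr) {
        HttpURLConnection urlCon = null;
        try {
            urlCon = (HttpURLConnection) new URL(urlStr).openConnection();
            urlCon.setRequestMethod("HEAD");  // must be set before the request is sent
            urlCon.setConnectTimeout(10_000); // assumed value: avoid hanging on dead hosts
            urlCon.setReadTimeout(10_000);    // assumed value
            if (urlCon.getResponseCode() != HttpURLConnection.HTTP_OK) {
                return -1;
            }
            return urlCon.getContentLengthLong(); // -1 when the header is missing
        } catch (Exception e) {
            return -1; // malformed URL, non-HTTP URL, I/O failure, ...
        } finally {
            if (urlCon != null) {
                urlCon.disconnect(); // release the connection instead of waiting for GC
            }
        }
    }

    /** Sums the known sizes of all urls using a fixed pool of the given size. */
    static long totalSize(Collection<String> urls, int threads) {
        LongAdder total = new LongAdder(); // thread-safe accumulator
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (String u : urls) {
            pool.execute(() -> {
                long n = contentLength(u);
                if (n > 0) {
                    total.add(n);
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.DAYS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return total.sum();
    }

    public static void main(String[] args) {
        // URLs are taken from the command line; 20 threads mirrors the best
        // setting reported in the question and is not a recommendation.
        System.out.println("Total bytes: " + totalSize(Arrays.asList(args), 20));
    }
}
```

URLs that fail or report no length are simply skipped here; the question's code logs them to bad_files.txt, which could be added back in the same places where this sketch returns -1.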


© 2021 The Archive Base. All Rights Reserved