The Archive Base

Editorial Team
Asked: June 9, 2026 (2026-06-09T04:59:13+00:00)

I am getting a bunch of relatively small pages from a website and was wondering if I could somehow do it in parallel in Bash. Currently my code looks like this, but it takes a while to execute (I think what is slowing me down is the latency in the connection).

for i in {1..42}
do
    wget "https://www.example.com/page$i.html"
done

I have heard of using xargs, but I don’t know anything about that and the man page is very confusing. Any ideas? Is it even possible to do this in parallel? Is there another way I could go about attacking this?

1 Answer

  1. Editorial Team
    Added an answer on June 9, 2026 at 4:59 am (2026-06-09T04:59:14+00:00)

    Rather than pushing wget into the background with & or -b, you can use xargs to the same effect, and with better control.

    The advantage is that xargs synchronizes properly with no extra work, which means it is safe to access the downloaded files afterwards (assuming no error occurs). All downloads will have completed (or failed) once xargs exits, and the exit code tells you whether all went well. This is much preferable to busy waiting with sleep and testing for completion manually.

    Assuming that URL_LIST is a variable containing all the URLs (it can be built with a loop, as in the OP’s example, or be a manually generated list), running this:

    echo "$URL_LIST" | xargs -n 1 -P 8 wget -q
    

    will pass one argument at a time (-n 1) to wget, and execute at most 8 parallel wget processes at a time (-P 8). xargs returns after the last spawned process has finished, which is just what we wanted to know. No extra trickery needed.
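As a hedged sketch of how the command above could be applied to the 42 pages from the question: the names BASE_URL, build_url_list, fetch_all and FETCH_CMD are hypothetical and exist only for this illustration. Note that -P is an extension offered by GNU and BSD xargs, not something POSIX guarantees.

```shell
# Sketch only: BASE_URL, build_url_list, fetch_all and FETCH_CMD are
# hypothetical names introduced for illustration.
BASE_URL="https://www.example.com"

# Print one URL per line for pages 1..N (N defaults to 42, as in the question).
build_url_list() {
    n="${1:-42}"
    i=1
    while [ "$i" -le "$n" ]; do
        printf '%s/page%d.html\n' "$BASE_URL" "$i"
        i=$((i + 1))
    done
}

# Feed the URLs to xargs: one URL per command (-n 1), at most 8 at once (-P 8).
# FETCH_CMD defaults to "wget -q"; set FETCH_CMD=echo for a dry run.
fetch_all() {
    # The expansion is deliberately unquoted so "wget -q" splits into two words.
    build_url_list "$@" | xargs -n 1 -P 8 ${FETCH_CMD:-wget -q}
}

# Usage (not executed here because it would hit the network):
#   if fetch_all 42; then
#       echo "all pages downloaded"
#   else
#       echo "at least one download failed" >&2
#   fi
```

Running `FETCH_CMD=echo fetch_all 3` prints the three URLs instead of downloading them, which is a cheap way to check the list before starting the real transfer.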

    The “magic number” of 8 parallel downloads that I’ve chosen is not set in stone, but it is probably a good compromise. There are two factors in “maximising” a series of downloads:

    One is filling “the cable”, i.e. utilizing the available bandwidth. Assuming “normal” conditions (the server has more bandwidth than the client), one or at most two downloads already achieve this. Throwing more connections at the problem only results in packets being dropped and TCP congestion control kicking in, leaving N downloads with asymptotically 1/N of the bandwidth each, to the same net effect (minus the dropped packets, minus window size recovery). Dropped packets are normal in an IP network; this is how congestion control is supposed to work (even with a single connection), and normally the impact is practically zero. However, an unreasonably large number of connections amplifies this effect, so it can become noticeable. In any case, it doesn’t make anything faster.

    The second factor is connection establishment and request processing. Here, having a few extra connections in flight really helps. The problem one faces is the latency of two round-trips (typically 20-40ms within the same geographic area, 200-300ms inter-continental) plus the odd 1-2 milliseconds that the server actually needs to process the request and push a reply to the socket. This is not a lot of time per se, but multiplied by a few hundred/thousand requests, it quickly adds up.
    Having anything from half a dozen to a dozen requests in flight hides most or all of this latency (it is still there, but since it overlaps, it does not add up!). At the same time, having only a few concurrent connections does not have adverse effects, such as causing excessive congestion, or forcing a server into forking new processes.
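To put rough numbers on the latency argument, here is a back-of-the-envelope calculation. The round-trip time, server time and request count are illustrative assumptions, and the estimate ignores transfer time and TLS setup entirely.

```shell
# Back-of-the-envelope estimate; every number here is an assumption.
requests=42      # pages in the question's loop
rtt_ms=30        # assumed round-trip time within the same geographic area
server_ms=2      # assumed server processing time per request
parallel=8       # concurrent downloads, as with xargs -P 8

# Two round-trips plus server processing per request.
per_request_ms=$(( 2 * rtt_ms + server_ms ))
# Sequential: every request pays the full latency, one after another.
sequential_ms=$(( requests * per_request_ms ))
# Parallel: latencies overlap within each batch of $parallel requests.
batches=$(( (requests + parallel - 1) / parallel ))   # ceil(requests / parallel)
overlapped_ms=$(( batches * per_request_ms ))

echo "sequential: ${sequential_ms} ms"
echo "with ${parallel} in flight: ${overlapped_ms} ms"
```

Under these assumptions the latency cost drops from 2604 ms to 372 ms, which is why a handful of connections helps even when bandwidth is not the bottleneck.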

© 2021 The Archive Base. All Rights Reserved