The Archive Base

Editorial Team
Asked: May 19, 2026 (2026-05-19T03:44:39+00:00)

A web-bot crawling your site and using bandwidth resources. Bots are numerous and for

  • A web bot crawls your site and uses bandwidth resources.

  • Bots are numerous and serve many purposes: homemade crawlers, university research projects, scrapers, new startups, established search engines, and probably many more categories.

Apart from large search engines, which can potentially send traffic to a site, why do webmasters allow other bots whose purpose they do not immediately know?
What incentives do webmasters have to allow these bots?

2nd question:

Should a distributed crawler with multiple crawl-agent nodes on the internet use a different User-Agent string for each agent? If they all use the same UA, the benefit of scaling via multiple agents seems greatly reduced, because large websites with a high crawl-delay set may take weeks or months to crawl fully.

3rd question:

robots.txt (the only defined crawl-control method) works at the domain level. Should a crawler apply its politeness policy per domain or per IP (sometimes many websites are hosted on the same IP)?

How should such web politeness problems be tackled? Is there anything else related to keep in mind?

1 Answer

  1. Editorial Team
    Added an answer on May 19, 2026 at 3:44 am (2026-05-19T03:44:40+00:00)
    1. There are many useful bots besides search engine bots, and the number of search engines is growing. In any case, the bots you want to block are probably sending misleading user-agent strings and ignoring your robots.txt file, so how are you going to stop them? You can block some at the IP level once you detect them, but others are hard to stop.

    2. The user agent string has nothing to do with crawl rate. Millions of browser users all send the same user agent string. Web sites typically throttle access by IP address. If you want to crawl a site faster you'll need more agents, but really you shouldn't be doing that: your crawler should be polite, crawling each individual site slowly while making progress on many other sites.

    3. A crawler should be polite per domain. A single IP may front many different servers, but that's no sweat for the router that's passing packets to and fro. Each individual server will likely limit how many connections you can hold open and how much bandwidth you can consume. There's also the scenario of one web site served by many IP addresses (e.g. round-robin DNS or something smarter): sometimes bandwidth and connection limits on sites like these are enforced at the router level, so once again, be polite per domain.
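
    The per-domain politeness described above could be sketched roughly as follows. This is only a minimal illustration in Python, not a reference design: the class name `PerDomainPoliteness`, the helper `delay_from_robots`, the 5-second default delay and the choice to key on the URL's hostname are all assumptions made for the example. The robots.txt `Crawl-delay` value is read with the standard library's `urllib.robotparser`. A distributed crawler would presumably need to share this state between its agents (for example by assigning each host to one agent), which is not shown here.

    ```python
    import time
    from urllib.parse import urlsplit
    from urllib.robotparser import RobotFileParser


    def delay_from_robots(robots_txt, user_agent, default):
        """Return the Crawl-delay that applies to user_agent, or default if none is set."""
        parser = RobotFileParser()
        parser.parse(robots_txt.splitlines())
        delay = parser.crawl_delay(user_agent)
        return float(delay) if delay is not None else default


    class PerDomainPoliteness:
        """Tracks, per hostname, the earliest time the next request may be sent."""

        def __init__(self, default_delay=5.0, clock=time.monotonic):
            self.default_delay = default_delay  # assumed fallback, in seconds
            self.clock = clock                  # injectable so it can be tested
            self.delays = {}                    # host -> delay in seconds
            self.next_allowed = {}              # host -> earliest next fetch time

        @staticmethod
        def host_of(url):
            # urlsplit lowercases the hostname; the key is the host, not the IP.
            return urlsplit(url).hostname or ""

        def set_delay(self, host, seconds):
            self.delays[host.lower()] = float(seconds)

        def wait_time(self, url):
            """Seconds to wait before fetching url; 0.0 if the host is free now."""
            host = self.host_of(url)
            if host not in self.next_allowed:
                return 0.0
            return max(0.0, self.next_allowed[host] - self.clock())

        def record_fetch(self, url):
            """Call right after a request so the host's delay starts counting."""
            host = self.host_of(url)
            delay = self.delays.get(host, self.default_delay)
            self.next_allowed[host] = self.clock() + delay


    if __name__ == "__main__":
        policy = PerDomainPoliteness(default_delay=5.0)
        robots = "User-agent: *\nCrawl-delay: 10\n"
        policy.set_delay("example.com", delay_from_robots(robots, "ExampleBot", 5.0))
        policy.record_fetch("http://example.com/page1")
        # The same host must now wait; a different host is unaffected.
        print(policy.wait_time("http://example.com/page2"))
        print(policy.wait_time("http://example.org/"))
    ```

    Because the delay is tracked per host, a crawler built this way keeps making progress on other sites while one slow site is waiting, which is the behaviour the answer recommends.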
