The Archive Base


Editorial Team
Asked: June 5, 2026

I have a web crawler that looks for specific information I want and returns


I have a web crawler that looks for specific information I want and returns it. This is run daily.

The issue is that my crawler has to do two things.

  1. Get the link it has to crawl.
  2. Crawl said link and push stuff to the db.

The issue with #1 is that there are 700+ links in total. These links don’t change very frequently, maybe once a month.

So one option is just to do a separate crawl for the ‘list of links’, once a month, and dump the links into the db.

Then, have the crawler do a db hit for each of those 700 links every day.

Or, I can just nest a crawl within my crawler: every single time the crawler runs (daily), it rebuilds this list of 700 URLs, stores it in an array, and pulls from that array to crawl each link.

Which is more efficient and less taxing on Heroku, or whichever host?

1 Answer

  1. Editorial Team
    Added an answer on June 5, 2026 at 9:36 pm

    It depends on how you measure “efficiency” and “taxing”, but the local database hit is almost certain to be faster and “better” than making an HTTP request and parsing an HTML(?) response to get the links.
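As a rough illustration of the database option, here is a minimal sketch that assumes a SQLite table named `links` and uses Python's standard `sqlite3` module. The table name, the function names `refresh_links` and `load_links`, and the example URLs are all hypothetical; the question does not say which database or language is in use.

```python
import sqlite3


def refresh_links(conn, urls):
    """Monthly job: replace the stored link list with a freshly crawled one."""
    with conn:  # commits on success, rolls back on error
        conn.execute("DELETE FROM links")
        # OR IGNORE skips URLs that are already present (url is UNIQUE).
        conn.executemany(
            "INSERT OR IGNORE INTO links (url) VALUES (?)",
            [(u,) for u in urls],
        )


def load_links(conn):
    """Daily job: read the stored links with one local query, no HTTP request."""
    return [row[0] for row in conn.execute("SELECT url FROM links ORDER BY id")]


# In-memory database for the example; a real job would connect to its own DB.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE IF NOT EXISTS links (id INTEGER PRIMARY KEY, url TEXT NOT NULL UNIQUE)"
)

# Stand-in for the result of the monthly 'list of links' crawl.
monthly_crawl_result = [
    "https://example.com/a",
    "https://example.com/b",
    "https://example.com/a",  # duplicate, ignored on insert
]
refresh_links(conn, monthly_crawl_result)

links_to_crawl = load_links(conn)
```

With this split, the daily run only calls something like `load_links`, and the extra crawl for the link list happens once a month.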

    Further, not that it likely matters, but (assuming your database and adapter support it) you can begin iterating through the query results and processing them as they arrive, without first fetching the entire set into memory.
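A small sketch of that idea, again assuming SQLite through Python's `sqlite3` module, where a cursor can be iterated row by row instead of calling `fetchall()`. The `links` table, the `iter_links` helper, and the generated URLs are illustrative assumptions, and other adapters may buffer results differently.

```python
import sqlite3


def iter_links(conn):
    """Yield stored URLs one at a time instead of loading them all with fetchall()."""
    cursor = conn.execute("SELECT url FROM links ORDER BY id")
    for (url,) in cursor:
        yield url


# In-memory database seeded with 700 placeholder links for the example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE links (id INTEGER PRIMARY KEY, url TEXT NOT NULL UNIQUE)")
conn.executemany(
    "INSERT INTO links (url) VALUES (?)",
    [(f"https://example.com/page/{n}",) for n in range(700)],
)

crawled = []
for url in iter_links(conn):
    crawled.append(url)  # stand-in for the real crawl-and-store step
```

For 700 short rows the memory saving is negligible, which matches the "not that it likely matters" caveat; the pattern matters more for much larger result sets.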

    Network latency and resource use are going to be much worse for an HTTP crawl than for poking at a DB that is already sitting there, running, and designed to be queried efficiently and quickly.

    However: once per day? Is there a good reason to spend any energy optimizing this task?
