The Archive Base


Editorial Team
Asked: June 1, 2026 (2026-06-01T21:22:49+00:00)

I have been building a scraper and spider in php when I hit this


I have been building a scraper and spider in PHP when I hit this design question. I was wondering about the trade-offs between a system that separates the crawling and scraping tasks (as most professional systems seem to do) and one that scrapes each page as the spider crawls it. The only advantage I could think of is that splitting the work up and using a queue makes it easier to parallelize: several scrapers can each ask the queue for the next page to scrape. Can anyone think of other trade-offs, and explain the main reason these are normally separated into two programs?

Note: the order of the crawling is the same in both cases; the only difference is when the page gets pulled.

1 Answer

  1. Editorial Team
    Added an answer on June 1, 2026 at 9:22 pm

    A crawler (spider) retrieves pages, and a scraper processes them. If you keep these tasks separate, you can change the implementation of one without changing the other. That is the main reason they are separated: it is separation of concerns, which is simply good software design.
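A minimal sketch of that separation, written in Python rather than PHP for brevity. Everything named here (`FAKE_WEB`, `fetch`, `extract_title`, `run`) is hypothetical and invented for illustration; the in-memory dictionary stands in for real HTTP requests so the example runs without network access.

```python
from html.parser import HTMLParser
from typing import Callable, Dict, List

# Hypothetical in-memory "web" so the sketch runs offline.
FAKE_WEB: Dict[str, str] = {
    "http://example.test/a": "<html><title>Page A</title></html>",
    "http://example.test/b": "<html><title>Page B</title></html>",
}


def fetch(url: str) -> str:
    """Retrieval step: return the raw body for a URL."""
    return FAKE_WEB[url]


class TitleParser(HTMLParser):
    """Collects the text found inside the <title> element."""

    def __init__(self) -> None:
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def extract_title(body: str) -> str:
    """Processing step: knows nothing about how the body was retrieved."""
    parser = TitleParser()
    parser.feed(body)
    return parser.title


def run(urls: List[str],
        fetcher: Callable[[str], str],
        processor: Callable[[str], str]) -> List[str]:
    """Glue code: the only place where the two steps meet."""
    return [processor(fetcher(url)) for url in urls]


titles = run(list(FAKE_WEB), fetch, extract_title)
```

Because `run` only depends on the two call signatures, either side can be replaced without the other noticing.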

    The example you give is a good one. If you combine retrieval with processing in a single class, module, program, or function, any change in how pages are retrieved (e.g., parallel retrieval or retrieval through a proxy) means reworking code that also does the processing, and in the worst case rewriting the entire program.
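As a rough illustration of changing retrieval alone, the sketch below (Python, with hypothetical names such as `PAGES`, `fetch_sequential`, `fetch_parallel`, and `scrape`) swaps a sequential fetcher for a thread-pool one while the processing function stays untouched. The in-memory `PAGES` dictionary is an assumption standing in for real network requests.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List

# Hypothetical stand-in for the network: URL -> page body.
PAGES: Dict[str, str] = {
    "http://example.test/1": "alpha beta gamma",
    "http://example.test/2": "delta epsilon",
    "http://example.test/3": "zeta",
}


def fetch_one(url: str) -> str:
    """Retrieve a single page (here: a dictionary lookup)."""
    return PAGES[url]


def fetch_sequential(urls: List[str]) -> List[str]:
    """Original retrieval strategy: one page after another."""
    return [fetch_one(url) for url in urls]


def fetch_parallel(urls: List[str], workers: int = 4) -> List[str]:
    """Replacement strategy: fetch concurrently with a thread pool.

    Executor.map yields results in the order of the input URLs,
    so the crawl order seen by the processing step is unchanged.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch_one, urls))


def scrape(bodies: List[str]) -> List[int]:
    """Processing step: count words per page. Unaware of how pages arrived."""
    return [len(body.split()) for body in bodies]


urls = list(PAGES)
sequential_result = scrape(fetch_sequential(urls))
parallel_result = scrape(fetch_parallel(urls))
```

Retrieval through a proxy would be another drop-in replacement for `fetch_one`; `scrape` would not need to change in that case either.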

    Here’s another one: if you want to process a different kind of data (e.g., RSS feeds instead of HTML pages), you have to write the entire scraper from scratch, and you cannot reuse any of the work you did on page retrieval.
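The reverse case can be sketched the same way: one retrieval function shared by two processors, chosen by content type. This is an assumption-laden illustration in Python; `DOCS`, `fetch`, `scrape_html`, `scrape_rss`, and `PROCESSORS` are hypothetical names, and the regex-based HTML handling is deliberately naive.

```python
import re
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List, Tuple

# Hypothetical fetched documents: URL -> (content type, body).
DOCS: Dict[str, Tuple[str, str]] = {
    "http://example.test/index.html": (
        "text/html",
        "<html><head><title>Home</title></head></html>",
    ),
    "http://example.test/feed.xml": (
        "application/rss+xml",
        '<rss version="2.0"><channel><title>Feed</title>'
        "<item><title>First post</title></item>"
        "<item><title>Second post</title></item>"
        "</channel></rss>",
    ),
}


def fetch(url: str) -> Tuple[str, str]:
    """Shared retrieval step, reused for every kind of document."""
    return DOCS[url]


def scrape_html(body: str) -> List[str]:
    """HTML processor: pull out the page title (naive regex, sketch only)."""
    match = re.search(r"<title>(.*?)</title>", body, re.S | re.I)
    return [match.group(1).strip()] if match else []


def scrape_rss(body: str) -> List[str]:
    """RSS processor: pull out the title of every <item>."""
    root = ET.fromstring(body)
    return [item.findtext("title", default="") for item in root.iter("item")]


# Adding a new data type means adding one entry here; fetch() is untouched.
PROCESSORS: Dict[str, Callable[[str], List[str]]] = {
    "text/html": scrape_html,
    "application/rss+xml": scrape_rss,
}


def scrape(url: str) -> List[str]:
    content_type, body = fetch(url)
    return PROCESSORS[content_type](body)


html_titles = scrape("http://example.test/index.html")
feed_titles = scrape("http://example.test/feed.xml")
```

Supporting RSS here cost one new function and one dictionary entry, with no changes to retrieval.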
