The Archive Base

Editorial Team
Asked: June 16, 2026

I’m trying to write my first crawler using PHP with the cURL library. My aim is to fetch data from one site systematically, which means the code doesn’t follow every hyperlink on the site, only specific links.

The logic of my code is to go to the main page, get the links for several categories, and store them in an array. Once that’s done, the crawler visits each category page and checks whether the category has more than one page. If so, it stores the subpages in another array as well. Finally, I merge the arrays to get all the links for the pages that need to be crawled and start fetching the required data.
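The collect-then-merge step described above could look roughly like the sketch below. It is a minimal illustration, not the asker’s code: the helper names `extract_links()` and `build_crawl_list()` are hypothetical, and so are the XPath expressions and the `category` / `pager` class names, which would have to match the real site’s markup. The fetcher is passed in as a callable so the same logic can run against `get_url()` or against canned HTML.

```php
<?php
// Sketch: collect category links, then pagination links for each category,
// and merge everything into one de-duplicated crawl list.
// ASSUMPTION: the XPath queries and class names ("category", "pager") are
// hypothetical placeholders for the real site's markup.

function extract_links(string $html, string $xpathQuery): array
{
    if ($html === '') {
        return []; // DOMDocument::loadHTML() rejects an empty string
    }
    $previous = libxml_use_internal_errors(true); // buffer HTML parser warnings
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);        // restore the caller's setting

    $xpath = new DOMXPath($dom);
    $links = [];
    foreach ($xpath->query($xpathQuery) as $node) {
        $links[] = $node->nodeValue;              // value of the matched href attribute
    }
    return $links;
}

// $fetch: any callable returning the page body as a string, or false on failure.
function build_crawl_list(string $mainHtml, callable $fetch): array
{
    $categories = extract_links($mainHtml, '//a[@class="category"]/@href');

    $subpages = [];
    foreach ($categories as $categoryUrl) {
        $categoryHtml = $fetch($categoryUrl);
        if ($categoryHtml === false) {
            continue; // skip category pages that failed to download
        }
        $subpages = array_merge($subpages, extract_links($categoryHtml, '//a[@class="pager"]/@href'));
    }

    // array_unique() drops duplicate URLs; array_values() re-indexes from 0.
    return array_values(array_unique(array_merge($categories, $subpages)));
}
```

With a real crawl, `build_crawl_list(get_url($site2crawl), 'get_url')` would produce the merged list; relative links would still need to be resolved against the site’s base URL.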

I call the function below to start a cURL session and fetch the page into a variable, which I later load into a DOM object and parse with XPath. I write cURL’s total_time and http_code to a log file.
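Loading the fetched HTML with `@$dom->loadHTML(...)` hides every parser message. A sketch of an alternative, assuming the goal is to keep those messages for the log rather than discard them, uses libxml’s internal error buffer. The function name `load_html_with_errors()` is hypothetical.

```php
<?php
// Sketch: load HTML into a DOMDocument without the @ operator and return the
// libxml parser warnings alongside the document so they can be logged.

function load_html_with_errors(string $html): array
{
    $previous = libxml_use_internal_errors(true); // collect errors instead of emitting warnings
    libxml_clear_errors();                        // drop anything left over from earlier parses
    $dom = new DOMDocument();
    $messages = [];

    if ($html === '') {
        $messages[] = 'empty document';           // loadHTML() would reject an empty string
    } else {
        $dom->loadHTML($html);
        foreach (libxml_get_errors() as $error) {
            $messages[] = sprintf('line %d: %s', $error->line, trim($error->message));
        }
    }

    libxml_clear_errors();
    libxml_use_internal_errors($previous);        // restore the caller's setting
    return [$dom, $messages];
}
```

An unexpectedly empty document is worth logging on its own: a block page or empty response would otherwise just yield zero XPath matches.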

The problem is that the crawler runs for 5-6 minutes, then stops without fetching all the required subpage links. I print the contents of the arrays to check the result. I can’t see any HTTP errors in my log; every page returns HTTP status code 200. I can’t see any PHP errors either, even with PHP debugging enabled on my localhost.
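When a long-running script stops without any visible error, one thing to try is sending all PHP errors to a file, so that even a fatal error that ends the script leaves a trace. The sketch below assumes the log path is writable; the function name `enable_error_logging()` and the file name are hypothetical.

```php
<?php
// Sketch: route every PHP error, warning, and notice (fatal errors included)
// to a log file. ASSUMPTION: the given path is writable by the script.

function enable_error_logging(string $logPath): void
{
    error_reporting(E_ALL);          // report everything, including notices
    ini_set('log_errors', '1');      // write errors to the error log...
    ini_set('error_log', $logPath);  // ...and use this file as that log
}

// Call this at the very top of the crawler script, before any fetching:
// enable_error_logging(__DIR__ . '/php_errors.log');
```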

I assume the site blocks my crawler after a few minutes because of too many requests, but I’m not sure. Is there any way to get more detailed debugging output? And is PHP adequate for this type of activity? I want to use the same mechanism to fetch content from more than 100 other sites later on.
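One way to test the blocking theory is to space requests out and record which URLs fail instead of stopping. This is a sketch under stated assumptions: the 1-second default delay is a guess because the site’s actual limits are unknown, treating an empty body as a failure is an assumption, and `crawl_politely()` is a hypothetical name. `$fetch` can be `get_url()` or anything else that returns the body or `false`.

```php
<?php
// Sketch: fetch URLs one at a time with a fixed pause between requests,
// keep going when a page fails, and report which URLs need a retry.
// ASSUMPTIONS: the default delay is a guess; an empty body counts as a failure.

function crawl_politely(array $urls, callable $fetch, int $delayMs = 1000): array
{
    $pages = [];
    $failed = [];
    $first = true;

    foreach ($urls as $url) {
        if (!$first) {
            usleep($delayMs * 1000); // usleep() takes microseconds
        }
        $first = false;

        $body = $fetch($url);
        if ($body === false || $body === '') {
            $failed[] = $url;        // transport failure or suspiciously empty page
        } else {
            $pages[$url] = $body;
        }
    }

    return ['pages' => $pages, 'failed' => $failed];
}
```

If the failures disappear when the delay is increased, rate limiting becomes the more likely explanation.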

My cURL code is as follows:

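function get_url($url)
{
    // Set up a cURL session that returns the response body as a string.
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_HEADER, 0);          // exclude response headers from the output
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);  // return the body instead of printing it
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 30); // give up if no connection within 30 seconds
    curl_setopt($ch, CURLOPT_URL, $url);
    $data = curl_exec($ch);                       // false if the transfer fails
    $info = curl_getinfo($ch);

    // Append the transfer time and HTTP status code to the log file.
    $logfile = fopen("crawler.log", "a");
    fwrite($logfile, 'Page ' . $info['url'] . ' fetched in ' . $info['total_time'] . ' seconds. Http status code: ' . $info['http_code'] . "\n");
    fclose($logfile);
    curl_close($ch);

    return $data;
}

// Start to crawl the main page.

$site2crawl = 'http://www.site.com/';

$dom = new DOMDocument();
@$dom->loadHTML(get_url($site2crawl)); // @ suppresses HTML parser warnings
$xpath = new DOMXPath($dom);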
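For more detailed debugging, a variant of `get_url()` could also record transport-level failures. When `curl_exec()` fails (refused connection, DNS problem, timeout) it returns `false` and `http_code` is 0, so logging only the status code can hide the real cause. `CURLOPT_VERBOSE` writes a full request/response trace to the stream set with `CURLOPT_STDERR`. The function name `get_url_debug()`, the log file names, and the 60-second overall `CURLOPT_TIMEOUT` are assumptions for illustration.

```php
<?php
// Sketch: like get_url(), but logs cURL errors and writes a verbose trace.
// ASSUMPTIONS: file names and the 60-second overall timeout are examples.

function get_url_debug(string $url, string $logPath = 'crawler.log', string $tracePath = 'curl_trace.log')
{
    $trace = fopen($tracePath, 'a');             // receives the verbose cURL trace

    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_HEADER, false);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 30);
    curl_setopt($ch, CURLOPT_TIMEOUT, 60);       // cap the whole transfer, not only the connect phase
    curl_setopt($ch, CURLOPT_VERBOSE, true);     // write connection details...
    curl_setopt($ch, CURLOPT_STDERR, $trace);    // ...to the trace file

    $data = curl_exec($ch);
    $info = curl_getinfo($ch);

    if ($data === false) {
        $line = sprintf("FAILED %s: cURL error %d: %s\n", $url, curl_errno($ch), curl_error($ch));
    } else {
        $line = sprintf("Page %s fetched in %s seconds. HTTP status code: %d\n",
            $info['url'], $info['total_time'], $info['http_code']);
    }
    file_put_contents($logPath, $line, FILE_APPEND);

    unset($ch);    // free the handle before closing its trace stream (curl_close() is a no-op since PHP 8)
    fclose($trace);
    return $data;  // page body, or false on failure
}
```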
1 Answer

Editorial Team
Added an answer on June 16, 2026 at 4:45 pm

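Use set_time_limit() to extend how long your script is allowed to run. That is why you are getting "Fatal error: Maximum execution time of 30 seconds exceeded" in your error log. On non-Windows systems the limit counts only the time spent executing PHP code, not time spent waiting on network requests such as curl_exec(), which is how a crawler can run for several minutes of wall-clock time before hitting a 30-second limit.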
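A minimal sketch of applying this in a crawl loop, assuming the per-page budget approach fits: each call to `set_time_limit()` restarts the timeout counter from zero, so calling it once per page gives every page its own budget, and passing 0 removes the limit entirely. `crawl_with_time_budget()` is a hypothetical name.

```php
<?php
// Sketch: give each page its own execution-time budget.
// set_time_limit() restarts the timeout counter each time it is called;
// a value of 0 disables the limit altogether.

function crawl_with_time_budget(array $urls, callable $fetch, int $secondsPerPage = 60): array
{
    $pages = [];
    foreach ($urls as $url) {
        set_time_limit($secondsPerPage); // restart the counter for this page
        $pages[$url] = $fetch($url);
    }
    return $pages;
}
```

The per-page approach still stops a single page that hangs, whereas `set_time_limit(0)` lets the whole script run indefinitely.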

Related Questions

I am trying to find ID3V2 tags from MP3 file using jid3lib in Java.
I'm making a simple page using Google Maps API 3. My first. One marker
I am using jsonparser to parse data and images obtained from json response. When
I'm using v2.0 of ClassTextile.php, with the following call: $testimonial_text = $textile->TextileRestricted($_POST['testimonial']); ... and
We're building an app, our first using Rails 3, and we're having to build
I'm trying to convert HTML to plain text. I get many &\#8217; &\#8220; etc.
I'm trying to decode HTML entries from here NYTimes.com and I cannot figure out
I'm trying to create an if statement in PHP that prevents a single post
I have a string like this: La Torre Eiffel paragonata all’Everest What PHP function
I am reading a book about Javascript and jQuery and using one of the

© 2021 The Archive Base. All Rights Reserved