Let’s say I’ve created a web scraping PHP page (getdata.php) that gets

Question

Asked: June 5, 2026 (2026-06-05T07:31:05+00:00)

Let’s say I’ve created a web scraping PHP page (getdata.php) that fetches the content of specific website pages with cURL, then saves some useful info to a text file or a database.

Pseudocode of getdata.php:

min = get latest search id from database
max = 1.000.000 (yes one million different pages)

while (min < max) {

  url = "http://www.website.com/page.php?id=".$min
  content = getContentFromURL(url)
  saveUsefulInfoToDb(content)
  min++
  set latest search id as min in database
}

It works, but the process goes like this:

Open getdata.php in a browser
Wait
Keep waiting, because about one million pages will be scraped.
Wait
And finally the request times out.
Fail

So the problem is that I don’t know how to make this process reasonable. Opening the page in a browser and waiting for it to finish scraping the URLs is, I think, really bad practice.

How can I make getdata.php run in the background, for example via cron?

What is the best way to do it?

Thanks.


1 Answer

Editorial Team · Answer 1 · 2026-06-05T07:31:06+00:00

Put these two lines at the top of the code:

set_time_limit(0);
ignore_user_abort(true);

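Then use cron to start it once a day, or whenever it needs to run. You definitely want this to be a background process and not a page you keep open in a browser. set_time_limit(0) removes PHP’s execution time limit and ignore_user_abort(true) keeps the script running after the client disconnects, so with those two lines it can run indefinitely, either as a web page or as a command-line script. If you keep it as a web page, you can still use cron to fire it off with a line like: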

0 0 * * * /usr/bin/curl "http://yoursite.com/getdata.php" >> "/var/www/errors.log"
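As a rough illustration of how the two lines fit around the loop from the question, here is a minimal sketch. The function name scrapeRange and its three callbacks are hypothetical stand-ins for getContentFromURL(), saveUsefulInfoToDb() and the "latest search id" update, which the question only describes in pseudocode.

```php
<?php
set_time_limit(0);        // remove PHP's execution time limit for this script
ignore_user_abort(true);  // keep running even if the HTTP client disconnects

/**
 * Scrape ids in [$min, $max) and return the next id to process.
 * $fetch, $save and $saveProgress are hypothetical callbacks (assumptions,
 * not part of the original question's code).
 */
function scrapeRange(int $min, int $max, callable $fetch, callable $save, callable $saveProgress): int
{
    while ($min < $max) {
        $url = 'http://www.website.com/page.php?id=' . $min;
        $content = $fetch($url);
        if ($content !== null) {
            $save($content);      // skip pages that could not be fetched
        }
        $min++;
        $saveProgress($min);      // persist after every page so a later run can resume here
    }
    return $min;
}
```

Because progress is stored after every page, a run that is stopped or crashes can pick up from the last saved id the next time cron starts it.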

A bit of advice, since I have done this many times: definitely write a logging function that prints to a file, so you can see what the script is doing as it runs; otherwise you will have no visibility. Also build a kill switch into the PHP file, so you can tell it to stop without having to use Unix top or restart Apache. It is probably also a good idea to hard-code a kill time, so that the script stops after a certain hour; otherwise it may run longer than a day, a second instance starts up, and you end up with several running at once.
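The logging function, kill switch and kill time could look something like the sketch below. All function names and file paths are hypothetical. The lock helper is an extra not mentioned above: it is one possible way to keep a second instance from starting while the first is still running.

```php
<?php
// Append a timestamped line to a log file so progress is visible while the job runs.
function logLine(string $file, string $message): void
{
    file_put_contents($file, date('c') . ' ' . $message . PHP_EOL, FILE_APPEND | LOCK_EX);
}

// Kill switch and kill time: stop when the stop file exists or the deadline has passed.
// $now is injectable only to make the function easy to check; it defaults to time().
function shouldStop(string $stopFile, int $deadline, ?int $now = null): bool
{
    $now = $now ?? time();
    return file_exists($stopFile) || $now >= $deadline;
}

// Optional single-instance guard: returns an open handle that holds the lock,
// or null if the file cannot be opened or another process already holds it.
function acquireLock(string $lockFile)
{
    $handle = fopen($lockFile, 'c');
    if ($handle === false) {
        return null;
    }
    if (!flock($handle, LOCK_EX | LOCK_NB)) {
        fclose($handle);
        return null;
    }
    return $handle;
}
```

In the scraping loop you would call shouldStop() once per iteration and break out when it returns true. Creating the stop file (for example with touch) then ends the run cleanly after the current page.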
