We have a website that lists links to blogs in real time. The problem is that the pages are slow to load because every page view reads data from the various source sites.
I wrote a PHP script that creates a static HTML version of each page. It runs once an hour. The problem is that the script times out before it finishes all the pages. I know that I could increase the execution time allowed for PHP scripts, but this does not seem like the most efficient way to handle the issue.
Is there another way to do this? I just don’t know what to begin looking for – Perl? Java? Python? How do these scripts run on a server? What should I look for from my web host?
A different solution might be to use a database and not bite off so much work at once. Make a table listing the sites you pull from, and store when each one was last pulled. Then have a cron job pick the one or two sites that have gone longest without being pulled. Run it often and you’ll always have fresh data, but the script will have an easier time because it’s not trying to do so much at once. This approach will scale well.
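Here is a rough sketch of that idea in Python with SQLite, just to show the shape of it. The same logic would work in PHP with MySQL. The table name `sites`, the column `last_pulled`, and the `rebuild_page` callback are all made up for illustration; `rebuild_page` stands in for whatever code you already have that fetches one source site and writes its HTML page.

```python
import sqlite3
import time


def init_db(conn):
    # One row per source site; last_pulled = 0 means "never pulled yet".
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sites ("
        " url TEXT PRIMARY KEY,"
        " last_pulled REAL NOT NULL DEFAULT 0)"
    )


def add_site(conn, url):
    # Register a site; does nothing if it is already listed.
    conn.execute("INSERT OR IGNORE INTO sites (url) VALUES (?)", (url,))


def pick_stalest(conn, limit=2):
    # Sites that have gone longest without a pull come first.
    rows = conn.execute(
        "SELECT url FROM sites ORDER BY last_pulled ASC, url ASC LIMIT ?",
        (limit,),
    ).fetchall()
    return [row[0] for row in rows]


def mark_pulled(conn, url, when=None):
    # Record the pull time (defaults to now).
    stamp = time.time() if when is None else when
    conn.execute("UPDATE sites SET last_pulled = ? WHERE url = ?", (stamp, url))


def run_once(conn, rebuild_page, limit=2):
    # One cron run: refresh only the few stalest sites, then stop.
    # rebuild_page is a hypothetical callback that regenerates one page.
    done = []
    for url in pick_stalest(conn, limit):
        rebuild_page(url)
        mark_pulled(conn, url)
        done.append(url)
    conn.commit()
    return done
```

Each run only touches `limit` sites, so it finishes quickly no matter how many sites are in the table. If you schedule it every few minutes instead of hourly, every site still gets refreshed regularly, and adding more sites only stretches the refresh cycle rather than making any single run longer.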