I have a PHP scraper script that I use to scrape a page on my site. The script parses the content into HTML and outputs it for the user. I came across setting the user agent in PHP to pretend to be a crawler, for example Googlebot. How can I combine the two so that the page I am scraping thinks I am a crawler?
My scraper PHP code is:
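<?php
$query = $_REQUEST['q'];
// URL-encode the query so spaces and special characters survive in the query string
$html = file_get_contents("search.php?q=" . urlencode($query));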
preg_match_all(
'/<div class="cl1 cld">.*?<a rel="nofollow" class="l le" href="(.*?)">(.*?)<\/a>.*?<div class="cra">(.*?)<\/div>.*?<div class="clud">(.*?)<\/div>.*?<\/div>/s',
$html,
$posts, // will contain the blog posts
PREG_SET_ORDER // formats data into an array of posts
);
foreach ($posts as $post) {
$link = $post[1];
$title = $post[2];
$description = $post[3];
$url = $post[4];
echo "<div class='result'><div class='title'><a href='$link'>$title</a></div>$description<div class='url'>$url</div></div>";
}
?>
I also have this line of code, which sets a user agent string that identifies the script as a crawler:
$userAgent = 'MyScraperBot (http://www.mysite.com/)';
If you want to keep using file_get_contents, you can set the user agent of PHP's internal HTTP fopen wrapper with:
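A minimal sketch of what this could look like, assuming the user_agent ini setting is the mechanism meant here. The per-request stream context is an alternative added for illustration, and the commented-out URL is a hypothetical placeholder, not taken from the question.

```php
<?php
// User agent string from the question; any descriptive string works.
$userAgent = 'MyScraperBot (http://www.mysite.com/)';

// Option 1: set the default user agent for PHP's HTTP stream wrapper.
// Later file_get_contents() calls over http(s) in this script send it.
ini_set('user_agent', $userAgent);

// Option 2: attach the user agent to a single request via a stream context.
$context = stream_context_create([
    'http' => ['user_agent' => $userAgent],
]);

// Hypothetical usage (the URL is assumed, substitute the real page):
// $html = file_get_contents(
//     'http://www.mysite.com/search.php?q=' . urlencode($query),
//     false,
//     $context
// );
```

Either option should leave the rest of the scraper unchanged, since only the request headers differ. Note that the user agent is only sent for real HTTP requests, so the argument to file_get_contents needs to be a full http:// or https:// URL rather than a relative file path.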