I have a crawler which scrapes a website for information and then inserts the values into a database, it seems to insert the first 4000~ rows fine but then suddenly stops inserting values to the mysql database even though the crawler is still scraping the website
Database table
CREATE TABLE IF NOT EXISTS `catalog` (
`id` varchar(100) NOT NULL DEFAULT '',
`title` varchar(100) DEFAULT NULL,
`value` double DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
PHP insert function
function addToCatalog($id, $title, $value){
$q = "INSERT INTO catalog VALUES('$id', '$title', $value)";
return mysql_query($q, $this->connection);
}
php scrape function
function scrape($pageNumber){
$page = file_get_html('http://example.com/p='.$pageNumber);
if($page){
$id = array();
$title = array();
$value = array();
//id
if($page->find('.productid')){
foreach ($page->find('.productid') as $p) {
$id[] = $p->innertext;
}
}
//title
if($page->find('.title')){
foreach($page->find('.title') as $p){
$title[] = $p->innertext;
}
}
//value
if($page->find('.value')){
foreach($page->find('.value') as $p){
$value[] = $p->innertext;
}
}
for($i=0; $i<sizeof($id); $i++){
$add = $database->addToCatalog($id[$i], $title[$i], $value[$i]);
echo $id[$i]." ".$title[$i]." ".$value[$i]."<br>";
}
}
}
for($i=0; $i<31300; $i++){
scrape($i);
}
Any help with this problem would be appreciated.
If the execution of the process stops after about 30 seconds, your problem is probably the
max_execution_timesetting.