I have this script.
I need it to write only the links that contain "/product-product/" to the file items.txt. Well, not the whole link, only the 10-digit item number, for example:
product-product/1007687980
In the example the item number starts with 100, because I was searching for items in a category where the numbers start with 100. That restriction is no longer needed.
$keyword = $_SERVER['QUERY_STRING'];
$site = 1;
while ($site < 30) {
    $content = file_get_contents('http://www.example.com/?keywords=' . $keyword . '&x=0&y=0&pagecount=' . $site . '&sort=sort');
    $html = $content;
    $dom = new DomDocument();
    @$dom->loadHTML($html);
    $urls = $dom->getElementsByTagName('a');
    $lookfor = 'http://www.example.com';
    foreach ($urls as $url) {
        if (substr($url->getAttribute('href'), 0, strlen($lookfor)) == $lookfor) {
            $tubeurl = str_replace("http://www.example.com", "", $url->getAttribute('href'));
            $tubeurl = substr($tubeurl, strpos($tubeurl, "/product-product/100") + 17, 10);
            file_put_contents("items.txt", "" . $tubeurl . "
", FILE_APPEND | LOCK_EX); // this line must remain, it makes it so that there is a new line; \n wouldn't work
        }
    }
    $site++;
    echo $site;
}
A regex would be a solution, but I read here on Stack Overflow that it is a lot of work for the server.
A simple regular expression that puts your product ID into $1 should do the trick. You probably want some more logic to validate $1. I modified it so that $1 should always be 10 digits.
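As a rough sketch of that idea in PHP (the language of the script above): the helper name extractItemNumber and the sample links are made up for illustration and are not from the original post. It assumes the item number is exactly 10 digits and directly follows "/product-product/".

```php
<?php
// Hypothetical helper (not from the original post): returns the 10-digit
// item number found in a link, or null when the link is not a product link.
function extractItemNumber(string $href): ?string
{
    // Capture exactly 10 digits after "/product-product/".
    // The (?!\d) lookahead rejects numbers longer than 10 digits.
    if (preg_match('~/product-product/(\d{10})(?!\d)~', $href, $m) === 1) {
        return $m[1]; // first capture group, i.e. "$1"
    }
    return null;
}

// Sample links, assumed for illustration only.
$links = [
    'http://www.example.com/product-product/1007687980',
    'http://www.example.com/category/shoes',
];

// Collect only the item numbers of links that actually match.
$ids = [];
foreach ($links as $href) {
    $id = extractItemNumber($href);
    if ($id !== null) {
        $ids[] = $id;
    }
}
```

Inside the original foreach loop, the same check could replace the substr/strpos lines, appending each match with file_put_contents("items.txt", $id . PHP_EOL, FILE_APPEND | LOCK_EX). Links without a product ID are then skipped instead of being written as junk. For short strings such as an href, a single preg_match call is unlikely to be a noticeable load on the server.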