I’m a relative beginner to PHP and I’m having an issue with a web scraper script that I’m trying to develop. The script is designed to grab a vBulletin forum page, then parse through the hyperlinks on the page to find the ones that include a particular “id” attribute (i.e. only the links that point to message threads posted on the forum). Each of the desired links has an “id” attribute that begins “thread_title_[Thread # here]”. I came up with the idea of using strpos() as a filter to examine the “id” attribute of each collected link and check whether it contains the fragment “thread_title”. Unfortunately, my efforts don’t seem to be bearing fruit.
I will paste the code excerpt below… at the risk of being labeled a complete noobie. 😉 Hopefully I’m not doing something terribly stupid. Thanks for the help
$d = new DOMDocument();
$d->loadHTMLFile("forum3.html");
$links = $d->getElementsByTagName('a');
echo '<html xmlns="http://www.w3.org/1999/xhtml" encoding="utf-8" lang="ar-sa">';
foreach ($links as $link)
{
    $threadTitleExists = $link->getAttribute('id');
    $pos = strpos($threadTitleExists, 'thread_title');
    $threadTitle = $link->nodeValue;
    if ($link->hasAttribute('id') && ($pos==0))
    {
        $threadTitle = trim(preg_replace('/\s*\([^)]*\)/', ' ', $threadTitle));
        echo "Thread number: " . $threadTitleExists . "<br>Thread title: " . $threadTitle . "<p>";
    }
    else
    {
        continue;
    }
}
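Change the line
    if ($link->hasAttribute('id') && ($pos==0))
to
    if ($link->hasAttribute('id') && ($pos===0))
strpos() returns 0 if the haystack begins with the needle and false if the needle is not found at all. With the loosely typed comparison operator ==, false is treated as equal to 0, so every link that has an “id” attribute passes your check, whether or not it contains “thread_title”. There’s a warning on the strpos() manual page to use the === operator instead, which only matches an actual integer 0.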
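To see the difference between the two operators, here is a minimal sketch. The helper name isThreadTitleId and the sample ids are made up for illustration; only the “thread_title” prefix comes from the question.

```php
<?php
// Hypothetical helper (not from the original script): true only when
// the id starts with "thread_title".
function isThreadTitleId(string $id): bool
{
    // strpos() returns int 0 for a match at the start and false for no match,
    // so the strict === comparison is needed to tell the two apart.
    return strpos($id, 'thread_title') === 0;
}

// Made-up sample ids for illustration.
foreach (['thread_title_12345', 'post_678', ''] as $id) {
    $pos = strpos($id, 'thread_title');
    echo var_export($id, true), ': ';
    echo 'loose ', var_export($pos == 0, true), ', ';
    echo 'strict ', var_export(isThreadTitleId($id), true), "\n";
}
```

The loose check reports true for all three ids, while the strict check is true only for the first one. If you are on PHP 8 or later, str_starts_with($id, 'thread_title') should express the same test without involving positions at all.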