Below is the code I am using.
It reads links from a textarea, fetches the source of each page and finally filters out the meta tags. However, it only displays results for the last element in the array.
So if, for example, I put 3 websites into the textarea, it will only read the last one; the others are just shown as blank.
I have spent hours on this, please help.
function file_get_contents_curl($url)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_HEADER, 0);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    $data = curl_exec($ch);
    curl_close($ch);
    return $data;
}
if(isset($_POST['url'])){
    $url = $_POST['url'];
    $url = explode("\n",$url);
    print_r($url);
    for($counter = 0; $counter < count($url); $counter++){
        $html = file_get_contents_curl($url[$counter]); // PASSING LAST VALUE OF ARRAY
        $doc = new DOMDocument();
        @$doc->loadHTML($html);
        $nodes = $doc->getElementsByTagName('title');
        $title = $nodes->item(0)->nodeValue;
        $metas = $doc->getElementsByTagName('meta');
        for ($i = 0; $i < $metas->length; $i++){
            $meta = $metas->item($i);
            if($meta->getAttribute('name') == 'description')
                $description = $meta->getAttribute('content');
            if($meta->getAttribute('name') == 'keywords')
                $keywords = $meta->getAttribute('content');
        }
        print
        ('
        <fieldset>
        <table>
        <legend><b>URL: </b>'.$url[$counter].'</legend>
        <tr>
        <td><b>Title:</b></td><td>'.$title.'</td>
        </tr>
        <tr>
        <td><b>Description:</b></td><td>'.$description.'</td>
        </tr>
        <tr>
        <td><b>Keywords:</b></td><td>'.$keywords.'</td>
        </tr>
        </table>
        </fieldset><br />
        ');
    }
}
This was an annoying little bug to find – but here is the (ridiculously simple) solution:
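Every URL except the last one has whitespace attached to it. Browsers submit textarea line breaks as "\r\n", so splitting on "\n" alone leaves a trailing "\r" on each line but the last, and cURL then fails on those URLs. You therefore need to trim each URL before fetching it; you can do the following: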
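A minimal sketch of the trimming step, assuming the same kind of input as in the question. The `$input` string and the `$urls` name are made up for illustration and stand in for `$_POST['url']` and the `$url` array:

```php
<?php
// Hypothetical textarea content: line breaks arrive as "\r\n".
$input = "http://example.com\r\nhttp://example.org\r\nhttp://example.net";

// Splitting on "\n" alone leaves a trailing "\r" on every line but the last.
$urls = explode("\n", $input);

// Trim every entry, then drop lines that are empty after trimming.
$urls = array_values(array_filter(array_map('trim', $urls), 'strlen'));

foreach ($urls as $u) {
    echo $u, PHP_EOL;
}
```

In the original loop the smallest possible change would probably be to pass `trim($url[$counter])` to `file_get_contents_curl()` instead of `$url[$counter]`.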
If it is available on your server (it needs allow_url_fopen enabled to fetch URLs), you could possibly have just used file_get_contents(); that still requires you to trim the URL.
The second problem is that if a page has no metadata, the old variables from the previous loop iteration are used, so just before the end of your main loop, after your print(), add the following:
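A minimal sketch of that reset, assuming the variable names from the question (`$title`, `$description`, `$keywords`). The `$pages` and `$seen` arrays are hypothetical and only replace the cURL fetch and the print() so the effect is easy to see:

```php
<?php
// Hypothetical pages: the second one has no meta tags at all.
$pages = [
    '<html><head><title>One</title><meta name="description" content="First page"></head><body></body></html>',
    '<html><head><title>Two</title></head><body></body></html>',
];

// Define the variables up front so the first iteration is safe as well.
$title = $description = $keywords = '';
$seen = [];

foreach ($pages as $html) {
    $doc = new DOMDocument();
    @$doc->loadHTML($html);

    $nodes = $doc->getElementsByTagName('title');
    if ($nodes->length > 0) {
        $title = $nodes->item(0)->nodeValue;
    }

    foreach ($doc->getElementsByTagName('meta') as $meta) {
        if ($meta->getAttribute('name') == 'description') {
            $description = $meta->getAttribute('content');
        }
        if ($meta->getAttribute('name') == 'keywords') {
            $keywords = $meta->getAttribute('content');
        }
    }

    // Stands in for the print() in the question.
    $seen[] = [$title, $description, $keywords];

    // Reset so the next page does not inherit this page's values.
    $title = $description = $keywords = '';
}
```

Without the reset at the end of the loop, the second page would be reported with the first page's description.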