I need a PHP function that extracts a description from a site URL when the page doesn’t have a meta description tag. Any ideas?
I have tried this function, but it doesn’t work:
$content = file_get_contents($url);
function getExcerpt($content) {
    $text = html_entity_decode($content);
    $excerpt = array();
    //match all tags
    preg_match_all("|<[^>]+>(.*)]+>|", $text, $p, PREG_PATTERN_ORDER);
    for ($x = 0; $x < sizeof($p[0]); $x++) {
        if (preg_match('< p >i', $p[0][$x])) {
            $strip = strip_tags($p[0][$x]);
            if (preg_match("/\./", $strip))
                $excerpt[] = $strip;
        }
        if (isset($excerpt[0])){
            preg_match("/([^.]+.)/", $strip,$matches);
            return $matches[1];
        }
    }
    return false;
}
$excerpt = getExcerpt($content);
Parsing HTML with regular expressions is almost always a bad idea. Thankfully, PHP has libraries that can do the work for you. The following code uses DOMDocument to extract either the meta description or, if one does not exist, the first 1000 characters of the page text.
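A minimal sketch of that approach might look like the code below. The function name getDescription and the $limit parameter are made up for illustration, and the sketch assumes the page is ASCII or Latin-1 (DOMDocument::loadHTML does not assume UTF-8 by default, and substr counts bytes rather than characters).

```php
<?php
// Hypothetical helper: returns the meta description if the page has one,
// otherwise the first $limit bytes of the body text.
function getDescription(string $html, int $limit = 1000): string
{
    // loadHTML() rejects an empty string, so bail out early.
    if (trim($html) === '') {
        return '';
    }

    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect parser warnings silently.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Prefer <meta name="description" content="..."> when it is present.
    foreach ($doc->getElementsByTagName('meta') as $meta) {
        if (strtolower($meta->getAttribute('name')) === 'description') {
            $description = trim($meta->getAttribute('content'));
            if ($description !== '') {
                return $description;
            }
        }
    }

    // Fall back to the body text with runs of whitespace collapsed.
    // Note: this still includes navigation, script and style text.
    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }
    $text = trim(preg_replace('/\s+/', ' ', $body->textContent));

    return substr($text, 0, $limit);
}
```

With the question's variables, it could be called as $excerpt = getDescription(file_get_contents($url)); after checking that file_get_contents did not return false.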
You’ll probably want to add a little more logic to it, such as some DOM traversal to locate the main content, or a fallback that takes a snippet from near the middle of the text. As it stands, this code will probably grab a lot of unwanted text, such as the navigation at the top of the page.
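One possible form of that extra DOM traversal is sketched below: it uses DOMXPath to return the first paragraph that looks like real prose. The function name getFirstParagraph, the $minLength threshold, and the choice to skip nav, header and footer containers are all assumptions for illustration, not a tested heuristic.

```php
<?php
// Hypothetical helper: returns the text of the first <p> that is at least
// $minLength bytes long and contains a period, or null if none qualifies.
function getFirstParagraph(string $html, int $minLength = 40): ?string
{
    // loadHTML() rejects an empty string, so bail out early.
    if (trim($html) === '') {
        return null;
    }

    $doc = new DOMDocument();
    // Collect parser warnings silently (HTML5 tags trigger some of them).
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Ignore paragraphs nested in common boilerplate containers.
    $query = '//p[not(ancestor::nav or ancestor::header or ancestor::footer)]';

    foreach ($xpath->query($query) as $p) {
        // Collapse runs of whitespace before measuring the text.
        $text = trim(preg_replace('/\s+/', ' ', $p->textContent));
        if (strlen($text) >= $minLength && strpos($text, '.') !== false) {
            return $text;
        }
    }

    return null;
}
```

The length and period checks are only a rough way to skip menu items and captions; pages that keep their text in div elements would need a different query.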