I need a PHP function that extracts a description of a site

Question

Editorial Team

Asked: May 23, 2026 (2026-05-23T01:01:41+00:00)

I need a PHP function that extracts a description from a site URL that doesn't have a meta description tag. Any ideas?

I have tried this function, but it doesn't work:

$content = file_get_contents($url);

function getExcerpt($content) {
  $text = html_entity_decode($content);
  $excerpt = array();
  //match all tags
  preg_match_all("|<[^>]+>(.*)</[^>]+>|", $text, $p, PREG_PATTERN_ORDER);
  for ($x = 0; $x < sizeof($p[0]); $x++) {
    if (preg_match('< p >i', $p[0][$x])) {
      $strip = strip_tags($p[0][$x]);
      if (preg_match("/\./", $strip))
        $excerpt[] = $strip;
    }
    if (isset($excerpt[0])){
      preg_match("/([^.]+.)/", $strip,$matches);
      return $matches[1];
    }
  }
  return false;
}

$excerpt = getExcerpt($content);

1 Answer

Editorial Team · Answer 1 · 2026-05-23T01:01:42+00:00

Parsing HTML with regular expressions is almost always a bad idea. Thankfully, PHP has libraries that can do the work for you. The following code uses DOMDocument to extract either the meta description or, if one does not exist, the first 1000 characters of the page's body text.

<?php
function getExcerpt($html) {

    $dom = new DOMDocument();

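    // Parse the input HTML into a DOM. Real-world pages are often malformed,
    // so collect libxml warnings internally instead of emitting them.
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();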

    $metaTags = $dom->getElementsByTagName('meta');

    // Check for a meta description and return it if it exists
    foreach ($metaTags as $metaTag) {
        // Compare case-insensitively so name="Description" also matches
        if (strtolower($metaTag->getAttribute('name')) === "description") {
            return $metaTag->getAttribute('content');
        }
    }

    // No meta description, extract an excerpt from the body
    // Get the body node; give up if the document has none
    $body = $dom->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return false;
    }

    // extract the contents
    $bodyText = $body->textContent;

    // collapse any line breaks
    $bodyText = preg_replace('/\s*\n\s*/', "\n", $bodyText);
    // collapse any more leftover spaces or tabs to single spaces
    $bodyText = preg_replace('/[ \t]+/', ' ', $bodyText);

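    // return the first 1000 characters; mb_substr (mbstring extension)
    // avoids cutting a multi-byte UTF-8 character in half
    return mb_substr(trim($bodyText), 0, 1000, 'UTF-8');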

}

$html = file_get_contents('test.html');

echo nl2br(getExcerpt($html));

You'll probably want to add more logic to it, such as some DOM traversal to locate the main content, or taking a snippet from near the middle of the text. As written, this code will probably also pick up unwanted text such as the navigation at the top of the page.
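One possible form of that extra DOM traversal is sketched below. It is not part of the answer's code: the function name `getFirstParagraphSentence` and the list of tags to discard are assumptions chosen for illustration. The idea is to drop elements that rarely hold main content, then return the first sentence of the first `<p>` that contains a period, which is roughly what the function in the question was attempting with regular expressions.

```php
<?php
// Hypothetical helper (not from the original answer): returns the first
// sentence of the first <p> that contains a period, or false if none is found.
function getFirstParagraphSentence(string $html)
{
    if (trim($html) === '') {
        return false;
    }

    $dom = new DOMDocument();
    libxml_use_internal_errors(true); // tolerate malformed markup quietly
    $dom->loadHTML($html);
    libxml_clear_errors();

    // Drop elements whose text is rarely part of the main content.
    // This tag list is an assumption; adjust it for the pages you process.
    foreach (['script', 'style', 'nav', 'header', 'footer'] as $tagName) {
        // Copy the live node list to an array before removing nodes from it
        foreach (iterator_to_array($dom->getElementsByTagName($tagName)) as $node) {
            $node->parentNode->removeChild($node);
        }
    }

    foreach ($dom->getElementsByTagName('p') as $paragraph) {
        // Collapse runs of whitespace so the excerpt reads as one line
        $text = trim(preg_replace('/\s+/u', ' ', $paragraph->textContent));

        // Keep everything up to and including the first period
        if (preg_match('/^[^.]+\./', $text, $matches)) {
            return $matches[0];
        }
    }

    return false;
}
```

A caller could try the meta description first and fall back to this helper only when none exists. Splitting on the first period is a crude heuristic: abbreviations such as "e.g." will cut the sentence short, so treat the result as a starting point.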
