I’m trying to pull meta tags out of a html page, to compare two

Question

Asked: May 23, 2026

I’m trying to pull meta tags out of an HTML page so I can compare two pages (live and dev) and check that their SEO is the same after a site redesign/refactor. I need to confirm that the title, meta tags (description, Open Graph etc.), h1s, our analytics (Omniture), and our ad tags (DoubleClick) are all the same.

My problem is getting the meta tags:
http://php.net/manual/en/function.get-meta-tags.php
only works if they have a name= attribute, and the same goes for “mariano at cricava dot com”’s solution.

I don’t want to restrict it to certain attributes. I could assume that all our meta tags have either a name=, property= or http-equiv= attribute and change the regex accordingly, but I cannot be entirely sure: it’s a massive website and any random crap could be in the tags (which is exactly what this tool is meant to check!), so I would like to leave it as dynamic as possible.

I have:

$page = @file_get_contents('http://.../');
preg_match_all('#<meta(?:\s+?([^\=]+)\=\"(.+?)\")+?\s*?/?>#sui', $page, $matches, PREG_SET_ORDER);

but the subpatterns overwrite each other on every repetition, so this only pulls out the last attribute-name=attribute-value pair of each tag:

Array
(
    [0] => Array
        (
            [0] => <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
            [1] => content
            [2] => text/html; charset=UTF-8
        )

    [1] => Array
        (
            [0] => <meta name="description" content="some description" />
            [1] => content
            [2] => some description
        )

    [2] => Array
        (
            [0] => <meta property="og:type" content="website" />
            [1] => content
            [2] => website
        )
...

I need all the attributes for all the meta tags. I could do this in two steps, pulling out the contents of <meta ([^>]*)> and then running a second regular expression on the results, but surely that should be unnecessary given the power of regex?
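For reference, the two-step approach mentioned above could look roughly like the sketch below. The function name extract_meta_attributes is hypothetical, and the sketch assumes every attribute value is double-quoted and contains no ">" character; unquoted, single-quoted or valueless attributes would be skipped.

```php
<?php
// Hypothetical helper: returns one associative array of attributes per <meta> tag.
// Assumes double-quoted attribute values that contain no ">" character.
function extract_meta_attributes(string $html): array
{
    $tags = [];

    // Step 1: grab the raw attribute string of every <meta ...> tag.
    preg_match_all('#<meta\b([^>]*)>#i', $html, $metaMatches);

    foreach ($metaMatches[1] as $attributeString) {
        // Step 2: split that string into name="value" pairs.
        preg_match_all('#([^\s=/]+)\s*=\s*"([^"]*)"#', $attributeString, $pairs, PREG_SET_ORDER);

        $attributes = [];
        foreach ($pairs as $pair) {
            // Attribute names are case-insensitive in HTML, so normalise them.
            $attributes[strtolower($pair[1])] = $pair[2];
        }
        $tags[] = $attributes;
    }

    return $tags;
}

$html = '<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />'
      . '<meta property="og:type" content="website" />';
print_r(extract_meta_attributes($html));
```

Because each tag becomes a plain associative array, the results for the live and dev pages can be compared directly without knowing in advance which attributes a tag carries.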

1 Answer

Editorial Team · Answer 1 · 2026-05-23T18:55:56+00:00

But back to the original question, forget it’s HTML for now, is there
no way to have recurring subpatterns return in preg_match_all rather
than just returning the last match?

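Not possible with preg_*/PCRE: when a capturing group is repeated, only its last iteration is kept. I don't know of any other regex flavor that returns every iteration either, although in Perl you could use a (?{ push @list, $^N }) hack.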
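That limitation applies to repeated groups inside a single match. One workaround that still uses a single preg_match_all() call is to make every attribute its own match and chain the matches together with the \G anchor. The sketch below is only an illustration of that idea: the function name collect_meta_attributes is hypothetical, and it assumes double-quoted values and no valueless attributes (an attribute without ="..." ends the chain for that tag).

```php
<?php
// Hypothetical helper: collects the attributes of every <meta> tag
// with a single preg_match_all() call by chaining matches with \G.
function collect_meta_attributes(string $html): array
{
    // Each match is either the first attribute of a new tag (the "<meta" branch)
    // or an attribute that starts exactly where the previous match ended (the \G branch).
    // (?!\A) stops \G from matching at the very start of the input.
    $pattern = '#(?:<meta\b|\G(?!\A))\s+([^\s=>/]+)\s*=\s*"([^"]*)"#i';
    preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

    $tags = [];
    foreach ($matches as $match) {
        // A match that begins with "<meta" opens a new tag entry.
        if (stripos($match[0], '<meta') === 0) {
            $tags[] = [];
        }
        // Store the pair on the most recently opened tag.
        $tags[count($tags) - 1][strtolower($match[1])] = $match[2];
    }

    return $tags;
}

$html = '<meta name="description" content="some description" />'
      . '<meta property="og:type" content="website" />';
print_r(collect_meta_attributes($html));
```

The grouping by tag still happens in PHP rather than in the regex itself, so this does not contradict the answer: the pattern never returns more than one capture per group per match.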
