I am trying to parse an HTTP document to extract portions of it, but I am unable to get the desired results. Here is what I have so far:
<?php
// a sample of HTTP document that I am trying to parse
$http_response = <<<'EOT'
<dl><dt>Server Version: Apache</dt>
<dt>Server Built: Apr 4 2010 17:19:54
</dt></dl><hr /><dl>
<dt>Current Time: Wednesday, 10-Oct-2012 06:14:05 MST</dt>
</dl>
I do not need anything below this, including this line itself
......
EOT;
echo $http_response;
echo '********************';
$count = -1;
$a = preg_replace("/(Server Version)([\s\S]*?)(MST)/", "$1$2$3", $http_response, -1, $count);
echo "<br> count: $count" . '<br>';
echo $a;
- I still see the string “I do not need …” in the output. I do not need that string. What am I doing wrong?
- How do I easily remove all other HTML tags as well?
Thanks for your help.
-Amit
You are matching everything from Server Version until MST. Only the part that is matched gets modified by preg_replace; everything not covered by the regex remains untouched. So to remove the text before your first anchor and the text following the last one, you must match them as well.
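A minimal sketch of that idea, reusing a shortened copy of the sample document from the question. The exact pattern is an assumption about what fits this input, not the only way to write it:

```php
<?php
// Shortened stand-in for the document in the question.
$http_response = <<<'EOT'
<dl><dt>Server Version: Apache</dt>
<dt>Server Built: Apr 4 2010 17:19:54
</dt></dl><hr /><dl>
<dt>Current Time: Wednesday, 10-Oct-2012 06:14:05 MST</dt>
</dl>
I do not need anything below this, including this line itself
EOT;

// ^.* and .*$ consume the text outside the two anchors. The s modifier
// lets . match newlines as well. Neither part is captured, so neither
// appears in the replacement and both are dropped.
$pattern = '/^.*(Server Version)(.*?)(MST).*$/s';

$count = 0;
$a = preg_replace($pattern, '$1$2$3', $http_response, -1, $count);

echo "count: $count\n";
echo $a, "\n";
```

Note that the leading ^.* is greedy, so if Server Version appeared more than once, the kept part would start at the last occurrence; ^.*? would start at the first one instead.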
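The ^.* in front of the pattern and the .*$ after it (with the s modifier, so that . also matches newlines) do exactly that. Both will be matched, but they are not referenced in the replacement string, so they get dropped. Of course, it might also be simpler to just use preg_match() in such cases …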
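For comparison, a sketch of the preg_match() route on the same shortened sample. It also runs the result through strip_tags(), which is one possible way to handle the second question about removing the remaining HTML tags. The variable names $m and $text are only illustrative:

```php
<?php
// Shortened stand-in for the document in the question.
$http_response = <<<'EOT'
<dl><dt>Server Version: Apache</dt>
<dt>Server Built: Apr 4 2010 17:19:54
</dt></dl><hr /><dl>
<dt>Current Time: Wednesday, 10-Oct-2012 06:14:05 MST</dt>
</dl>
I do not need anything below this, including this line itself
EOT;

$text = null;

// Only the wanted span is matched, so nothing else has to be removed.
// The s modifier lets . run across line breaks.
if (preg_match('/Server Version.*?MST/s', $http_response, $m)) {
    // $m[0] holds the whole match; strip_tags() removes the HTML tags
    // but keeps the text and line breaks between them.
    $text = strip_tags($m[0]);
    echo $text, "\n";
} else {
    echo "no match\n";
}
```

Because preg_match() returns only the matched part, there is no need to account for the text before and after it in the pattern.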