I could not think of a proper title. I have some data like this:
$data = <<<EOD
<strong>
HHHHH
<strong>
TTTTT
<strong>
RRRRRRR
<strong>
EOD;
The above is just a simplified example. The real data looks like this:
<strong>Some Title</strong>
DATA
<strong>Some other Title</strong>
OTHER DATA
Sample: http://pastebin.com/cxzZWDZ8
Now I apply the following RegEx.
preg_match_all("%<strong>(.*?)<strong>%s", $data, $all);
This matches HHHHH and RRRRRRR, but I want it to match TTTTT as well. How can I do this?
You could use a lookahead assertion to ensure the closing <strong> is there but isn’t part of the match, so it can be part of the next match. For example, change the pattern to %<strong>(.*?)(?=<strong>)%s.

However, if what you’ve got is HTML, you should use an HTML parser to read it rather than regex, which is infamously poor at parsing HTML/XML markup. With DOMDocument::loadHTML(), getElementsByTagName() and so on, you’ll have a much more reliable way of scraping page data.
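
Both suggestions can be sketched roughly as follows. This is a minimal illustration rather than a definitive solution. The first part runs the lookahead pattern on the simplified sample from the question. The second part assumes the real data has each value as plain text directly after its <strong> title, as in the second sample; the variable names $values, $html and $pairs are made up for this example.

```php
<?php
// Part 1: lookahead on the simplified sample from the question.
$data = <<<EOD
<strong>
HHHHH
<strong>
TTTTT
<strong>
RRRRRRR
<strong>
EOD;

// (?=<strong>) checks that <strong> follows without consuming it,
// so the same tag can also open the next match.
preg_match_all('%<strong>(.*?)(?=<strong>)%s', $data, $all);

// The captured groups still contain the surrounding newlines.
$values = array_map('trim', $all[1]);
print_r($values);

// Part 2: DOMDocument on data shaped like the real sample.
$html = <<<EOD
<strong>Some Title</strong>
DATA
<strong>Some other Title</strong>
OTHER DATA
EOD;

// Collect libxml parse errors internally instead of emitting PHP warnings.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);

$pairs = [];
foreach ($doc->getElementsByTagName('strong') as $strong) {
    $title = trim($strong->textContent);
    // Assumption: the value is the text node right after the <strong> element.
    $next = $strong->nextSibling;
    $pairs[$title] = ($next instanceof DOMText) ? trim($next->textContent) : '';
}
print_r($pairs);
```

The regex version is fine for the bare-tag sample. The DOM version is likely to hold up better if the real page has attributes, nesting, or inconsistent whitespace, though the sibling lookup would need adjusting when the value is wrapped in its own element.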