I am trying to capture all attributes of hyperlinks in HTML using PHP, but my regex only returns the last attribute and value of each link.
HTML:
$string = '
<a href="http://www.example.com/" style="font-weight: bold;">Example</a>
<a href="http://www.exampletwo.com/" style="font-weight: bold;">Example Two</a>
';
Regex:
preg_match_all('/<a(?: (.*?)="(.*?)")*>(.*?)<\/a>/i', $string, $result);
Result:
Array
(
[0] => Array
(
[0] => <a href="http://www.example.com/" style="font-weight: bold;">Example</a>
[1] => <a href="http://www.exampletwo.com/" style="font-weight: bold;">Example Two</a>
)
[1] => Array
(
[0] => style
[1] => style
)
[2] => Array
(
[0] => font-weight: bold;
[1] => font-weight: bold;
)
[3] => Array
(
[0] => Example
[1] => Example Two
)
)
How can I get it to return all of the results from the repeating pattern?
If I may present an alternative to the oft-reviled ‘regex HTML parsing’:
Use DOMDocument to parse your HTML and then simply ask it for all the anchor tags. If you suspect you’ll be dealing with massive HTML input, however, there’s always XMLReader, although it will have problems with malformed or non-XHTML input.
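As a rough illustration of the DOMDocument route, here is a minimal sketch that uses the sample HTML from the question (with the missing quote after the second href closed). The variable names $doc, $links and $attributes are only placeholders chosen for this example, and the array layout is one possible shape, not the only sensible one.

```php
<?php
// Sample input from the question (second href quote closed).
$string = '
<a href="http://www.example.com/" style="font-weight: bold;">Example</a>
<a href="http://www.exampletwo.com/" style="font-weight: bold;">Example Two</a>
';

// Collect libxml warnings internally instead of printing them;
// HTML fragments and sloppy markup tend to trigger a few.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($string);
libxml_clear_errors();

$links = [];
foreach ($doc->getElementsByTagName('a') as $anchor) {
    // Gather every attribute of this anchor, keyed by attribute name.
    $attributes = [];
    foreach ($anchor->attributes as $attr) {
        $attributes[$attr->nodeName] = $attr->nodeValue;
    }
    $links[] = [
        'attributes' => $attributes,
        'text'       => $anchor->textContent,
    ];
}

print_r($links);
```

Each entry in $links should then carry all attributes of one anchor rather than only the last one, since the parser exposes them as a list instead of relying on a repeated capture group.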