I’m trying to find a way to make a list of everything between <a> and </a> tags. I have a list of links and I want to get the names of the links (not where the links go, but what they’re called on the page). Any help would be appreciated.
Currently I have this:
$lines = preg_split('/\r?\n|\r/', $content); // content is the given page
foreach ($lines as $val) {
    if (preg_match('/(<A(.*)>)(<\/A>)/', $val, $alink)) {
        $newurl = $alink[1];
        // put in array of found links
        $links[$index] = $newurl;
        $index++;
        $is_href = true;
    }
}
The standard disclaimer applies: Parsing HTML with regular expressions is not ideal. Success depends on the well-formedness of the input on a character-by-character level. If you cannot guarantee this, the regex will fail to do the Right Thing at some point.
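For comparison, here is a minimal sketch of the parser-based route that this disclaimer points towards, using PHP's bundled DOM extension. The sample markup and the variable names ($content, $linkNames) are assumptions made for illustration, not something taken from the question's page.

```php
<?php
// Hypothetical sample input standing in for the page content.
$content = '<p><a href="/one">First link</a> and <A HREF="/two">Second <b>link</b></A></p>';

$doc = new DOMDocument();
libxml_use_internal_errors(true); // collect parse warnings instead of printing them
$doc->loadHTML($content);
libxml_clear_errors();

// Gather the visible text of every <a> element, nested markup included.
$linkNames = [];
foreach ($doc->getElementsByTagName('a') as $anchor) {
    $linkNames[] = trim($anchor->textContent);
}

print_r($linkNames);
```

Because the parser builds a tree first, this should keep working with uppercase tags, attributes in any order, and links that span several lines.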
Having said that:
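One way this could look is sketched below. It is a sketch under assumptions, not a definitive pattern: it assumes no attribute value contains a literal ">" and that anchors are not nested. The sample markup and the names $content and $names are hypothetical.

```php
<?php
// Hypothetical sample input; in the question this would be the fetched page.
$content = '<p><a href="/one">First link</a> and <A HREF="/two">Second <b>link</b></A></p>';

// Match each <a ...>...</a> pair and capture what sits between the tags.
// i = case-insensitive, s = dot also matches newlines, .*? = non-greedy.
preg_match_all('/<a\b[^>]*>(.*?)<\/a>/is', $content, $matches);

// Drop nested markup and surrounding whitespace from each captured name.
$names = array_map(
    function ($html) {
        return trim(strip_tags($html));
    },
    $matches[1]
);

print_r($names);
```

Running the pattern over the whole page with preg_match_all, rather than line by line, means a link that wraps across a line break is still found. The non-greedy (.*?) keeps two links on the same line from being merged into one match.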