I am trying to find the links on a page.
My regex is:
/<a\s[^>]*href=(\"\'??)([^\"\' >]*?)[^>]*>(.*)<\/a>/
but it seems to fail on
<a title="this" href="that">what?</a>
How would I change my regex to deal with href not placed first in the a tag?
Reliable regexes for HTML are difficult to write. Here is how to do it with DOM:
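A minimal sketch of that approach, assuming PHP's bundled DOM extension (`DOMDocument`) and a hypothetical sample `$html` string built around the link from the question:

```php
<?php
// Hypothetical sample markup; the first link is the one from the question.
$html = '<p><a title="this" href="that">what?</a> and <a href="other">more</a></p>';

$dom = new DOMDocument;
$dom->loadHTML($html); // an HTML parser does not care where href sits in the tag

$links = [];
foreach ($dom->getElementsByTagName('a') as $node) {
    // saveHTML($node) serializes only this node, i.e. its "outerHTML"
    $links[] = $dom->saveHTML($node);
}

echo implode(PHP_EOL, $links), PHP_EOL;
```

Because the parser works on elements rather than on character patterns, the position of `href` among the other attributes no longer matters.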
Parsing the $html string this way lets you find and output the “outerHTML” of all A elements in it.
To get all the text values of a node, you read its nodeValue.
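As an illustration of reading the text values, here is a small sketch that assumes PHP's DOM extension and a hypothetical `$html` sample with a nested element inside the link:

```php
<?php
// Hypothetical sample markup with nested markup inside the link text.
$html = '<a title="this" href="that">what <b>now</b>?</a>';

$dom = new DOMDocument;
$dom->loadHTML($html);

$texts = [];
foreach ($dom->getElementsByTagName('a') as $node) {
    // For an element, nodeValue is the concatenated text of all its descendants
    $texts[] = $node->nodeValue;
}

echo implode(PHP_EOL, $texts), PHP_EOL; // prints "what now?"
```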
To check if the href attribute exists, you can call hasAttribute on the node.
To get the href attribute, you'd call getAttribute.
To change the href attribute, you'd call setAttribute.
To remove the href attribute, you'd call removeAttribute.
You can also query for the href attribute directly with XPath.
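The attribute operations and the XPath query might look like the sketch below. It assumes PHP's DOM extension (`DOMDocument`, `DOMXPath`); the sample `$html` and the replacement value `something-else` are hypothetical.

```php
<?php
// Hypothetical sample markup; the second A element deliberately has no href.
$html = '<p><a title="this" href="that">what?</a> <a id="top">anchor</a></p>';

$dom = new DOMDocument;
$dom->loadHTML($html);

$anchors = $dom->getElementsByTagName('a');
$first   = $anchors->item(0);
$second  = $anchors->item(1);

// Check whether the attribute exists
$firstHasHref  = $first->hasAttribute('href');   // true
$secondHasHref = $second->hasAttribute('href');  // false

// Get the attribute value
$original = $first->getAttribute('href');        // "that"

// Change the attribute value
$first->setAttribute('href', 'something-else');

// Query the href attributes directly with XPath
$xpath = new DOMXPath($dom);
$hrefs = [];
foreach ($xpath->query('//a/@href') as $attr) {
    $hrefs[] = $attr->nodeValue;                 // value of each href attribute node
}

// Remove the attribute
$first->removeAttribute('href');
$stillHasHref = $first->hasAttribute('href');    // false

echo $original, ' -> ', implode(', ', $hrefs), PHP_EOL;
```

The XPath expression `//a/@href` selects the attribute nodes themselves, so A elements without an `href` are skipped without an explicit `hasAttribute` check.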
On a side note: I am sure this is a duplicate and that you can find the answer somewhere in here.