I’m trying to write a RegEx which finds all links on a webpage with the rel="nofollow" attribute. Mind you, I’m a RegEx newb so please don’t be too harsh on me 🙂
This is what I have so far:
$link = "/<a href=\"([^\"]*)\" rel=\"nofollow\">(.*)<\/a>/iU";
Obviously this is very flawed. Any link that has other attributes, lists them in a different order, or is written a little differently (e.g. with single quotes) won’t be matched.
You should really use a DOM parser for this purpose, as any regex-based solution will be error-prone for this kind of HTML parsing. Consider code like this:
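Here is a minimal sketch of that approach, assuming PHP's built-in DOM extension (`DOMDocument` and `DOMXPath`) is available. The function name `findNofollowLinks` and the sample markup are made up for illustration. The XPath expression treats `rel` as a space-separated, case-insensitive list of tokens, so `rel='external nofollow'` matches as well.

```php
<?php
// Hypothetical helper (name chosen for this example): returns the href and
// link text of every <a> whose rel attribute contains the token "nofollow".
function findNofollowLinks(string $html): array
{
    $doc = new DOMDocument();

    // Real-world HTML is rarely valid; collect libxml warnings silently
    // instead of printing them, then restore the previous setting.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);

    // Lowercase @rel via translate(), pad it with spaces, and look for
    // " nofollow " so that only a whole token matches.
    $query = "//a[contains(concat(' ', normalize-space(translate(@rel, "
           . "'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz')), ' '), "
           . "' nofollow ')]";

    $links = [];
    foreach ($xpath->query($query) as $a) {
        $links[] = [
            'href' => $a->getAttribute('href'),
            'text' => trim($a->textContent),
        ];
    }
    return $links;
}

// Sample input: attribute order, quoting style and extra attributes all vary.
$html = <<<HTML
<p>
  <a href="http://example.com/a" rel="nofollow">First</a>
  <a class='x' rel='external nofollow' href='http://example.com/b'>Second</a>
  <a href="http://example.com/c">Third</a>
</p>
HTML;

foreach (findNofollowLinks($html) as $link) {
    echo $link['href'], ' => ', $link['text'], PHP_EOL;
}
```

With the sample markup above, this lists the first two links and skips the third. Because the parser normalizes quoting and attribute order, none of the cases that break the regex need special handling.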