Is there a way to gather all links to a specific domain in a string, keeping only those that appear in one of these two forms:
href="http://yahoo.com/media/news.html"
or
>http://yahoo.com/media/news.html<
In other words, I want links that are either prefixed by `href="` and followed by `"`, or surrounded by `>` and `<`.
I tried the pattern `"href=\"([^\"]*)\"></A>"` with `Regex`, but it didn't match anything.
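Your pattern only matches when `"></A>` comes immediately after the closing quote of the `href`, that is, when the link has no text at all, so an ordinary anchor such as `<a href="http://yahoo.com/media/news.html">http://yahoo.com/media/news.html</a>` fails. Drop that suffix and match the two forms as alternatives, each with its own capturing group.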
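A minimal sketch of the group-based approach, written with Python's `re` module rather than .NET's `Regex`. The answer's original code is not shown here, so this is a reconstruction of the idea, not its exact pattern. The domain `http://yahoo.com` comes from the question; `LINK_PATTERN` and `find_links` are hypothetical names. Each alternative captures the URL in its own group, so the caller takes whichever group took part in the match.

```python
import re

# Domain taken from the question (assumption: only http://yahoo.com links are wanted).
DOMAIN = r"http://yahoo\.com"

# Hypothetical pattern: two alternatives, each with its own capturing group.
LINK_PATTERN = re.compile(
    r'href="(' + DOMAIN + r'[^"]*)"'  # group 1: URL inside href="..."
    r'|>(' + DOMAIN + r'[^<]*)<'      # group 2: URL between > and <
)

def find_links(text):
    """Hypothetical helper: return every URL, using whichever group matched."""
    return [m.group(1) or m.group(2) for m in LINK_PATTERN.finditer(text)]

html = '<a href="http://yahoo.com/media/news.html">http://yahoo.com/media/news.html</a>'
print(find_links(html))
```

Both the `href` value and the link text are returned, because each form is matched independently.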
EDIT: Another approach is to use look-arounds, so that the surrounding `href="`/`"` or `>`/`<` must be present but is not included in the match. This lets you use `Match.Value` directly instead of reading groups.

EDIT #2: Per the request in the comments, here is a pattern that will not match URLs that contain `...` (three dots) in the text.
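A sketch of the look-around version and of the three-dots variant, again in Python `re` with the question's yahoo.com domain; `LOOKAROUND` and `NO_DOTS` are hypothetical names. `m.group()` returns the whole match, the counterpart of .NET's `Match.Value`. One difference from .NET: Python only accepts fixed-width look-behinds, so `href="` and `>` are written as two separate look-behinds instead of a single look-behind containing an alternation, which .NET would accept.

```python
import re

# Look-behind requires href=" or > before the URL, look-ahead requires " or < after it.
# Neither is part of the match, so m.group() is the bare URL.
LOOKAROUND = re.compile(
    r'(?:(?<=href=")|(?<=>))'  # two fixed-width look-behinds (Python restriction)
    r'http://yahoo\.com[^"<]*'
    r'(?=["<])'
)

# Same pattern plus the negative look-ahead (?!.*\.{3}): the match fails when
# three consecutive dots appear anywhere later on the same line.
NO_DOTS = re.compile(
    r'(?:(?<=href=")|(?<=>))'
    r'(?!.*\.{3})'
    r'http://yahoo\.com[^"<]*'
    r'(?=["<])'
)

html = (
    '<a href="http://yahoo.com/media/news.html">http://yahoo.com/media/news.html</a>\n'
    '<a href="http://yahoo.com/media/...">http://yahoo.com/media/...</a>'
)

print([m.group() for m in LOOKAROUND.finditer(html)])
print([m.group() for m in NO_DOTS.finditer(html)])
```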
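The only change is the addition of `(?!.*\.{3})`, a negative look-ahead: the pattern can match at a given position only if the text inside the look-ahead cannot be matched from there. Here it checks that no `...` (three consecutive dots) follows. Because `\.{3}` also matches inside any longer run of dots, writing `{3,}` instead would reject exactly the same URLs.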
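A small check of two details of that look-ahead, using the same Python rendering of the pattern; `whole_line`, `url_only`, `at_least_3` and the sample strings are hypothetical. Since `.` matches anything except a newline, `(?!.*\.{3})` inspects the rest of the line, so a URL is rejected when `...` appears only in the link text after it. Restricting the look-ahead to `[^"<]*` is an alternative suggested here, not the answer's pattern; it limits the check to the URL itself. The final loop compares `{3}` with `{3,}` on a few inputs.

```python
import re

PREFIX = r'(?:(?<=href=")|(?<=>))'
URL = r'http://yahoo\.com[^"<]*(?=["<])'

whole_line = re.compile(PREFIX + r'(?!.*\.{3})' + URL)    # checks the rest of the line
url_only = re.compile(PREFIX + r'(?![^"<]*\.{3})' + URL)  # checks only up to the next " or <
at_least_3 = re.compile(PREFIX + r'(?!.*\.{3,})' + URL)   # {3,} instead of {3}

# The URL is fine; the "..." is only in the link text on the same line.
line = '<a href="http://yahoo.com/media/news.html">Read more...</a>'
print(whole_line.findall(line))
print(url_only.findall(line))

samples = [
    line,
    '<a href="http://yahoo.com/a....html">x</a>',  # four dots inside the URL
    '<a href="http://yahoo.com/b.html">x</a>',
]
for s in samples:
    # {3} and {3,} reject the same inputs inside a negative look-ahead.
    print(whole_line.findall(s) == at_least_3.findall(s))
```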