I am using the following regex to get the src value of the first img tag in an HTML document.
string match = "src=(?:\"|\')?(?<imgSrc>[^>]*[^/].(?:jpg|png))(?:\"|\')?";
Now it captures the whole src attribute, which I don’t need. I just need the URL inside the src attribute. How can I do that?
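For reference, the pattern already puts just the URL in the named group imgSrc. The whole src="..." text only shows up when the code reads the overall match (Match.Value) instead of that group. A minimal C# sketch, assuming the question's code uses Match.Value; RegexDemo and FirstImgSrc are hypothetical names:

```csharp
using System;
using System.Text.RegularExpressions;

static class RegexDemo
{
    // The pattern from the question, as a verbatim string (a literal " is written "").
    const string Pattern = @"src=(?:""|')?(?<imgSrc>[^>]*[^/].(?:jpg|png))(?:""|')?";

    // Returns the imgSrc group of the first match, or null if nothing matches.
    public static string FirstImgSrc(string html)
    {
        Match m = Regex.Match(html, Pattern);
        // m.Value would be the whole match, e.g. src="photo.png" including the quotes.
        // m.Groups["imgSrc"].Value is only the text captured by the named group.
        return m.Success ? m.Groups["imgSrc"].Value : null;
    }

    static void Main()
    {
        Console.WriteLine(FirstImgSrc("<img src=\"photo.png\">"));
    }
}
```

Reading the group fixes the immediate symptom, but the pattern itself stays fragile: on <img src="a.png" alt="b.png"> the greedy [^>]* runs past the closing quote and the group captures a.png" alt="b.png.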
Parse your HTML with something else. HTML is not a regular language, so regular expressions aren’t at all suited to parsing it.
Use an HTML parser, or an XML parser if the HTML is strict (well-formed XHTML). It’s a lot easier to get the src attribute’s value using XPath.
XML parsing is built into the System.Xml namespace, and it’s incredibly powerful. HTML parsing is a bit more difficult if the HTML isn’t strict, but there are lots of libraries around that will do it for you.
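A minimal sketch of the XPath approach using System.Xml, assuming the page is well-formed XML with no default xmlns namespace (with the XHTML namespace declared, the query would need an XmlNamespaceManager and a prefix). XPathDemo and FirstImgSrc are hypothetical names:

```csharp
using System;
using System.Xml;

static class XPathDemo
{
    // Returns the src of the first <img> element in document order, or null
    // if there is no <img> or it has no src attribute.
    // Throws XmlException if the markup is not well-formed XML.
    public static string FirstImgSrc(string xhtml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xhtml);
        // (//img)[1] picks the first img in the whole document; /@src selects its attribute.
        XmlNode src = doc.SelectSingleNode("(//img)[1]/@src");
        return src?.Value;
    }

    static void Main()
    {
        string page = "<html><body><p><img src=\"a.png\" alt=\"b.png\" /></p>"
                    + "<img src=\"c.jpg\" /></body></html>";
        Console.WriteLine(FirstImgSrc(page));
    }
}
```

Unlike the regex, the other attributes on the tag cannot leak into the result. The trade-off is that LoadXml rejects markup that is not well-formed, such as an unclosed <img> tag, which is the non-strict case where an HTML parsing library is needed.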