I’m using regular expressions to extract data from a page controlled by another group in my organization. Each table cell follows the same basic pattern:
<td><strong>Text I'm looking for</strong>...<a href="Link I'm also looking for"></a></td>
I am able to successfully grab the desired data with
<td><strong>(?<title>.*?)</strong>(.*?)<a href="(?<link>.*?)">(.*?)</a></td>
However, I occasionally run into a cell where the text is split across two <strong> elements, like this:
<td><strong>Text I'm </strong><strong>looking for</strong>...<a href="Link I'm also looking for"></a></td>
Is there a regular expression to handle this? It would preferably combine the two blocks automatically but I could combine them manually if necessary.
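One possible adaptation, sketched here in Python rather than .NET: let the title group capture one or more adjacent <strong> blocks, then strip the leftover tags in a second step. Note that Python writes named groups as (?P<name>...), whereas .NET uses (?<name>...); the names PATTERN, STRONG_TAG and extract are made up for this illustration, and the sketch assumes the cells look exactly like the two samples above.

```python
import re

# Python spells named groups (?P<name>...); the .NET form is (?<name>...).
PATTERN = re.compile(
    r'<td>(?P<title>(?:<strong>.*?</strong>)+)'   # one or more adjacent <strong> blocks
    r'.*?<a href="(?P<link>.*?)">.*?</a></td>'    # filler text, then the link
)

# Used to remove the <strong> / </strong> tags left inside the captured title.
STRONG_TAG = re.compile(r'</?strong>')


def extract(html):
    """Return a list of (title, link) pairs, one per matching cell."""
    results = []
    for match in PATTERN.finditer(html):
        title = STRONG_TAG.sub('', match.group('title'))
        results.append((title, match.group('link')))
    return results
```

Calling extract on either sample cell should give the same pair, since the split title is rejoined once the inner tags are removed. This stays as brittle as any regex over HTML: extra attributes on the tags or whitespace between the <strong> blocks would need further changes.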
Parsing HTML with regular expressions is difficult and fragile, because small changes in the markup can silently break the pattern. There is a .NET library that can help you with this:
Html Agility Pack (http://htmlagilitypack.codeplex.com/), which supports XPath and XSLT.
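To illustrate the parser-based approach in general terms, here is a minimal sketch using Python's standard-library html.parser as a stand-in; it is not Html Agility Pack and does not show that library's API. The class name CellExtractor and the helper extract_cells are hypothetical, and the sketch assumes each <td> holds the title inside one or more <strong> elements followed by a single link.

```python
from html.parser import HTMLParser


class CellExtractor(HTMLParser):
    """Collects one (title, link) pair per <td> cell."""

    def __init__(self):
        super().__init__()
        self.rows = []            # finished (title, link) pairs
        self._in_td = False       # currently inside a <td>?
        self._strong_depth = 0    # > 0 while inside a <strong>
        self._title_parts = []    # text pieces from all <strong> blocks
        self._link = None         # href of the first <a> in the cell

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._in_td = True
            self._title_parts = []
            self._link = None
        elif tag == "strong" and self._in_td:
            self._strong_depth += 1
        elif tag == "a" and self._in_td and self._link is None:
            self._link = dict(attrs).get("href")

    def handle_endtag(self, tag):
        if tag == "strong" and self._strong_depth > 0:
            self._strong_depth -= 1
        elif tag == "td" and self._in_td:
            # Joining the parts merges split <strong> blocks automatically.
            self.rows.append(("".join(self._title_parts), self._link))
            self._in_td = False

    def handle_data(self, data):
        if self._in_td and self._strong_depth > 0:
            self._title_parts.append(data)


def extract_cells(html):
    parser = CellExtractor()
    parser.feed(html)
    parser.close()
    return parser.rows
```

Because the text of every <strong> element in a cell is collected and joined, a title split across two blocks comes out as a single string without any special casing. With Html Agility Pack, the same idea would presumably be expressed through its XPath support instead of hand-written callbacks.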