I have to parse a piece of HTML.
It looks a bit like:
<table>
<tr>
<td class="blabla"> <table><tr><td><table><tr><td></td></tr></table></td></tr></table>
</td>
</tr>
<tr>
<td class="blabla"> <table><tr><td></td></tr></table>
</td>
</tr>
</table>
I need to extract each td with class blabla, but each of these cells can contain zero or more nested tables, which in turn contain many nested td elements. For each such cell, I want to get
<td class="blabla"> ... many nested stuff ... </td>
Thanks
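Don’t try to parse HTML with regular expressions. You can’t write an expression that will match what you want, because HTML isn’t a regular language: tags nest to arbitrary depth, and a regular expression can’t pair each opening <td> with its matching </td>.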
Use an HTML/XML parser in a library your language provides.
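As a rough illustration of the parser approach, here is a minimal sketch in Python using the standard library’s xml.etree.ElementTree. It assumes the markup is well-formed XML, as the sample in the question happens to be; real-world HTML often is not and would need a lenient HTML parser instead. The helper name extract_cells and the XPath string are my own choices for the example, not something from the question.

```python
import xml.etree.ElementTree as ET

# The sample markup from the question, unchanged.
SAMPLE = """<table>
<tr>
<td class="blabla"> <table><tr><td><table><tr><td></td></tr></table></td></tr></table>
</td>
</tr>
<tr>
<td class="blabla"> <table><tr><td></td></tr></table>
</td>
</tr>
</table>"""


def extract_cells(markup, css_class="blabla"):
    """Return the serialized <td> elements whose class attribute equals css_class.

    Hypothetical helper: assumes `markup` is well-formed XML.
    """
    root = ET.fromstring(markup)
    # ElementTree supports a limited XPath subset; this selects every
    # descendant td whose class attribute is exactly css_class.
    cells = root.findall(f".//td[@class='{css_class}']")
    # Serialize each matched cell, nested tables included, back to text.
    return [ET.tostring(cell, encoding="unicode").strip() for cell in cells]


cells = extract_cells(SAMPLE)
for cell in cells:
    print(cell)
    print("---")
```

The parser tracks nesting for you, so each result runs from the opening td of a blabla cell to its own closing tag, no matter how many tables sit inside it. The nested td elements carry no class attribute, so they are not returned as separate matches.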
System.Xml has a number of useful classes that will let you open your file and query it with XPath. The XPath expression you’re looking for is