I am parsing HTML using LINQ to XML. Right now, to get the paragraph tags, I’m using the following code:
var paragraphs = contentDiv.Parent.Parent.Parent.Parent.Parent.Elements("p").ToList();
However, one of the sites I am parsing has p tags with ul tags after them. So the markup is like:
<p>...</p>
<ul><li>...</li></ul>
<p>...</p>
<ul><li>...</li></ul>
<p>...</p>
<ul><li>...</li></ul>
<p>...</p>
<ul><li>...</li></ul>
I need to get all the text inside all p tags and inside all ul tags, but I need the content in the order in which it appears in the HTML. Essentially I’d like something similar to:
var paragraphs = contentDiv.Parent.Parent.Parent.Parent.Parent.Elements("p" || "ul").ToList();
How would I go about doing this?
And no, these P and UL tags are not sectioned off by themselves, so I can’t just get all content in that parent XElement.
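Sounds like you want to call Elements() with no name and filter the results by element name, which keeps the p and ul elements in document order.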
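A minimal sketch of one way to get both p and ul children in document order, assuming the elements are direct children of the parent you already navigate to. The `parent` element and its sample markup below are hypothetical stand-ins for your `contentDiv.Parent...` chain, not taken from your page. Comparing `Name.LocalName` rather than `Name` is a choice made here so the filter also matches when the document declares an XHTML namespace.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical stand-in for the element reached via contentDiv.Parent.Parent...
var parent = XElement.Parse(
    "<div><h2>Title</h2><p>one</p><ul><li>a</li><li>b</li></ul><p>two</p><ul><li>c</li></ul></div>");

// Elements() with no argument yields every child element in document order;
// the Where clause keeps only the p and ul children, so their relative order is preserved.
var blocks = parent.Elements()
    .Where(e => e.Name.LocalName == "p" || e.Name.LocalName == "ul")
    .ToList();

foreach (var block in blocks)
{
    // Value concatenates the text of all descendant nodes,
    // so for a ul it is the text of its li items joined together.
    Console.WriteLine($"{block.Name.LocalName}: {block.Value}");
}
```

If you need each li separately instead of the concatenated text, you could call `block.Elements()` on the ul entries and read `Value` from each item.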