I’m trying to scrape only the article text from web pages. I’ve found that the article is always wrapped in a div tag, but unfortunately the class of that div is slightly different on each page. I looked into using XPath, but I don’t think it will work because of the differing class names. Is there a way to get all the div tags and then check each one’s class?
Examples
<div class="entry_single">
<p>I recently traveled without my notebook for the first time in ages.</p>
</div>
<div class="entry-content-pagination">
<p>Ward 9 Ald. Steven Dove</p>
</div>
That’d be easier using LINQ: select all the div elements, then filter them by their class attribute.
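To illustrate the general idea (walk every div, read its class, keep the ones that look like article containers), here is a rough sketch using only Python's standard library rather than LINQ. The `entry` prefix is an assumption based on the two examples in the question, and `DivTextCollector` and `extract_articles` are made-up names for this sketch, not part of any library.

```python
from html.parser import HTMLParser


class DivTextCollector(HTMLParser):
    """Collect (class attribute, text) for each <div> whose class matches a prefix."""

    def __init__(self, class_prefix="entry"):
        super().__init__()
        self.class_prefix = class_prefix  # assumed naming pattern, adjust per site
        self.results = []                 # finished (class attribute, text) pairs
        self._depth = 0                   # div nesting level inside a matching div
        self._class = None
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self._depth:
            # Already inside a matching div: just track nesting.
            self._depth += 1
            return
        class_attr = dict(attrs).get("class") or ""
        # A class attribute may hold several space-separated names.
        if any(name.startswith(self.class_prefix) for name in class_attr.split()):
            self._depth = 1
            self._class = class_attr
            self._chunks = []

    def handle_endtag(self, tag):
        if tag == "div" and self._depth:
            self._depth -= 1
            if self._depth == 0:
                # Collapse runs of whitespace left over from the markup.
                text = " ".join("".join(self._chunks).split())
                self.results.append((self._class, text))

    def handle_data(self, data):
        if self._depth:
            self._chunks.append(data)


def extract_articles(html, class_prefix="entry"):
    """Return a list of (class attribute, text) for matching divs in the HTML string."""
    parser = DivTextCollector(class_prefix)
    parser.feed(html)
    parser.close()
    return parser.results


if __name__ == "__main__":
    sample = (
        '<div class="entry_single">\n'
        "<p>I recently traveled without my notebook for the first time in ages.</p>\n"
        "</div>\n"
        '<div class="entry-content-pagination">\n'
        "<p>Ward 9 Ald. Steven Dove</p>\n"
        "</div>"
    )
    for class_name, text in extract_articles(sample):
        print(class_name, "->", text)
```

If the class names share some other pattern (a common substring instead of a prefix, say), only the `startswith` check needs to change.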