If I am creating a simple web scraper (from a root URL, grab all links, then from those links grab all emails), would it be worthwhile to use HTML Agility Pack? I am not actually looking through HTML tags; I am simply looking to scan for emails within the entire document.
Would it be more efficient to use HTML Agility Pack?
I am scraping them strictly because it is necessary that I have these emails, and there are about 100 links. Only about 500 emails will be scraped. No worries, I’m keeping ethics in mind here.
There are many questions on SO about this, and most of the ones I read say: don’t use regular expressions for web scraping.
On the other hand, if all you want is to parse the text regardless of its HTML structure (which you do, if I understand you correctly), it may be better to use regular expressions.
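
To illustrate the regex route, here is a minimal sketch in Python (chosen only for brevity; the same idea should carry over to .NET's `Regex` class). The pattern and the helper name `extract_emails` are my own assumptions, not something from the question. The pattern is deliberately simplified and will not match every valid address. It treats the page as plain text, so it finds addresses in `mailto:` links and in body text alike.

```python
import re

# Simplified email pattern (an assumption for this sketch); it does not
# cover every address allowed by the relevant RFCs.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(text):
    """Return the unique addresses found in text, in first-seen order.

    Duplicates are detected case-insensitively; the first spelling is kept.
    """
    seen = set()
    result = []
    for match in EMAIL_RE.findall(text):
        key = match.lower()
        if key not in seen:
            seen.add(key)
            result.append(match)
    return result


if __name__ == "__main__":
    page = '<a href="mailto:bob@example.com">bob@example.com</a> or Alice@Example.org.'
    print(extract_emails(page))
```

For roughly 100 pages, a scan like this should be cheap. The main risk is false positives or missed addresses from the simplified pattern, so it is worth spot-checking the results.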