What’s faster? I just made a web scraper that uses HTML Agility Pack, and it’s consuming massive amounts of memory.
Profiling it with a memory profiler, I found that the HtmlDocument, HtmlNode, etc. instances are taking up most of the memory.
I feel like it might be faster and more efficient to use regex. Am I wrong?
A regex will be a lot faster than HTML Agility Pack.
But remember that HTML is not always well formed. Browsers are very forgiving about mistakes, so pages with sloppy markup are common, and searching for the data you want using only regex may fail on them.
HTML Agility Pack is a great tool. It provides a lot of features for the memory it consumes.
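
To illustrate the point about malformed HTML, here is a rough sketch. It uses Python's standard library rather than C# and HTML Agility Pack, so treat it as an analogy, not as the Agility Pack API. The sample markup, `NAIVE_HREF`, `LinkCollector`, and the two helper functions are all made up for this example.

```python
import re
from html.parser import HTMLParser

# Sloppy markup that a browser would still accept: an unclosed <p>,
# uppercase tag and attribute names, single quotes, and an unquoted value.
HTML = """<p>Links:
<a href="/one">one</a>
<A HREF='/two'>two</A>
<a class=x href=/three>three</a>
"""

# A naive pattern that assumes a lowercase tag, href as the first
# attribute, and a double-quoted value.
NAIVE_HREF = re.compile(r'<a href="([^"]*)"')


def links_with_regex(html):
    """Return href values found by the naive regular expression."""
    return NAIVE_HREF.findall(html)


class LinkCollector(HTMLParser):
    """Collect href values from <a> start tags as the parser reports them."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names and strips quotes.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def links_with_parser(html):
    """Return href values found by the tolerant, event-based parser."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links


if __name__ == "__main__":
    print("regex :", links_with_regex(HTML))
    print("parser:", links_with_parser(HTML))
```

On this sample the naive pattern only finds `/one`, while the parser returns all three links. A more careful regex could cover these cases too, but every new quirk in the markup needs another special case.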