I am working on a quick tool that grabs images off of a web page. Currently, I use a WebClient to download the page's HTML source, parse the image URLs out of it, and download each image to a folder with WebClient.DownloadFile. This can take quite a while.
I understand that most of the time needed is due to my connection and downloading the data.
Are there any other, more efficient ways of going about this, whether it be a C# HTML parsing library or something else?
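For reference, the sequential approach described above might look roughly like this. The class and method names are hypothetical, and the regex is only a naive stand-in for a real HTML parser: it finds quoted `src` attributes on `<img>` tags and nothing else.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical sketch of the one-image-at-a-time approach from the question.
public static class SequentialImageGrabber
{
    // Naive pattern: captures the quoted src value of <img ...> tags (case-insensitive).
    private static readonly Regex ImgSrc = new Regex(
        "<img[^>]+src\\s*=\\s*[\"']([^\"']+)[\"']",
        RegexOptions.IgnoreCase);

    // Returns image URLs found in the HTML, resolving relative ones against the page URL.
    public static List<Uri> ExtractImageUrls(string html, Uri pageUrl)
    {
        var result = new List<Uri>();
        foreach (Match m in ImgSrc.Matches(html))
        {
            if (Uri.TryCreate(pageUrl, m.Groups[1].Value, out Uri imageUrl))
                result.Add(imageUrl);
        }
        return result;
    }

    // Downloads the page, then each image in turn; every DownloadFile call blocks until done.
    // Assumes each URL ends in a usable file name; duplicates overwrite each other.
    public static void Run(Uri pageUrl, string folder)
    {
        using (var client = new WebClient())
        {
            string html = client.DownloadString(pageUrl);
            foreach (Uri imageUrl in ExtractImageUrls(html, pageUrl))
            {
                string fileName = Path.GetFileName(imageUrl.LocalPath);
                client.DownloadFile(imageUrl, Path.Combine(folder, fileName));
            }
        }
    }
}
```

Because each download waits for the previous one, total time is roughly the sum of all the individual transfer times.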
You can use multiple threads so that several images download at once over concurrent HTTP connections, instead of waiting for one DownloadFile call to finish before starting the next.
One good approach would be to implement a Producer/Consumer pattern: have one thread that gets and parses the HTML containing the images, then queues the image URLs into something like a BlockingCollection. Have multiple threads read the image URLs from the queue and download the images concurrently.
http://msdn.microsoft.com/en-us/library/dd997371.aspx
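A minimal sketch of that Producer/Consumer setup with `BlockingCollection<T>`, assuming the image URLs have already been parsed out of the page. The class name and the `download` delegate are hypothetical; passing the per-image work in as a delegate keeps the queueing logic separate from the actual HTTP call.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical sketch: the calling thread produces URLs, consumerCount tasks consume them.
public static class ProducerConsumerDownloader
{
    public static void Run(IEnumerable<string> imageUrls, Action<string> download, int consumerCount)
    {
        using (var queue = new BlockingCollection<string>())
        {
            var consumers = new Task[consumerCount];
            for (int i = 0; i < consumerCount; i++)
            {
                consumers[i] = Task.Factory.StartNew(() =>
                {
                    // Blocks while the queue is empty; the loop ends once
                    // CompleteAdding has been called and the queue is drained.
                    foreach (string url in queue.GetConsumingEnumerable())
                        download(url);
                }, TaskCreationOptions.LongRunning);
            }

            // Producer: queue every URL, then signal that no more are coming.
            foreach (string url in imageUrls)
                queue.Add(url);
            queue.CompleteAdding();

            // Rethrows (as AggregateException) any exception a consumer hit.
            Task.WaitAll(consumers);
        }
    }
}
```

For real downloads, the delegate could create its own `WebClient` per call, since a single `WebClient` instance does not support concurrent operations. On .NET Framework, also check `ServicePointManager.DefaultConnectionLimit`: it defaults to 2 connections per host for client applications, which would cap the useful number of consumers unless it is raised.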
If you’re up for cutting edge, this class of problem is ideally suited to TPL Dataflow (an alternative to BlockingCollection).
http://msdn.microsoft.com/en-us/devlabs/gg585582.aspx
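With TPL Dataflow, an `ActionBlock<T>` can replace the hand-written queue and consumer threads: it buffers posted items and runs its delegate on up to `MaxDegreeOfParallelism` items at once. This is a sketch with hypothetical names, again assuming the URLs are already parsed; on .NET Framework the `System.Threading.Tasks.Dataflow` types come from the NuGet package of the same name.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// Hypothetical sketch: the ActionBlock's internal buffer plays the role of the queue.
public static class DataflowDownloader
{
    public static async Task RunAsync(IEnumerable<string> imageUrls, Action<string> download, int maxParallel)
    {
        var downloader = new ActionBlock<string>(
            download,
            new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = maxParallel });

        // Producer side: post each URL (always accepted, since the block is unbounded).
        foreach (string url in imageUrls)
            downloader.Post(url);

        // Signal no more input; Completion finishes after every posted URL is processed,
        // and faults if the delegate threw.
        downloader.Complete();
        await downloader.Completion;
    }
}
```

Compared with the BlockingCollection version, the block handles thread management and completion signalling, so only the degree of parallelism needs choosing.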