I have a WPF app that processes a lot of URLs (thousands). Each one is sent off to its own thread, which does some processing and stores a result in the database.
The URLs can be anything, but some seem to be massively big pages, which seems to shoot the memory usage up a lot and make performance really bad. I set a timeout on the web request, so that if it takes longer than, say, 20 seconds it doesn’t bother with that URL, but it doesn’t seem to make much difference.
Here’s the code section:
HttpWebRequest req = (HttpWebRequest)HttpWebRequest.Create(urlAddress.Address);
req.Timeout = 20000;          // ms allowed for GetResponse() to return
req.ReadWriteTimeout = 20000; // ms allowed for each individual read on the response stream
req.Method = "GET";
req.AutomaticDecompression = DecompressionMethods.Deflate | DecompressionMethods.GZip;
using (StreamReader reader = new StreamReader(req.GetResponse().GetResponseStream()))
{
    // Buffers the entire page into a single string
    pageSource = reader.ReadToEnd();
    req = null;
}
It also seems to stall/ramp up memory on reader.ReadToEnd();
I would have thought having a cut-off of 20 seconds would help. Is there a better method? I assume there’s not much advantage to using the async web methods, as each URL download is on its own thread anyway.
Thanks
In general, it’s recommended that you use asynchronous HttpWebRequests instead of creating your own threads. The article I’ve linked above also includes some benchmarking results.
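As a rough illustration of the asynchronous approach, here is a minimal sketch. The names AsyncFetch and FetchAsync are made up for this example, and it assumes .NET 4.5 or later for async/await. Note that HttpWebRequest.Timeout only applies to the synchronous GetResponse() call, so the sketch enforces its own timeout while waiting for the response.

```csharp
using System;
using System.IO;
using System.Net;
using System.Threading.Tasks;

static class AsyncFetch
{
    // Hypothetical helper: downloads a page without tying up a thread while waiting.
    public static async Task<string> FetchAsync(string url, TimeSpan timeout)
    {
        var req = (HttpWebRequest)WebRequest.Create(url);
        req.Method = "GET";
        req.AutomaticDecompression = DecompressionMethods.Deflate | DecompressionMethods.GZip;

        // Timeout is ignored for async requests, so race the request against a delay.
        Task<WebResponse> responseTask = req.GetResponseAsync();
        Task finished = await Task.WhenAny(responseTask, Task.Delay(timeout));
        if (finished != responseTask)
        {
            req.Abort(); // cancel the pending request
            throw new TimeoutException("No response from " + url);
        }

        // The timeout above covers only the wait for the response headers,
        // not the time spent reading the body below.
        using (WebResponse resp = await responseTask)
        using (var reader = new StreamReader(resp.GetResponseStream()))
        {
            return await reader.ReadToEndAsync();
        }
    }
}
```

A caller would use something like `string page = await AsyncFetch.FetchAsync(address, TimeSpan.FromSeconds(20));` and handle TimeoutException and WebException around it.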
I don’t know what you’re doing with the page source after you read the stream to end, but using string can be an issue:
Some other suggestions:
Use a Stream instead of passing in a string containing the page source (if that’s an option).
Additionally, can you tell us what your initial rate of fetching pages is and what it drops to? Are you seeing any errors/exceptions from the web request as you’re fetching pages?
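If very large pages are what drives memory up, one option is to read the response stream in chunks and give up once a size limit is exceeded, rather than calling ReadToEnd(). The following is only a sketch: BoundedReader, ReadCapped and the limit are hypothetical names and values, and it assumes the StreamReader default encoding (UTF-8) is acceptable.

```csharp
using System;
using System.IO;
using System.Text;

static class BoundedReader
{
    // Hypothetical helper: returns the stream's text, or null if it exceeds maxChars.
    public static string ReadCapped(Stream stream, int maxChars)
    {
        var sb = new StringBuilder();
        var buffer = new char[8192];
        using (var reader = new StreamReader(stream))
        {
            int read;
            // Read a chunk at a time so an oversized page is abandoned early.
            while ((read = reader.Read(buffer, 0, buffer.Length)) > 0)
            {
                if (sb.Length + read > maxChars)
                    return null; // too big: stop reading and skip this page
                sb.Append(buffer, 0, read);
            }
        }
        return sb.ToString();
    }
}
```

With the code from the question, this would be called as `BoundedReader.ReadCapped(resp.GetResponseStream(), 1000000)`, skipping the URL when the result is null.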
Update
In the comment section I noticed that you’re creating thousands of threads, and I would say that you don’t need to do that. Start with a small number of threads and keep increasing it until performance peaks on your system. Once adding threads no longer improves throughput, stop adding them. I can’t imagine that you will need more than 128 threads (even that seems high). Create a fixed number of threads, e.g. 64, and let each thread take a URL from your queue, fetch the page, process it and then go back to the queue for the next URL.
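The fixed-pool idea could look roughly like this. It is a sketch under assumptions: Crawler.Run and the processUrl callback are invented for the example, and the callback stands in for your own fetch, process and store logic.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;

static class Crawler
{
    // Hypothetical helper: runs processUrl over all URLs using a fixed number of threads.
    public static void Run(IEnumerable<string> urls, int workerCount, Action<string> processUrl)
    {
        var queue = new ConcurrentQueue<string>(urls);
        var workers = new List<Thread>();

        for (int i = 0; i < workerCount; i++)
        {
            var worker = new Thread(() =>
            {
                string url;
                // Each worker keeps pulling URLs until the queue is empty.
                while (queue.TryDequeue(out url))
                {
                    try { processUrl(url); }
                    catch (Exception ex)
                    {
                        // One bad URL should not kill the worker.
                        Console.Error.WriteLine(url + ": " + ex.Message);
                    }
                }
            });
            worker.IsBackground = true;
            worker.Start();
            workers.Add(worker);
        }

        // Block until every worker has drained the queue.
        foreach (var w in workers) w.Join();
    }
}
```

Starting with something like `Crawler.Run(allUrls, 16, FetchAndStore)` and raising the worker count between runs is one way to find the point where throughput stops improving.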