I’m writing a .NET program in C# that makes GET requests and downloads pages to parse, a sort of crawler. I noticed that it has to read from the stream multiple times to download each page because the pages are so large.
Currently I’ve set my stream buffer size to 5024 bytes. Would it be more efficient to increase this size and therefore perform fewer stream reads? Or is it better to parse smaller pieces of data at a time?
Put differently: is it quicker to parse more data at once and call Stream.Read less often, or the other way around?
Thanks!
While increasing the buffer size and fitting more data in per read would generally speed up the operation, the gain is going to be minimal at best. I think what you want to try instead is an asynchronous request. This allows the application to use the thread pool to read from one socket, or several simultaneously, and to work on the stream only when there is data to work on. That frees the application to do other things while the data is being pulled into the buffer.
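As a rough illustration of the asynchronous approach, here is a minimal sketch using async/await with HttpClient. It assumes a .NET version that supports those, and it assumes the pages are UTF-8; the names PageDownloader, ReadAllAsync, DownloadPageAsync and DownloadAllAsync are made up for the example. The buffer size is a parameter, so you can still experiment with 5024 versus something larger.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;

// Hypothetical helper class; none of these names come from the original post.
public static class PageDownloader
{
    // One shared HttpClient is reused for all requests.
    private static readonly HttpClient Client = new HttpClient();

    // Reads a stream to the end. While a read is pending, no thread is blocked.
    public static async Task<byte[]> ReadAllAsync(Stream source, int bufferSize = 8192)
    {
        var buffer = new byte[bufferSize];
        using (var collected = new MemoryStream())
        {
            int read;
            // ReadAsync returns 0 once the end of the stream is reached.
            while ((read = await source.ReadAsync(buffer, 0, buffer.Length)) > 0)
            {
                collected.Write(buffer, 0, read);
            }
            return collected.ToArray();
        }
    }

    // Downloads a single page and decodes it as text.
    public static async Task<string> DownloadPageAsync(string url)
    {
        // ResponseHeadersRead hands back the response before the whole body is buffered.
        using (var response = await Client.GetAsync(url, HttpCompletionOption.ResponseHeadersRead))
        {
            response.EnsureSuccessStatusCode();
            using (var body = await response.Content.ReadAsStreamAsync())
            {
                byte[] bytes = await ReadAllAsync(body);
                return Encoding.UTF8.GetString(bytes); // assumption: the page is UTF-8
            }
        }
    }

    // Starts every download at once and completes when all of them have finished.
    public static Task<string[]> DownloadAllAsync(IEnumerable<string> urls)
    {
        return Task.WhenAll(urls.Select(url => DownloadPageAsync(url)));
    }
}
```

With something like this, the crawler could call `await PageDownloader.DownloadAllAsync(urls)` and parse each page as a whole string, so the buffer size only affects how many reads happen under the hood, not how the parsing is structured.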