I am trying to understand the best method to speed up a little program that searches for strings in the source of multiple websites. The program as it stands is as follows:
Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
    Dim urls() As String = TextBox1.Lines()
    Dim stringstofind() As String = TextBox2.Lines()
    For Each url As String In urls
        CheckForStrings(url, stringstofind)
    Next
End Sub
Private Sub CheckForStrings(ByVal url As String, ByVal stringstofind() As String)
    ' Requires Imports System.Net at the top of the file for WebClient.
    Dim source As String
    Using wc As New WebClient()
        source = wc.DownloadString(url)
    End Using
    ' Report the first search string found in the page source, or NOT FOUND once if none match.
    Dim found As String = "NOT FOUND"
    For Each stringtofind As String In stringstofind
        If source.IndexOf(stringtofind) <> -1 Then
            found = stringtofind
            Exit For
        End If
    Next
    TextBox3.AppendText("url: " + url + " string: " + found + vbCrLf)
End Sub
The options that seem available are:
Thread the initial For Each loop using Parallel.ForEach. Apart from a few edits to avoid cross-threading issues and blocking the GUI, it seems pretty simple to do, but it doesn't seem like the best way to do it.
Use the WebClient.DownloadStringAsync method.
This is the first thing I looked at, but I can't work out how to pass back the resulting string from the DownloadStringCompleted event.
Also, if I can work this out, how do you limit how many simultaneous requests are made, to avoid overloading the network connection?
I also looked at some C# examples using .NET 4.5 that look great, but the program would need to run on Server 2003, so I guess that's out of the question.
Any help greatly appreciated.
I will post the comment as an answer as this is not getting much traffic.
TPL allocates threads based on CPU, so it is not going to deal well with slow connections that don't put a load on the CPU.
An easy start is to throttle with PLINQ's WithDegreeOfParallelism.
You may find 100 is a good value for WithDegreeOfParallelism.
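As a rough illustration of that throttle, here is a minimal sketch, assuming .NET 4.0 and that a failed download should be reported as text rather than abort the run. The names PlinqSketch, FirstMatch and CheckAll are made up for this example. The value 20 is arbitrary: as far as I know, .NET 4.0 rejects values above 63 (the ceiling was raised in 4.5), so 100 may need lowering on that runtime.

```vbnet
Imports System
Imports System.Linq
Imports System.Net

Module PlinqSketch
    ' Hypothetical helper: returns the first search string found in the page,
    ' "NOT FOUND" if none match, or an error note if the download fails.
    Function FirstMatch(ByVal url As String, ByVal stringsToFind() As String) As String
        Try
            Using wc As New WebClient()
                Dim source As String = wc.DownloadString(url)
                For Each s As String In stringsToFind
                    If source.IndexOf(s, StringComparison.Ordinal) <> -1 Then Return s
                Next
            End Using
            Return "NOT FOUND"
        Catch ex As WebException
            Return "ERROR: " & ex.Message
        End Try
    End Function

    ' Checks all URLs with at most 20 downloads in flight (assumed value).
    ' This call blocks until every URL has been processed.
    Function CheckAll(ByVal urls() As String, ByVal stringsToFind() As String) As String()
        Return urls.AsParallel() _
                   .WithDegreeOfParallelism(20) _
                   .Select(Function(u) "url: " & u & " string: " & FirstMatch(u, stringsToFind)) _
                   .ToArray()
    End Function
End Module
```

Because CheckAll blocks, calling it straight from a button handler would still freeze the GUI; running it from a BackgroundWorker and appending the returned lines to TextBox3 in RunWorkerCompleted is one way around that. The result order is not guaranteed to match the input order unless AsOrdered() is added.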
The next level of optimization gets a lot more complex. Whether it should be async, the thread pool, or a combination is going to depend heavily on the latency of the websites. And I am not sure you would gain a lot with async, as an idle thread is not that much overhead.
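For completeness, the async route from the question could look something like the sketch below. It is only an outline under assumptions: ThrottledDownloader, ResultReady and maxConcurrent are invented names, and the URLs are assumed to be absolute and well-formed (New Uri throws otherwise). The URL is handed to DownloadStringAsync as the userToken argument and comes back in e.UserState, which is how the completed handler knows which page e.Result belongs to. Throttling is done by starting a fixed number of downloads and launching the next one only when one finishes.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Net

' Hypothetical helper class; none of these names come from the original post.
Public Class ThrottledDownloader
    Private ReadOnly _pending As New Queue(Of String)()
    Private ReadOnly _stringsToFind() As String
    Private ReadOnly _maxConcurrent As Integer
    Private ReadOnly _gate As New Object()

    ' Raised once per URL with a ready-to-display result line.
    Public Event ResultReady(ByVal line As String)

    Public Sub New(ByVal urls As IEnumerable(Of String), ByVal stringsToFind() As String, ByVal maxConcurrent As Integer)
        For Each u As String In urls
            If u.Trim().Length > 0 Then _pending.Enqueue(u.Trim()) ' skip blank lines
        Next
        _stringsToFind = stringsToFind
        _maxConcurrent = maxConcurrent
    End Sub

    ' Starts up to maxConcurrent downloads; the rest wait in the queue.
    Public Sub Start()
        For i As Integer = 1 To _maxConcurrent
            StartNext()
        Next
    End Sub

    Private Sub StartNext()
        Dim url As String = Nothing
        SyncLock _gate
            If _pending.Count = 0 Then Return
            url = _pending.Dequeue()
        End SyncLock
        Dim wc As New WebClient()
        AddHandler wc.DownloadStringCompleted, AddressOf OnCompleted
        ' The URL travels along as the userToken and comes back in e.UserState.
        wc.DownloadStringAsync(New Uri(url), url)
    End Sub

    Private Sub OnCompleted(ByVal sender As Object, ByVal e As DownloadStringCompletedEventArgs)
        Dim url As String = CStr(e.UserState)
        Dim found As String = "NOT FOUND"
        If e.Error IsNot Nothing Then
            ' e.Result must not be read when the download failed.
            found = "ERROR: " & e.Error.Message
        Else
            For Each s As String In _stringsToFind
                If e.Result.IndexOf(s, StringComparison.Ordinal) <> -1 Then
                    found = s
                    Exit For
                End If
            Next
        End If
        DirectCast(sender, WebClient).Dispose()
        RaiseEvent ResultReady("url: " & url & " string: " & found)
        StartNext() ' one finished, so one more may start
    End Sub
End Class
```

From the form, this could be wired up with something like Dim d As New ThrottledDownloader(TextBox1.Lines, TextBox2.Lines, 10), then AddHandler d.ResultReady, Sub(line) TextBox3.AppendText(line & vbCrLf), then d.Start(). My understanding is that when Start is called on the Windows Forms UI thread, WebClient raises DownloadStringCompleted back on that thread, so the TextBox can be touched directly; elsewhere the handler may run on a pool thread and would need Invoke.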