I have been searching the web for some time now without finding code examples that help me through my problem. I have looked at example code, but I’m still not “getting” it.
I have read up on,
http://msdn.microsoft.com/en-us/library/aa480507.aspx and
http://msdn.microsoft.com/en-us/library/dd781401.aspx
But I can't seem to get it to work.
I'm using HtmlAgilityPack.
Today I make up to 20 web requests.
After a request has finished, the result is added to a dictionary, and then a method searches it for the information. If it is found, the code exits; if not, it makes another web request, until it caps at 20. I need to be able to exit all threads and async calls once everything is found.
It goes like this:
    public void FetchAndParseAllPages()
    {
        PageFetcher fetcher = new PageFetcher();
        for (int i = 0; i < _maxSearchDepth; i += _searchIncrement)
        {
            string keywordNsearch = _keyword + i;
            ParseHtmldocuments(fetcher.GetWebpage(keywordNsearch));
            // A position other than 201 means the information was found:
            // add the data to the database and stop.
            if (GetPostion() != 201)
            {
                InsertRankingData(DocParser.GetSearchResults(), _theSearchedKeyword);
                return;
            }
        }
    }
This is inside the class that fetches the page:
    public HtmlDocument GetWebpage(string urlToParse)
    {
        System.Net.ServicePointManager.Expect100Continue = false;
        HtmlWeb htmlweb = new HtmlWeb();
        htmlweb.PreRequest = new HtmlAgilityPack.HtmlWeb.PreRequestHandler(OnPreRequest);
        // Pass the urlToParse variable itself, not the string literal "urlToParse".
        HtmlDocument htmldoc = htmlweb.Load(urlToParse, "38.69.197.71", 45623, "PROXYUSER", "PROXYPASSWORD");
        return htmldoc;
    }
    public bool OnPreRequest(HttpWebRequest request)
    {
        // request.UserAgent = RandomUseragent();
        request.KeepAlive = false;
        request.Timeout = 100000;
        request.ReadWriteTimeout = 1000000;
        request.ProtocolVersion = HttpVersion.Version10;
        return true; // ok, go on
    }
How can I make this async and make it really quick with threads? Or should I even use threads when doing it async?
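To make the question concrete, here is a minimal sketch of the pattern being asked about: start all the requests concurrently, look at each result as it arrives, and cancel whatever is still running once a match is found. It assumes C# 5 / .NET 4.5 or later (async/await). `ParallelSearch`, `FindFirstMatchAsync`, `fetchAsync`, `isMatch` and the fake fetcher in `Demo` are hypothetical names made up for illustration; they are not part of HtmlAgilityPack or of the code above, and a real fetcher would have to honor the cancellation token itself.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ParallelSearch
{
    // Starts one fetch per query, returns the first completed result that
    // satisfies isMatch, and signals cancellation to the remaining fetches.
    // Returns null when no result matches. Failed or canceled fetches are skipped.
    public static async Task<string> FindFirstMatchAsync(
        IEnumerable<string> queries,
        Func<string, CancellationToken, Task<string>> fetchAsync, // hypothetical fetcher
        Func<string, bool> isMatch)
    {
        using (var cts = new CancellationTokenSource())
        {
            // ToList() forces every fetch to start right away.
            List<Task<string>> pending =
                queries.Select(q => fetchAsync(q, cts.Token)).ToList();

            while (pending.Count > 0)
            {
                // Wait for whichever fetch finishes next.
                Task<string> finished = await Task.WhenAny(pending);
                pending.Remove(finished);

                if (finished.Status == TaskStatus.RanToCompletion
                    && isMatch(finished.Result))
                {
                    cts.Cancel(); // ask the remaining fetches to stop
                    return finished.Result;
                }
            }
            return null;
        }
    }
}

public static class Demo
{
    // Fake fetcher for illustration only: waits briefly, then returns a fake page body.
    static async Task<string> FakeFetchAsync(string query, CancellationToken token)
    {
        await Task.Delay(50, token);
        return "page for " + query;
    }

    public static void Main()
    {
        // 20 queries: keyword0, keyword10, keyword20, ...
        IEnumerable<string> queries =
            Enumerable.Range(0, 20).Select(i => "keyword" + (i * 10));

        string hit = ParallelSearch.FindFirstMatchAsync(
            queries, FakeFetchAsync, page => page.EndsWith("keyword30"))
            .GetAwaiter().GetResult();

        Console.WriteLine(hit ?? "not found");
    }
}
```

One difference from the sequential loop is worth keeping in mind: this returns the first matching page to finish, which is not necessarily the one with the lowest offset. If only one page can contain the information, that does not matter. No extra threads are created by hand here; the waiting is done by the tasks themselves.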
Okay, I solved it! At least I think so. Execution time went down to around seven seconds; without async it took about 30 seconds.
Here is my code for future reference. EDIT: I used a console project to test the code, and I’m using HtmlAgilityPack. This is my way of doing it; any tips on how to optimize it further would be welcome.