I’m developing an app that scrapes a website for Google +1s, Facebook shares and Tweets. I have a request method that accepts a URL and then goes off and fetches the count for each social media type.
Its logic is as follows:
- Take URL
- Do requests through local/default IP until we get rate limited/non-500 response
- On error
  - Call SelectNewProxy(), which iterates over a list of proxies and returns one at random to try (a nice way of avoiding request limits for your IP)
  - Remove bad proxy from list to avoid selecting it again
  - Start a timer which increments every second
- When the timer == 600 (10 minutes)
  - Create new WebProxy and try the requests behind our local/default IP again
  - Reset timer
- Rinse and repeat
The code is as follows:
    public string Request(string action)
    {
        HttpWebRequest req;
        OnStatusChange(new MyArgs() { Message = "Status: Requesting..." });
        string response = string.Empty;
        // Keep retrying until a request succeeds or there is no proxy left to try.
        while (response.Equals(string.Empty) && proxy != null)
        {
            try
            {
                req = (HttpWebRequest)WebRequest.Create(action);
                req.Proxy = proxy;
                HandleUIMessages(action, proxy);
                response = new StreamReader(req.GetResponse().GetResponseStream()).ReadToEnd();
            }
            catch
            {
                //OnProxyChange(new MyArgs() { ProxyMessage = string.Format("Proxy: {0}", proxy.Address.ToString()) });
                RemoveProxy(proxy);
                if (!timer.Enabled)
                {
                    timer.Interval = (int)TimeInterval.OneSecond;
                    timer.Elapsed += new System.Timers.ElapsedEventHandler(timer_Elapsed);
                    timer.Enabled = true;
                    timer.Start();
                }
                // A WebProxy with no address sends requests directly from the local/default IP.
                WebProxy reset = new WebProxy();
                // After 10 minutes go back to the local IP, otherwise pick another proxy.
                proxy = counter >= 600 ? reset : SelectNewProxy();
            }
        }
        return response;
    }
It’s worth mentioning that I’m using ThreadPool and each request runs in its own thread. It seems like it should work, but I don’t get the desired effect: the counter reaches 600 and proxy is set to reset, but only very briefly, possibly only for the first thread that hits it. Then timer_Elapsed is called and counter is reset. Could it be that one thread assigns proxy = reset and then, because counter has now been reset (no longer >= 600), all subsequent queued-up threads call SelectNewProxy()? I feel like I’m rambling, but hopefully someone can make sense of what I’m trying to say. If my guess is right, how can I ensure that all threads get proxy = reset and retry our initial IP?
Any help is much appreciated!
Thank you
How have you declared proxy? If you are reading and writing its value on multiple threads, you should make sure you declare it with the volatile keyword, otherwise writes to proxy on one thread may not be observed by others, e.g.:
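The question does not show how proxy is declared, so the following is only a minimal sketch of a volatile declaration, assuming proxy is a private instance field shared by all worker threads. The ProxyHolder class, its Current property and its Switch method are hypothetical names used for illustration.

```csharp
using System.Net;

// Hypothetical wrapper; in the question the field would live in the
// class that contains Request().
public class ProxyHolder
{
    // volatile stops the compiler and CPU from caching the field, so a
    // write made by one thread is seen by later reads on other threads.
    private volatile WebProxy proxy = new WebProxy();

    // Worker threads read the shared proxy before each request.
    public WebProxy Current
    {
        get { return proxy; }
    }

    // Called by whichever thread decides to change the proxy.
    public void Switch(WebProxy next)
    {
        proxy = next;
    }
}
```

Note that volatile only affects the visibility of individual reads and writes. The check of counter followed by the assignment to proxy is still not atomic, so if several threads can hit the catch block at the same time, that decision may also need to be guarded, for example with a lock.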