I have to scrape a table from a secure site and I’m having trouble logging in to the page and retrieving the authentication token and any other associated cookies. Am I doing something wrong here?
public NameValueCollection LoginToDatrose()
{
    var loginUriBuilder = new UriBuilder();
    loginUriBuilder.Host = DatroseHostName;
    loginUriBuilder.Path = BuildURIPath(DatroseBasePath, LOGIN_PAGE);
    loginUriBuilder.Scheme = "https";
    var boundary = Guid.NewGuid().ToString();
    var postData = new NameValueCollection();
    postData.Add("LoginName", DatroseUserName);
    postData.Add("Password", DatrosePassword);
    var data = Encoding.ASCII.GetBytes(postData.ToQueryString(false));
    var request = WebRequest.Create(loginUriBuilder.Uri) as HttpWebRequest;
    request.Method = "POST";
    request.ContentType = "application/x-www-form-urlencoded";
    request.ContentLength = data.Length;
    using (var d = request.GetRequestStream())
    {
        d.Write(data, 0, data.Length);
    }
    var response = request.GetResponse() as HttpWebResponse;
    var responseCookies = new NameValueCollection();
    foreach (var nvp in response.Cookies.OfType<Cookie>())
    {
        responseCookies.Add(nvp.Name, nvp.Value);
    }
    //using (var responseData = response.GetResponseStream())
    //using (var responseReader = new StreamReader(responseData))
    //{
    //    var theResponse = responseReader.ReadToEnd();
    //    Debug.WriteLine(theResponse);
    //}
    return responseCookies;
}
I get no values in the return object. It does not fail. The value of theResponse (when not commented out) seems to be the HTML of the login page.
Any assistance would be greatly appreciated.
OK, the problem here seems related to the 302 redirect that occurs after the credentials are passed. The HttpWebRequest would automatically follow the 302.
Ultimately, I ended up doing things a little differently. First, I subclassed the WebClient class. This gave me a WebClient that was cookie-aware and whose redirect behavior I could control. Then I rewrote my login code to use it, and everything worked swimmingly.
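A minimal sketch of such a subclass and a login call built on it, not necessarily the exact code used here. The names CookieAwareWebClient, LoginExample and Login are hypothetical, and the form field names are simply the ones from the question. It assumes the site sets its authentication cookies on the response to the login POST.

```csharp
using System;
using System.Collections.Specialized;
using System.Net;

// Hypothetical sketch: a WebClient that keeps cookies and exposes redirect control.
public class CookieAwareWebClient : WebClient
{
    // Cookies set by any response are collected here and sent on later requests.
    public CookieContainer CookieContainer { get; } = new CookieContainer();

    // Defaults to false so a 302 after login is returned instead of being followed.
    public bool AllowAutoRedirect { get; set; }

    protected override WebRequest GetWebRequest(Uri address)
    {
        var request = base.GetWebRequest(address);
        var httpRequest = request as HttpWebRequest;
        if (httpRequest != null)
        {
            httpRequest.CookieContainer = CookieContainer;
            httpRequest.AllowAutoRedirect = AllowAutoRedirect;
        }
        return request;
    }
}

public static class LoginExample
{
    // Hypothetical helper: posts the form fields and returns the cookies that apply to loginUri.
    public static CookieCollection Login(Uri loginUri, string userName, string password)
    {
        using (var client = new CookieAwareWebClient())
        {
            var form = new NameValueCollection
            {
                { "LoginName", userName },
                { "Password", password }
            };
            // UploadValues sends the fields as application/x-www-form-urlencoded.
            client.UploadValues(loginUri, "POST", form);
            return client.CookieContainer.GetCookies(loginUri);
        }
    }
}
```

Note that GetCookies only returns cookies whose domain and path match the URI passed in, so a cookie scoped to a different path would have to be looked up with that path's URI. To scrape the table afterwards, the same client instance (or its CookieContainer) would need to be reused so the cookies are sent with the follow-up request.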