I would like to grab some content from a website that is made with Drupal.
The challenge here is that I need to log in to this site before I can access the page I want to scrape. Is there a way to automate this login process in my C# code, so I can grab the secure content?
To access the secured content, you’ll need to store cookies and send them with every request to the server. Start with the request that sends your login info, then save the session cookie the server gives you in response (which is your proof that you are who you say you are).
You can use System.Windows.Forms.WebBrowser for an out-of-the-box solution that handles cookies for you, at the cost of less control.
My preferred method is to use System.Net.HttpWebRequest to send and receive all web data, and then use the HtmlAgilityPack to parse the returned data into a Document Object Model (DOM) that is easy to read from.
The trick to getting System.Net.HttpWebRequest to work is that you must create a long-lived System.Net.CookieContainer that will keep track of your login info (and other things the server expects you to keep track of). The good news is that HttpWebRequest will take care of all of this for you if you provide the container.
You need a new HttpWebRequest for each call you make, so you must set each request's .CookieContainer to the same object every time.
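The shared-container approach can be sketched roughly as follows. This is an untested, minimal sketch rather than a definitive implementation: the CookieSession class and its method names are hypothetical, the example.com URLs are placeholders, and the form fields in the usage comment (name, pass, form_id) are an assumption about a typical Drupal login form, so check the actual login page's HTML for the fields your site expects.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

// Hypothetical helper: keeps one CookieContainer alive for the whole session.
public class CookieSession
{
    // The long-lived container; every request in this session shares it.
    private readonly CookieContainer _cookies = new CookieContainer();

    // A new HttpWebRequest is needed per call, each wired to the same container.
    public HttpWebRequest CreateRequest(string url)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.CookieContainer = _cookies;
        return request;
    }

    // Sends URL-encoded form data (e.g. the login form) and returns the response body.
    public string Post(string url, string formData)
    {
        HttpWebRequest request = CreateRequest(url);
        request.Method = "POST";
        request.ContentType = "application/x-www-form-urlencoded";
        byte[] body = Encoding.UTF8.GetBytes(formData);
        request.ContentLength = body.Length;
        using (Stream stream = request.GetRequestStream())
        {
            stream.Write(body, 0, body.Length);
        }
        return ReadResponse(request);
    }

    // Fetches a page; cookies stored by earlier calls are sent automatically.
    public string Get(string url)
    {
        return ReadResponse(CreateRequest(url));
    }

    private static string ReadResponse(HttpWebRequest request)
    {
        using (var response = (HttpWebResponse)request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }
}

// Usage (placeholder URLs; field names are assumptions, verify against the real form):
//
//   var session = new CookieSession();
//   session.Post("http://example.com/user/login",
//       "name=" + Uri.EscapeDataString("myuser") +
//       "&pass=" + Uri.EscapeDataString("mypassword") +
//       "&form_id=user_login");
//   string html = session.Get("http://example.com/some/protected/page");
```

The returned HTML string can then be handed to the HtmlAgilityPack for parsing. If the login appears to succeed but the protected page still redirects to the login form, the site probably expects additional hidden form fields that this sketch does not send.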
WebBrowser Class: http://msdn.microsoft.com/en-us/library/system.windows.forms.webbrowser.aspx
HttpWebRequest Class
HttpWebRequest.CookieContainer Property: http://msdn.microsoft.com/en-us/library/system.net.httpwebrequest.cookiecontainer.aspx