I am trying to extract content from a blog article like this:
    using System;
    using System.IO;
    using System.Net;

    // Term.PrintLn is a console helper defined elsewhere (not shown).
    static void GetBlogData (string blogPostUrl)
    {
        string blogPostContent = null;
        try
        {
            // Dispose the client once the download finishes or fails.
            using (WebClient client = new WebClient ())
            {
                //client.Headers.Add (HttpRequestHeader.Referer, "http://www.stackoverflow.com");
                blogPostContent = client.DownloadString (blogPostUrl);
            }
        }
        catch (Exception ex)
        {
            Term.PrintLn ("Unable to download\n{0}", ex.Message);
        }
        if (blogPostContent != null)
        {
            // Dispose the writer so the content is actually flushed to disk.
            using (TextWriter writer = new StreamWriter ("/home/nanda/projects/mono/common/article"))
            {
                writer.WriteLine (blogPostContent);
            }
        }
        else
        {
            Term.PrintLn ("No content found");
        }
    }
I am aware that this approach is too simple, but I want to know why I cannot extract content from some URLs; it looks as if they are blocking me somehow. How can I detect whether a website or blog is blocking me from downloading its content?
A website cannot block you from downloading content that it still serves to browsers: anything a browser can fetch, your program can fetch too, as long as it sends an equivalent request.
If your download fails, it means either:
a) your URL is wrong
b) the website needs some form of identification and your request lacks something (probably a cookie)
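To tell the two cases apart in code, one option is to catch WebException rather than Exception and look at the HTTP status the server returned. The following is only a minimal sketch under a few assumptions: the class name DownloadProbe, the category strings, and the User-Agent value are all made up for illustration, and the mapping from status codes to causes is a rule of thumb, not something every site follows. It also sends a User-Agent header, since WebClient sends none by default and some sites reject requests that lack one.

```csharp
using System;
using System.Net;

static class DownloadProbe
{
    // Maps a failed download to a rough diagnosis.
    // The wording of each category is illustrative only.
    public static string Classify (WebException ex)
    {
        var response = ex.Response as HttpWebResponse;
        if (response == null)
        {
            // No HTTP answer at all: DNS failure, refused connection, timeout...
            return "no HTTP response: " + ex.Status;
        }

        switch (response.StatusCode)
        {
            case HttpStatusCode.NotFound:
                return "404: the URL is probably wrong";
            case HttpStatusCode.Unauthorized:
            case HttpStatusCode.Forbidden:
                return "401/403: the site wants identification (cookie, login, or headers)";
            case (HttpStatusCode) 429:
                return "429: the site is rate limiting this client";
            default:
                return "HTTP " + (int) response.StatusCode + ": " + response.StatusDescription;
        }
    }

    // Returns the page content, or null with a diagnosis in 'failure'.
    public static string TryDownload (string url, out string failure)
    {
        failure = null;
        using (var client = new WebClient ())
        {
            // Hypothetical User-Agent value; any browser-like string can be tried.
            client.Headers.Add ("user-agent", "Mozilla/5.0 (compatible; BlogFetcher/1.0)");
            try
            {
                return client.DownloadString (url);
            }
            catch (WebException ex)
            {
                failure = Classify (ex);
                return null;
            }
        }
    }
}
```

Calling DownloadProbe.TryDownload (blogPostUrl, out failure) in place of the bare DownloadString call gives you either the content or a hint about which of the two cases you hit. If the diagnosis points at identification, comparing your request's headers and cookies with what a browser sends for the same URL is usually the next step.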