I am writing a web crawler in C#. In the method that gets all of the links on a page, I want to return the list of links, but ‘filter’ it with LINQ so that it only contains URLs that exist. I have written a helper method called RemoteFileExists that returns a boolean value. At the end of the method, I wrote the following LINQ statement:
// Links is a List<string> that hasn't been filtered
return (from link in Links
        where RemoteFileExists(link)
        select link).ToList<string>();
For some reason, when I do this, the List is returned empty.
RemoteFileExists:
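static bool RemoteFileExists(string url)
{
    try
    {
        HttpWebRequest request = WebRequest.Create(url) as HttpWebRequest;
        request.Method = "HEAD";
        // Dispose the response so the underlying connection is released.
        using (HttpWebResponse response = request.GetResponse() as HttpWebResponse)
        {
            return (response.StatusCode == HttpStatusCode.OK);
        }
    }
    catch
    {
        // Any failure (bad URL, timeout, error status) counts as "does not exist".
        return false;
    }
}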
I guess either your links are not correct or your sites don’t support
HEAD. Since this code works
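Separately from the code referred to above, here is a minimal sketch of one way to check which of the two guesses applies. The bare catch in RemoteFileExists hides the reason a link was rejected, so this variant reports it. The class name LinkChecker and the out parameter reason are hypothetical and not part of the original post; the sketch assumes the classic System.Net.HttpWebRequest API used in the question.

```csharp
using System;
using System.Net;

static class LinkChecker
{
    // Hypothetical diagnostic variant of RemoteFileExists (not from the original post):
    // same HEAD check, but it reports why a URL was rejected instead of hiding it.
    public static bool RemoteFileExists(string url, out string reason)
    {
        try
        {
            // WebRequest.Create throws for relative or malformed URLs and for
            // unsupported schemes; non-HTTP schemes such as ftp yield null here.
            HttpWebRequest request = WebRequest.Create(url) as HttpWebRequest;
            if (request == null)
            {
                reason = "not an http/https URL";
                return false;
            }

            request.Method = "HEAD";
            using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
            {
                reason = "status " + (int)response.StatusCode;
                return response.StatusCode == HttpStatusCode.OK;
            }
        }
        catch (WebException ex)
        {
            // Error statuses (for example 404 or 405) arrive as a WebException
            // that carries the response; network failures carry only a status.
            using (HttpWebResponse response = ex.Response as HttpWebResponse)
            {
                reason = response != null
                    ? "status " + (int)response.StatusCode
                    : "network error: " + ex.Status;
            }
            return false;
        }
        catch (Exception ex)
        {
            // Typically UriFormatException (relative link) or NotSupportedException (mailto: etc.).
            reason = ex.GetType().Name + ": " + ex.Message;
            return false;
        }
    }
}
```

Calling this for each entry in Links and printing reason should show whether the links themselves are malformed (for example relative hrefs such as about.html, which fail before any request is sent) or whether the server rejects HEAD (commonly reported as status 405).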