Given a web URL, I want to detect all the links on a website, identify the internal ones, and list them.
What I have is this:
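    string strUrl = "http://www.anysite.com";
    string completeHTMLCode = "";
    try
    {
        // WebClient is IDisposable, so dispose it once the download is done.
        using (WebClient webClient = new WebClient())
        {
            completeHTMLCode = webClient.DownloadString(strUrl);
        }
    }
    catch (Exception)
    {
        // Download failed; completeHTMLCode stays empty.
    }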
Using this I can read the contents of the page, but the only idea I have is to parse this string: search for <a, then href, then take the value between the double quotes.
Is this the only way, or is there a better solution?
Use the HTML Agility Pack. Here’s a link to a blog post to get you started. Do not use Regex.
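As a rough sketch of what that could look like (not a definitive implementation): the code below assumes the HtmlAgilityPack NuGet package is installed and treats a link as "internal" when its host matches the host of the page URL, which is only one possible definition. The class name LinkExtractor and the method GetInternalLinks are hypothetical names made up for this example.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // NuGet package: HtmlAgilityPack

public static class LinkExtractor
{
    // Hypothetical helper: returns the absolute URLs of all <a href> links in
    // html whose host matches the host of pageUrl (assumed meaning of "internal").
    public static List<string> GetInternalLinks(string html, string pageUrl)
    {
        var baseUri = new Uri(pageUrl);
        var internalLinks = new List<string>();

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection anchors = doc.DocumentNode.SelectNodes("//a[@href]");
        if (anchors == null)
        {
            return internalLinks;
        }

        foreach (HtmlNode anchor in anchors)
        {
            // The attribute value may still contain entities such as &amp;, so decode it.
            string href = HtmlEntity.DeEntitize(anchor.GetAttributeValue("href", ""));

            // Resolve relative hrefs against the page URL; skip anything unparsable.
            if (!Uri.TryCreate(baseUri, href, out Uri resolved))
            {
                continue;
            }

            // Ignore mailto:, javascript: and other non-web schemes.
            bool isWeb = resolved.Scheme == Uri.UriSchemeHttp
                      || resolved.Scheme == Uri.UriSchemeHttps;
            bool sameHost = string.Equals(resolved.Host, baseUri.Host,
                                          StringComparison.OrdinalIgnoreCase);

            if (isWeb && sameHost && !internalLinks.Contains(resolved.AbsoluteUri))
            {
                internalLinks.Add(resolved.AbsoluteUri);
            }
        }

        return internalLinks;
    }
}
```

With the variables from the question, this would be called as LinkExtractor.GetInternalLinks(completeHTMLCode, strUrl). Note that a strict host comparison treats www.anysite.com and anysite.com as different sites; if subdomains should count as internal, the comparison would need to be loosened.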