I want to follow each link found in a website's HTML content and get the HTML of the page it points to.
Example: I have http://www.google.com and want to find all the links on it.
For each link, I want the HTML content of the linked page.
I do something like this:
    foreach (String link in GetLinksFromWebsite(htmlContent))
    {
        using (var client = new WebClient())
        {
            htmlContent = client.DownloadString("http://" + link);
        }
        foreach (Match treffer in istBildURL)
        {
            string bildUrl = treffer.Groups[1].Value;
            bildLinks.Add(bildUrl);
        }
    }

    public static List<String> GetLinksFromWebsite(string htmlSource)
    {
        string linkPattern = "<a href=\"(.*?)\">(.*?)</a>";
        MatchCollection linkMatches = Regex.Matches(htmlSource, linkPattern, RegexOptions.Singleline);
        List<string> linkContents = new List<string>();
        foreach (Match match in linkMatches)
        {
            linkContents.Add(match.Value);
        }
        return linkContents;
    }
The other problem is that I only get links, not LinkButtons (ASP.NET).
How can I solve this?
Steps to follow:
Take anything involving regex or regular expressions that deals with parsing HTML out of your project (read this answer to better understand why). In your case this would be the contents of the GetLinksFromWebsite method. Here's an example:
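A minimal sketch of what a regex-free GetLinksFromWebsite could look like. It assumes the Html Agility Pack NuGet package as the HTML parser, which is an assumption here and not something the steps above confirm; any real HTML parser would serve the same purpose. The LinkExtractor class name and the baseUri parameter are hypothetical, added so that relative links can be resolved against the page they came from.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // assumption: the Html Agility Pack NuGet package is referenced

public static class LinkExtractor
{
    // Returns the absolute http/https URLs of all <a href="..."> elements in htmlSource.
    // baseUri (hypothetical parameter) is the address the HTML was downloaded from.
    public static List<string> GetLinksFromWebsite(string htmlSource, Uri baseUri)
    {
        var links = new List<string>();
        var document = new HtmlDocument();
        document.LoadHtml(htmlSource);

        // SelectNodes returns null, not an empty collection, when nothing matches.
        HtmlNodeCollection anchors = document.DocumentNode.SelectNodes("//a[@href]");
        if (anchors == null)
        {
            return links;
        }

        foreach (HtmlNode anchor in anchors)
        {
            // Attribute values are kept as written, so decode entities such as &amp;.
            string href = HtmlEntity.DeEntitize(anchor.GetAttributeValue("href", string.Empty));
            if (string.IsNullOrWhiteSpace(href))
            {
                continue;
            }

            // Resolve relative links against the page address and keep only
            // http/https targets; this skips javascript: and mailto: links.
            if (Uri.TryCreate(baseUri, href, out Uri absolute) &&
                (absolute.Scheme == Uri.UriSchemeHttp || absolute.Scheme == Uri.UriSchemeHttps))
            {
                links.Add(absolute.AbsoluteUri);
            }
        }

        return links;
    }
}
```

With this version the loop can pass each returned URL straight to client.DownloadString(link) without prepending "http://", and it should store the downloaded HTML in a separate variable so that htmlContent is not overwritten while its links are still being processed. Note that an ASP.NET LinkButton renders as an anchor whose href is a javascript:__doPostBack(...) call, so it carries no URL to download; the sketch simply skips those.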