I started learning regex two days ago, and now I'd like to make a small application that reads the source code of a web page and extracts URLs such as http://page.com or http://www.page.com/sub/sub/sub?=value. This is the code I wrote:
    Regex r = new Regex("http://\\w");
    HttpWebRequest httpwebrequest = (HttpWebRequest)WebRequest.Create("http://maktoob.yahoo.com/?p=us");
    HttpWebResponse response = (HttpWebResponse)httpwebrequest.GetResponse();
    StreamReader sr = new StreamReader(response.GetResponseStream());
    string line;
    while ((line = sr.ReadLine()) != null)
    {
        Match m = r.Match(line);
        if (m.Success)
        {
            Console.WriteLine("Match: " + m.Value);
        }
    }
    sr.Close();
    response.Close();
But the result is:
Match: http://l
Match: http://w
Match: http://x
Match: http://l
Match: http://q
It only gets the first character after the //.
When I looked at my pattern again, I realized that http://\w matches just a single word character after the slashes, which explains the output. What should I add to my pattern so that it matches the rest of the link?
If you only need to match hyperlinks within <a> elements, then you could take advantage of the enclosing single or double quotes to delimit your URL. A pattern built that way would match any text within an href='…' or href="…" attribute that starts with http:// or https://.
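As a rough illustration of the quote-delimited idea, here is a minimal sketch. The pattern, the class name HrefDemo, and the sample HTML string are my own assumptions for the example, not something taken from the question. It captures the opening quote in group 1, lazily captures the URL in group 2, and uses the backreference \1 to stop at the matching closing quote.

```csharp
using System;
using System.Text.RegularExpressions;

class HrefDemo
{
    static void Main()
    {
        // Assumed pattern: href, optional whitespace around '=', an opening
        // quote (group 1), a URL starting with http:// or https:// (group 2),
        // then the same quote character again via the backreference \1.
        Regex r = new Regex(@"href\s*=\s*(['""])(https?://.*?)\1",
                            RegexOptions.IgnoreCase);

        // Hypothetical input standing in for one line of the downloaded page.
        string html = "<a href=\"http://page.com\">a</a> " +
                      "<a href='https://www.page.com/sub/sub?x=1'>b</a>";

        // Matches (plural) returns every hit in the string, whereas
        // Match returns only the first one.
        foreach (Match m in r.Matches(html))
        {
            Console.WriteLine("Match: " + m.Groups[2].Value);
        }
    }
}
```

In the loop from the question, this would mean replacing r.Match(line) with a foreach over r.Matches(line), since a single line of HTML can contain several links. Keep in mind this is only a sketch: it will miss unquoted href values and URLs that are not inside an href attribute at all.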