Possible Duplicate:
Regular expression to match US phone numbers
I need to find phone numbers in HTML. I have seen many examples here and on Google, but I am not sure why I cannot get any of them to work; they simply won't find the number. Suppose the HTML is:
Basically, I am going for all US-style phone numbers. I have tried every pattern I found, but with no luck. I am using this code:
CODE:
public static string Extractphone(string html)
{
    StringBuilder sb = new StringBuilder();
    try
    {
        List<string> tmpemail = new List<string>();
        string data = html;
        //instantiate with this pattern
        Regex emailRegex = new Regex(@"(\\d{3})-(\\d{3})-(\\d{4})",
            RegexOptions.IgnoreCase);
        //find items that matches with our pattern
        MatchCollection emailMatches = emailRegex.Matches(data);
        foreach (Match emailMatch in emailMatches)
        {
            if (!tmpemail.Contains(emailMatch.Value.ToLower()))
            {
                sb.AppendLine(emailMatch.Value.ToLower());
                tmpemail.Add(emailMatch.Value.ToLower());
            }
            // (541) 708-1364
        }
        //store to file
    }
    catch (Exception ex)
    {
    }
    return sb.ToString();
}
I have changed the pattern many times from many examples but no luck.
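Because the pattern is a verbatim string (the @ prefix), '\\' is not escaping the backslash. It reaches the regex engine as an escaped literal backslash, so \\d{3} looks for a backslash followed by three d characters rather than three digits. Just removing the extra slash will get you to match your first case.
For the parenthesized form you also need an optional opening parenthesis: \(? . The same goes for the trailing one: you may have the closing parenthesis and 0+ spaces, or the dash, so you need to check the or case. Instead of just - you need (\)\s*|-) .
You also don't need the parentheses around the \d{3} or \d{4} groups, as it is a single match. That is probably just making the expression harder to read and understand.
So that leaves you with this pattern for your Regex initialization: @"\(?\d{3}(\)\s*|-)\d{3}-\d{4}"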
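To make the pieces described above concrete, here is a minimal sketch of how they might fit together. The pattern is assembled from the fragments named in the answer; the class name PhoneExtractor, the method ExtractPhones, and the sample HTML are hypothetical, chosen only for illustration. RegexOptions.IgnoreCase is left out on the assumption that it has no effect on a digits-only pattern.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class PhoneExtractor
{
    // Verbatim string (@"..."): every backslash is written exactly once.
    //   \(?          optional opening parenthesis
    //   \d{3}        area code
    //   (\)\s*|-)    either ")" plus zero or more spaces, or a dash
    //   \d{3}-\d{4}  remaining seven digits, dash-separated
    private static readonly Regex PhoneRegex =
        new Regex(@"\(?\d{3}(\)\s*|-)\d{3}-\d{4}");

    // Returns each distinct match in order of first appearance.
    public static List<string> ExtractPhones(string html)
    {
        var found = new List<string>();
        foreach (Match m in PhoneRegex.Matches(html))
        {
            if (!found.Contains(m.Value))
            {
                found.Add(m.Value);
            }
        }
        return found;
    }

    public static void Main()
    {
        // Hypothetical input; not the HTML from the question.
        string html = "<p>Call 541-708-1364 or (541) 708-1364.</p>";
        foreach (string phone in ExtractPhones(html))
        {
            Console.WriteLine(phone);
        }
    }
}
```

Note that this pattern is deliberately loose: it does not require the parentheses to be balanced, so a string such as 541) 708-1364 would also match.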
I haven’t tested this robustly but I think that works.
As a side note regular expressions are one of those things that are really cryptic if you don’t understand them. Trying to just take someone else’s expression and use it can give poor results if you don’t actually understand what is being checked for in the expression. Also what I wrote there is not comprehensive. It would only be useful in those two cases. To be able to handle any phone number the expression quickly gets much more complicated.
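As a rough illustration of how quickly the expression grows, here is a sketch of a somewhat more tolerant pattern. It is an assumption for illustration only, not part of the answer above and still far from comprehensive: it adds an optional leading 1 or +1 and accepts a space, dot, or dash (or nothing) between the digit groups. The class name BroaderPhoneDemo and the sample strings are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BroaderPhoneDemo
{
    // Illustrative pattern (an assumption, not from the original answer):
    //   (?:\+?1[\s.-]?)?   optional country code "1" or "+1" plus a separator
    //   \(?\d{3}\)?        area code, parentheses optional
    //   [\s.-]?            optional space, dot, or dash
    //   \d{3}[\s.-]?\d{4}  remaining seven digits
    public static readonly Regex Phone = new Regex(
        @"(?:\+?1[\s.-]?)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}");

    public static void Main()
    {
        // Hypothetical sample inputs in a few common US formats.
        string[] samples =
        {
            "541-708-1364",
            "(541) 708-1364",
            "541.708.1364",
            "+1 541 708 1364",
            "5417081364",
        };
        foreach (string s in samples)
        {
            Console.WriteLine(s + " -> " + Phone.IsMatch(s));
        }
    }
}
```

Every sample above should report a match. The tradeoff is more false positives: because the separators are optional, any run of ten or more digits in the HTML will match as well, which is one reason a truly general expression ends up much longer than this.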