I’m having a problem with Regular Expressions in C#.
What I have is a string representing a page (HTML etc.). The string also contains \r\n, \r and \n in different places. Now I’m trying to match something in the string:
Match currentMatch = Regex.Match(contents, "Title: <strong>(.*?)</strong>");
string org = currentMatch.Groups[1].ToString();
This works fine. However, when the text I want to match contains any of the line-break characters mentioned earlier, it doesn’t return anything (empty, no match):
Match currentMatch = Regex.Match(contents, "Description: <p>(.*?)</p>");
string org = currentMatch.Groups[1].ToString();
It does, however, work if I add the following lines before the match:
contents = contents.Replace("\r", " ");
contents = contents.Replace("\n", " ");
However, I don’t like that this modifies the source. What can I do about this?
The dot (.) does not match the newline character (\n) by default. You can change this by using the regex option RegexOptions.Singleline. This treats the whole input string as one line, i.e. the dot also matches newline characters.

By the way, I hope you are aware that regex is normally not the way to deal with HTML?