Basically, I am trying to find the following pattern in a multiline textbox:
[p]anyword bla bla anyword[/p]
1.) The pattern can occur any number of times in the textbox, and I want every occurrence to be found.
2.) Any characters can appear between [p] and [/p], including whitespace and line breaks (“\r\n” in C#).
3.) I want the whole pattern, including the [p] and [/p].
The following code comes very close to the result I want. The problem is that multiple line breaks can occur between [p] and [/p]. I have tried many solutions, but nothing has worked for me.
private void getTextFromTag2(String Tag, String txt)
{
    txt = txt.Replace("\r", "");
    string re1 = "(\\[";
    string re2 = "p";
    string re3 = "\\]";
    string re4 = ".*"; // Here lies the problem
    string re5 = ""; // Left open for a solution => \r\n can occur n times
    string re6 = "\\[";
    string re7 = "\\/";
    string re8 = "p";
    string re9 = "\\])";
    Regex r = new Regex(re1 + re2 + re3 + re4 + re5 + re6 + re7 + re8 + re9, RegexOptions.IgnoreCase | RegexOptions.Multiline);
    MatchCollection mc = r.Matches(txt, 0);
    foreach (Match match in mc)
    {
        String c1 = match.Groups[1].ToString();
        Console.Write(c1 + "\r\n");
    }
}
As you can see, I already replace “\r” with “” in txt, because the .NET regex engine seems to want only “\n” as a newline character.
I think the problem in my code lies in re4 and re5. re4 matches any character and works well as long as there are no line breaks.
I think re4 should say “any character, including whitespace and \n”, but I can't work out how to express that.
So once again: everything works fine, even if the pattern occurs many times in the textbox. The problem arises when line breaks occur between [p] and [/p].
Here is an example that does NOT work:
[p]BlaBla BlaBla \r\n
BlaBla BlaBla \r\n
\r\n
BlaBla
[/p]
Here is an example that DOES work:
[p]BlaBla BlaBla[/p]
\r\n
\r\n
[p]Even more BlaBla[/p]
\r\n
\r\n
[p]Much more BlaBla[/p]
Please excuse my English. I am not a native English speaker.
Thank you.
This is the code that now works for me. The changed lines are marked with //Changed.
private void getTextFromTag2(String Tag, String txt)
{
    //txt = txt.Replace("\r", ""); //Changed
    string re1 = "(\\[";
    string re2 = "p";
    string re3 = "\\]";
    string re4 = ".*";
    string re5 = "?"; // Changed
    string re6 = "\\[";
    string re7 = "\\/";
    string re8 = "p";
    string re9 = "\\])";
    Regex r = new Regex(re1 + re2 + re3 + re4 + re5 + re6 + re7 + re8 + re9, RegexOptions.IgnoreCase | RegexOptions.Multiline | RegexOptions.Singleline); //Changed
    MatchCollection mc = r.Matches(txt, 0);
    foreach (Match match in mc)
    {
        String c1 = match.Groups[1].ToString();
        Console.Write(c1 + "\r\n");
    }
}
Thank you so much.
You need to specify the Singleline option (RegexOptions.Singleline).
This is basically the “dot-matches-all” option you may be familiar with from other languages. The Multiline option you set only affects how the beginning and end of a line are matched (the ^ and $ anchors). See the RegexOptions enumeration for more details.
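To make the difference between the two options concrete, here is a minimal sketch. The class and method names (SinglelineDemo, Matches) are made up for illustration, and the pattern is a condensed form of the one built from re1 to re9 in the question, without the capturing group.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SinglelineDemo
{
    // Condensed form of the question's pattern (hypothetical constant name).
    private const string Pattern = @"\[p\].*\[/p\]";

    // Returns true when the pattern matches somewhere in the input
    // under the given options.
    public static bool Matches(string input, RegexOptions options)
    {
        return Regex.IsMatch(input, Pattern, options);
    }
}
```

With the input "[p]line one\r\nline two[/p]", Matches returns false for RegexOptions.None and for RegexOptions.Multiline, and true for RegexOptions.Singleline. By default the dot in .NET excludes only “\n”, so the “\r” does not need to be stripped beforehand.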
The other issue with the regex you provided is that the * is greedy, so [p][/p][p][/p] would be a single match (it matches from the first [p] to the last [/p]). Changing your re5 to:
string re5 = "?";
will fix that so you get two separate matches.
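The greedy versus lazy behaviour can be compared side by side with a small sketch like the following. TagFinder and FindBlocks are hypothetical names, the lazy flag exists only for this comparison, and the pattern is again a condensed form of the one in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TagFinder
{
    // Returns every [p]...[/p] block found in text.
    // lazy = true uses ".*?" (stop at the nearest [/p]);
    // lazy = false uses ".*" (run on to the last [/p]).
    public static List<string> FindBlocks(string text, bool lazy)
    {
        string quantifier = lazy ? ".*?" : ".*";
        var regex = new Regex(
            @"\[p\]" + quantifier + @"\[/p\]",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);

        var results = new List<string>();
        foreach (Match match in regex.Matches(text))
        {
            results.Add(match.Value); // whole match, including the tags
        }
        return results;
    }
}
```

For the input "[p]one[/p]\r\n[p]two\r\nlines[/p]", the greedy form returns one item spanning the whole string, while the lazy form returns two items, "[p]one[/p]" and "[p]two\r\nlines[/p]".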