I made an application designed to prepare files for translation using lists of regexes.
It runs each regex on the file using Regex.Replace. There is also an inspector module which allows the user to see the matches for each regex on the list.
It works well, except that when a regex contains a back-reference, Regex.Replace does not replace anything. The inspector still shows the matches properly, so I know the regex is valid and matches what it should.
sSrcRtf = Regex.Replace(sSrcRtf, sTag, sTaggedTag,
RegexOptions.Compiled | RegexOptions.Singleline);
sSrcRtf contains the RTF code of the page. sTag contains the regular expression wrapped in parentheses. sTaggedTag contains $1 surrounded by the tag formatting code.
To give an example:
sSrcRtf = Regex.Replace("the little dog", @"((e).*?\1)", "$1",
RegexOptions.Compiled | RegexOptions.Singleline);
doesn’t work. But
sSrcRtf = Regex.Replace("the little dog", "((e).*?e)", "$1",
RegexOptions.Compiled | RegexOptions.Singleline);
does. (Of course, there is some RTF code around $1.)
Any idea why this is?
You technically have two capture groups there, the outer and the inner parentheses. Why don’t you try addressing the inner set as the second capture, i.e. with \2 instead of \1?
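Here is a minimal sketch of that change, using the example string from the question. The square brackets around $1 are only a stand-in for the RTF tag formatting code, and the class name is made up for the demo.

```csharp
using System;
using System.Text.RegularExpressions;

class BackrefDemo
{
    static void Main()
    {
        const string input = "the little dog";
        const RegexOptions options = RegexOptions.Compiled | RegexOptions.Singleline;

        // \2 points at the inner group (e); the outer parentheses are group 1.
        // "[$1]" is a placeholder for the real tag formatting code.
        string result = Regex.Replace(input, @"((e).*?\2)", "[$1]", options);

        Console.WriteLine(result); // prints "th[e little] dog"
    }
}
```

If the outer parentheses are added by the application around a user-supplied regex, every back-reference in that regex would need to be shifted by one in the same way.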
Groups are numbered by the position of their opening parenthesis, so the regex engine treats the outer capture as \1, and it doesn’t make much sense to back-reference a group from inside itself.
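To see the numbering for yourself, a small sketch like the following prints each group of the version that does match. The class name is hypothetical; the pattern and input are the ones from the question.

```csharp
using System;
using System.Text.RegularExpressions;

class GroupNumberingDemo
{
    static void Main()
    {
        Match m = Regex.Match("the little dog", @"((e).*?e)", RegexOptions.Singleline);

        // Group 0 is always the whole match.
        Console.WriteLine(m.Groups[0].Value); // prints "e little"
        // Group 1 is the outer pair of parentheses.
        Console.WriteLine(m.Groups[1].Value); // prints "e little"
        // Group 2 is the inner pair, which is the one the back-reference should target.
        Console.WriteLine(m.Groups[2].Value); // prints "e"
    }
}
```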
Also note that the replacement in your example won’t change anything, since you are asking to replace the portion that you match with itself. I’m not sure what your intended behavior is, but if you are trying to extract just the match and discard the rest of the string, you want to read the match directly rather than replace it.
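As a rough sketch of the extraction case, assuming the goal is simply to get the matched text on its own, Regex.Match can be used instead of Regex.Replace. No outer parentheses are needed here, so \1 refers to (e) again. The class and variable names are made up for the demo.

```csharp
using System;
using System.Text.RegularExpressions;

class ExtractDemo
{
    static void Main()
    {
        const string input = "the little dog";

        // Without the wrapping parentheses, (e) is group 1 and \1 works as expected.
        Match m = Regex.Match(input, @"(e).*?\1", RegexOptions.Singleline);

        // Keep only the matched text; fall back to an empty string if nothing matched.
        string extracted = m.Success ? m.Value : string.Empty;

        Console.WriteLine(extracted); // prints "e little"
    }
}
```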