I am dynamically editing a regex for matching text in a pdf, which can contain hyphenation at the end of some lines.
Example:
Source string:
"consecuti?vely"
Replace rules:
.Replace("cuti?",@"cuti?(-\s+)?")
.Replace("con",@"con(-\s+)?")
.Replace("consecu",@"consecu(-\s+)?")
Desired output:
"con(-\s+)?secu(-\s+)?ti?(-\s+)?vely"
The replace rules are built dynamically, this is just an example which causes problems.
Whats the best solution to perform such a multiple replace, which will produce the desired output?
So far I thought about using Regex.Replace and zipping the word to replace with optional (-\s+)? between each character, but that would not work, because the word to replace already contains special-meaning characters in regex context.
EDIT: My current code, doesnt work when replace rules overlap like in example above
private string ModifyRegexToAcceptHyphensOfCurrentPage(string regex, int searchedPage)
{
var originalTextOfThePage = mPagesNotModified[searchedPage];
var hyphenatedParts = Regex.Matches(originalTextOfThePage, @"\w+\-\s");
for (int i = 0; i < hyphenatedParts.Count; i++)
{
var partBeforeHyphen = String.Concat(hyphenatedParts[i].Value.TakeWhile(c => c != '-'));
regex = regex.Replace(partBeforeHyphen, partBeforeHyphen + @"(-\s+)?");
}
return regex;
}
I had to give up an easy solution and did the editing of the regex myself. As a side effect, the new approach goes only twice trough the string.