I have been trying to write code that checks whether a given string contains certain strings in a certain pattern.
To be precise, for example:
string mainString = @"~(Homo Sapiens means (human being)) or man or ~woman";
List<string> checkList = new List<string>{"homo sapiens","human","man","woman"};
Now, I want to extract
"homo sapiens", "human" and "woman" but NOT "man"
from the above list, as they follow the pattern, i.e. a string that directly follows ~, or one of the strings inside a parenthesized group that starts with ~.
So far I have come up with:
using System.Collections.Generic;
using System.Text.RegularExpressions;

string mainString = @"~(Homo Sapiens means (human being)) or man or ~woman";
List<string> checkList = new List<string>{"homo sapiens","human","man","woman"};
var prunedList = new List<string>();
foreach (var term in checkList)
{
    // Look for the term after a ~, optionally wrapped in parentheses
    var pattern = @"~(\s)*(\(\s*)?(\(?\w\s*\)?)*" + term + @"(\s*\))?";
    Match m = Regex.Match(mainString, pattern);
    if (m.Success)
    {
        prunedList.Add(term);
    }
}
But this pattern does not work in all cases…
Can anyone suggest how this can be done?
Checking parentheses means recognizing a context-free language, which requires a stack (or at least a counter). Regular expressions, in the formal sense, are suited to regular languages. They have no memory, so they cannot be used for this purpose.
To check this you need to scan the string and count the parentheses:
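1. Initialize count to 0.
2. Scan the string character by character: on ( increment count, on ) decrement count.
3. If count ever becomes negative, raise an error that the parentheses are inconsistent; e.g., )(
4. At the end, if count is positive, then there are some unclosed parentheses.
5. If count is zero, then the test is passed.
Or in C#: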
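A minimal sketch of that counting scan, assuming a hypothetical helper class ParenChecker (the class and method names are illustrative and do not come from any library):

```csharp
using System;

public static class ParenChecker
{
    // Hypothetical helper: throws FormatException when the parentheses
    // in text are inconsistent, and returns normally when they balance.
    public static void Check(string text)
    {
        int count = 0;
        foreach (char c in text)
        {
            if (c == '(') count++;
            else if (c == ')') count--;

            // A ')' appeared before its '(' , e.g. ")("
            if (count < 0)
                throw new FormatException("Closing parenthesis without a matching opening one.");
        }

        // Anything left over means some '(' was never closed.
        if (count > 0)
            throw new FormatException("There are " + count + " unclosed parentheses.");
    }
}
```

Calling ParenChecker.Check("(a(b))") returns normally, while ")(" and "((a)" both throw.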
You see, since regular expressions are not capable of counting, they cannot check such patterns.
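Applied to the question, the same counter can find where each group that starts with ~ ends, and the terms can then be looked up inside that group. The following is only a sketch under a few assumptions: the class NegatedTerms and its methods are hypothetical names, terms are matched case-insensitively as whole words (so "man" inside "human" does not count), and a term without parentheses must follow ~ directly.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class NegatedTerms
{
    // Returns the index of the ')' that closes the '(' at openIndex,
    // or -1 if that parenthesis is never closed.
    public static int FindMatchingParen(string text, int openIndex)
    {
        int count = 0;
        for (int i = openIndex; i < text.Length; i++)
        {
            if (text[i] == '(') count++;
            else if (text[i] == ')' && --count == 0) return i;
        }
        return -1;
    }

    // Keeps the terms that follow a ~ directly or occur inside a ~( ... ) group.
    public static List<string> Extract(string text, IEnumerable<string> terms)
    {
        var result = new List<string>();
        foreach (var term in terms)
        {
            if (IsNegated(text, term)) result.Add(term);
        }
        return result;
    }

    private static bool IsNegated(string text, string term)
    {
        string word = Regex.Escape(term);
        for (int i = text.IndexOf('~'); i >= 0; i = text.IndexOf('~', i + 1))
        {
            // Skip whitespace between ~ and whatever it applies to.
            int j = i + 1;
            while (j < text.Length && char.IsWhiteSpace(text[j])) j++;
            if (j >= text.Length) break;

            if (text[j] == '(')
            {
                int close = FindMatchingParen(text, j);
                if (close < 0) continue; // unclosed group: ignored in this sketch
                string inside = text.Substring(j + 1, close - j - 1);
                if (Regex.IsMatch(inside, @"\b" + word + @"\b", RegexOptions.IgnoreCase))
                    return true;
            }
            else if (Regex.IsMatch(text.Substring(j), "^" + word + @"\b", RegexOptions.IgnoreCase))
            {
                return true;
            }
        }
        return false;
    }
}
```

With the mainString and checkList from the question, NegatedTerms.Extract(mainString, checkList) keeps "homo sapiens", "human" and "woman" and drops "man". The regular expressions here only do whole-word lookup; the nesting itself is handled by the counter.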