I’m looping through an array of C# keywords, and applying a regex for each one. (I’m working on syntax highlighting.)
I only want to match if:
- The keyword is the first word on the line OR the character before it is a space, period or open parenthesis, AND
- The character after the keyword is a space, period or open parenthesis.
Here’s what I came up with:
    foreach (string keyword in keyWords)
    {
        regex = new Regex(@"(?<=[\s\.(])" + keyword + @"(?=[\s\.(])");
        foreach (Match match in regex.Matches(code))
        {
            code = code.Replace(match.Value, "<span class='keyword'>" + match.Value + "</span>");
        }
    }
So, in the case of the following text:
“foreach(string s in ss){}”
The word “foreach” matches, BUT the keyword “in” nested in the word “string” also matches, which is not good because the characters before and after it don’t meet the criteria.
Interestingly enough, in the case of the following text:
“xforeachx(string s in ss){}”
The word “foreach” doesn’t match.
So why does the “in” in the word “string” match but not the “foreach” in the second example? What am I missing?
Thank you!
Here is a very simple demo of what I was referring to in a comment:
Resulting in:
Which is, I think, what you’re after. I used static regex methods, but you can refactor it however you’d like. Some things I’d like to point out:
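- Use Regex.Escape when you’re inserting values in the middle of a regex statement that you’re not constructing yourself. Even if the keywords turn out to be only letters, some change at a later date may break it. Better safe than sorry.
- The lookarounds were matching correctly; the Replace call was the culprit. code.Replace(match.Value, ...) replaces every occurrence of the matched text anywhere in the string, so the “in” inside “string” gets wrapped too, and a keyword like “for” would also hit a word like foreshadow, who knows.
- To match a keyword at the start of a line, the lookbehind needs ^| in front of the character class, which means match the beginning of a line or what is found in the class. (With RegexOptions.Multiline, ^ matches at every line start; without it, only at the start of the input.)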
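The points above can be sketched roughly as follows. This is only an illustration, not the original demo: the KeywordHighlighter class and its Highlight method are made-up names, and combining the keywords into a single alternation (so the inserted span markup is never rescanned) is a choice of this sketch rather than something the question requires.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; the name is not from the original post.
public static class KeywordHighlighter
{
    public static string Highlight(string code, IEnumerable<string> keywords)
    {
        // Escape every keyword so any regex metacharacters are matched literally.
        string alternation = string.Join("|", keywords.Select(k => Regex.Escape(k)));

        // (?<=^|[\s.(])  preceded by a line start, whitespace, '.' or '('
        // (?=[\s.(])     followed by whitespace, '.' or '('
        string pattern = @"(?<=^|[\s.(])(?:" + alternation + @")(?=[\s.(])";

        // Regex.Replace rewrites only the spans that actually matched,
        // unlike string.Replace, which rewrites every occurrence of the text.
        return Regex.Replace(
            code,
            pattern,
            m => "<span class='keyword'>" + m.Value + "</span>",
            RegexOptions.Multiline); // lets ^ match at the start of each line
    }
}
```

Called with the keywords “foreach” and “in”, the first example from the question should come back with only the leading “foreach” and the standalone “in” wrapped, leaving “string” alone; in the “xforeachx” example only the standalone “in” is wrapped.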