I am using a simple regular expression (in C#) to find a whole word within a block of text.
The word may appear at the beginning, at the end, or in the middle of the text or of a sentence within the text.
The expression I have been using, \bword\b, has been working fine. However, if the word includes a special character (which has been escaped), it no longer works. The boundary is essential so that we do not pick up words such as vb.net as a match for .net.
Two examples that fail are:
\bc\#\b
\b\.net\b
I could replace the word boundary with a list of other checks (at the start of the text, after a space, and so on), but this is complex and can be slow when used on a large number of words.
The \b matches the boundary between word characters and non-word characters, but won’t match the boundary between two non-word characters.
For example, in the case of C# there’s a boundary between the C (a word character) and the # (a non-word character), but not between the # and whatever comes after it (space, punctuation, end-of-string etc).
You can work around this problem as follows:
Use (?:^|\W) instead of \b at the beginning of the expression. For example,
(?:^|\W)\.NET\b
This will match either the start-of-string or a non-word character before the . character.
Use (?:\W|$) instead of \b at the end of the expression. For example,
\bC#(?:\W|$)
This will match either a non-word character or the end-of-string after the # character.
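
As a rough illustration, the two replacements can be combined into one helper that works for any search term. This is only a sketch: the class name WholeWord, the methods Build and Contains, and the group name "word" are made up for this example and are not part of the original post. It applies both checks on every term, whether or not the term starts or ends with a special character, and it assumes case-insensitive matching is wanted.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WholeWord
{
    // Hypothetical helper: wraps the escaped term in the two checks
    // described above, (?:^|\W) in front and (?:\W|$) behind.
    // The term itself is captured in a named group, because the
    // surrounding checks consume the neighbouring characters.
    public static Regex Build(string word)
    {
        string pattern = @"(?:^|\W)(?<word>" + Regex.Escape(word) + @")(?:\W|$)";
        // IgnoreCase is an assumption of this sketch, not a requirement.
        return new Regex(pattern, RegexOptions.IgnoreCase);
    }

    // True if the term occurs in the text as a whole word.
    public static bool Contains(string text, string word)
    {
        return Build(word).IsMatch(text);
    }
}
```

With this sketch, WholeWord.Contains("I use c# daily", "c#") is true and WholeWord.Contains("vb.net is old", ".net") is false, while the original \bc\#\b finds nothing in the first text. Read the matched term from Groups["word"] rather than from the whole match, which also includes the neighbouring characters. One caveat: because the checks consume those characters, two occurrences separated by a single space ("c# c#") are reported as one match by Regex.Matches. If that matters, the lookaround assertions (?<!\w) and (?!\w) test the same conditions without consuming anything.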