I am processing user input on a search page. If the user selects an ‘All Words’ type search, then I remove any boolean search operators from the search text and stick ' AND ' between each real word. Pretty simple in most cases. However, I can’t figure out how to remove two boolean operators in a row.
Here is my code:
    // create the regex
    private static Regex _cleaner =
        new Regex("(\\s+(and|or|not|near)\\s+)|\"",
            RegexOptions.Compiled | RegexOptions.IgnoreCase);

    // call the regex (Replace returns a new string; searchText itself is unchanged)
    string cleaned = _cleaner.Replace(searchText, " ");
The problem occurs when a user enters a search string like ‘coffee and not tea’. The regex will remove the ‘and’, but not the ‘not’, because the whitespace in front of ‘not’ has already been consumed by the match on ‘and’. The resulting string is ‘coffee not tea’, but what I want is ‘coffee tea’.
The white space is required in the regex so I don’t remove ‘and’, ‘or’, etc when embedded in real words (like ‘band’ or ‘corps’).
I have temporarily resolved this by calling the clean method twice, which will remove two operators in a row (which is probably all I would ever need). But it is not very elegant, is it? I would really like to do it right. I feel like I am missing something simple…
Try adding word boundaries:
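A minimal sketch of what that could look like, assuming the goal is still to replace each operator with a space and then join the remaining words with ' AND '. The `SearchCleaner` class and `ToAllWords` method are hypothetical names used for illustration, not part of the original code. Because `\b` is a zero-width assertion, it does not consume the whitespace around an operator, so two operators in a row are each matched on their own:

```csharp
using System;
using System.Text.RegularExpressions;

public static class SearchCleaner
{
    // \b matches at a word boundary without consuming any characters,
    // so adjacent operators such as "and not" are each matched separately.
    // The final alternative ("" inside a verbatim string) matches a double quote.
    private static readonly Regex Cleaner = new Regex(
        @"\b(?:and|or|not|near)\b|""",
        RegexOptions.Compiled | RegexOptions.IgnoreCase);

    // Hypothetical helper (not from the original post): replaces operators and
    // quotes with spaces, then joins the remaining words with " AND ".
    public static string ToAllWords(string searchText)
    {
        string cleaned = Cleaner.Replace(searchText, " ");

        // Splitting with RemoveEmptyEntries discards the runs of spaces
        // left behind by the replacements.
        string[] words = cleaned.Split(
            new[] { ' ', '\t', '\r', '\n' },
            StringSplitOptions.RemoveEmptyEntries);

        return string.Join(" AND ", words);
    }
}
```

With this sketch, `SearchCleaner.ToAllWords("coffee and not tea")` returns `coffee AND tea`, and words like ‘band’ or ‘corps’ are left alone because there is no word boundary inside them. One caveat: `\b` also treats punctuation such as a hyphen as a boundary, so the ‘and’ in a term like ‘rock-and-roll’ would be removed as well.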