I want to remove all noise tags from an input string of tags.
The tags are separated by commas. If a noise word is only part of a larger tag, that tag should remain.
This is what I have, but it is not working:
string input_string = "This,sure,about,all of our, all, values";
string stopWords = "this|is|about|after|all|also";
stopWords = string.Format(@"\s?\b(?:{0})\b\s?", stopWords);
string tags = Regex.Replace(input_string, stopWords, "", RegexOptions.IgnoreCase);
This is what I want from the above input:
“,sure,,all of our,,values”
The words “This”, “about”, and “all” should be replaced with “” because they are noise words.
But “all of our” should remain, even though it contains the noise word “all”.
This is because the comma is the tag boundary.
Can anyone give me a hand?
I had an alternate solution that puts the noise words into a dictionary and then looks up each tag from the input string, but I would prefer a regex approach.
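For comparison, a non-regex alternative along those lines might look roughly like the sketch below. It is only an illustration under a few assumptions: `TagFilter` and `RemoveNoiseTags` are made-up names, a `HashSet<string>` stands in for the dictionary, and each tag is trimmed before the lookup so that " all" counts as the noise word "all".

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class TagFilter
{
    // Hypothetical helper: blanks out every tag that is exactly a noise word.
    // A tag that merely contains a noise word (e.g. "all of our") is kept.
    public static string RemoveNoiseTags(string input, IEnumerable<string> noiseWords)
    {
        // Case-insensitive set, so "This" matches the noise word "this".
        var noise = new HashSet<string>(noiseWords, StringComparer.OrdinalIgnoreCase);

        var cleaned = input
            .Split(',')                                      // the comma is the tag boundary
            .Select(tag => tag.Trim())                       // assumption: surrounding spaces are not significant
            .Select(tag => noise.Contains(tag) ? "" : tag);  // whole-tag match only

        return string.Join(",", cleaned);
    }
}
```

Calling `TagFilter.RemoveNoiseTags(input_string, "this|is|about|after|all|also".Split('|'))` on the input above should give ",sure,,all of our,,values".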
1 Answer