I am trying to create a bad word filter method that I can call before every insert and update to check a string for bad words and replace them with “[Censored]”.
I have an SQL table which holds a list of bad words. I want to load them into a List or string array, check the text that has been passed in, replace any bad words that are found, and return the filtered string.
I am using C# for this.
Please see this “clbuttic” (or, in your case, cl[Censored]ic) article before doing a string replace without considering word boundaries:
http://www.codinghorror.com/blog/2008/10/obscenity-filters-bad-idea-or-incredibly-intercoursing-bad-idea.html
Update
Obviously not foolproof (see article above – this approach is so easy to get around or produce false positives…) or optimized (the regular expressions should be cached and compiled), but the following will filter out whole words (no “clbuttics”) and simple plurals of words:
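A minimal sketch of a filter along these lines. The class name `BadWordFilter`, the constructor taking the word list, and the `(?:es|s)?` plural suffix are assumptions for illustration; the word list passed in stands in for whatever is loaded from the SQL table. Unlike the unoptimized version described above, this sketch builds one combined, compiled expression up front.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: replaces whole bad words (and simple plurals) with "[Censored]".
public sealed class BadWordFilter
{
    private const string Replacement = "[Censored]";

    // Null when the word list is empty, so that nothing is ever replaced.
    private readonly Regex _pattern;

    public BadWordFilter(IEnumerable<string> badWords)
    {
        // Escape each word so regex metacharacters in the list are matched literally.
        var escaped = badWords
            .Where(w => !string.IsNullOrWhiteSpace(w))
            .Select(w => Regex.Escape(w.Trim()))
            .ToList();

        if (escaped.Count > 0)
        {
            // \b ... \b restricts matches to whole words; (?:es|s)? allows simple plurals.
            _pattern = new Regex(
                @"\b(?:" + string.Join("|", escaped) + @")(?:es|s)?\b",
                RegexOptions.IgnoreCase | RegexOptions.CultureInvariant | RegexOptions.Compiled);
        }
    }

    public string Filter(string input)
    {
        if (_pattern == null || string.IsNullOrEmpty(input))
        {
            return input;
        }
        return _pattern.Replace(input, Replacement);
    }
}

// Usage (assumed word list and input):
//   var filter = new BadWordFilter(new[] { "ass" });
//   filter.Filter("What an ass. Two asses at a classical concert.");
//   returns "What an [Censored]. Two [Censored] at a classical concert."
```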
Gives the output:
Note that “classical” does not become “cl[Censored]ical”, as whole words are matched with the regular expression.
Update 2
And to demonstrate a flavour of how this (and, in general, basic string/pattern matching techniques) can be easily subverted, see the following string:
I have replaced each “i” with the Turkish lower case dotless “ı” (U+0131). Still looks pretty offensive!
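The effect can be reproduced with a small sketch. The class `HomoglyphDemo`, its method names and the placeholder word "idiot" are assumptions for illustration only. The point is that a case-insensitive whole-word match on "i" does not match the dotless "ı", so the disguised text passes through unchanged.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo: a whole-word filter for one placeholder word.
public static class HomoglyphDemo
{
    private static readonly Regex Pattern = new Regex(
        @"\bidiot\b",
        RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);

    public static string Filter(string input)
    {
        return Pattern.Replace(input, "[Censored]");
    }

    // Swaps every ASCII 'i' for the Turkish dotless 'ı' (U+0131).
    public static string Disguise(string text)
    {
        return text.Replace('i', '\u0131');
    }
}

// Usage:
//   HomoglyphDemo.Filter("You idiot")                          returns "You [Censored]"
//   HomoglyphDemo.Filter(HomoglyphDemo.Disguise("You idiot"))  returns "You ıdıot" (not filtered)
```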