The Archive Base

Editorial Team
Asked: May 21, 2026

Last year I was working on a Christmas project which allowed customers to send


Last year I was working on a Christmas project which allowed customers to send emails to each other with a 256-character free-text field for their Christmas request. The project worked by searching the (very large) product database for suggested products that matched the text field, but offered a free-text option for those customers who could not find the product in question.

One obvious concern was the opportunity for customers to send rather explicit requests to some unsuspecting customer with the company’s branding sitting around it.

The project did not go ahead in the end, for various reasons, the profanity aspect being one.

However, I’ve come back to the project and am wondering what kinds of validation could be used here. I’m aware of the “clbuttic” problem (naive substitution mangling innocent words such as “classic”), which I know is the standard response to any question of this nature.

The solutions that I considered were:

  • Run it through something like WebPurify
  • Use MechanicalTurk
  • Write a regex pattern that looks for words from a blocklist. A more complicated version would also consider plurals and past tenses of each word.
  • Write an array of suspicious words and give each one a score. If the submission’s total score goes above a threshold, the validation fails.
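For illustration, the last two options could be combined roughly as follows. This is only a sketch in Python (the same idea should carry over to PHP's `preg_*` functions on a LAMP stack); the word list, the weights, the suffix list and the threshold are all made-up assumptions, not a vetted blocklist.

```python
import re

# Hypothetical word list with per-word weights; a real list would be far longer.
SUSPICIOUS_WORDS = {"damn": 1, "hell": 1, "ass": 3}
THRESHOLD = 3  # assumed cut-off: a total score at or above this fails validation


def build_pattern(words):
    # \b anchors stop matches inside innocent words ("classic", "shell"),
    # and the optional suffix group covers simple plurals and past tenses.
    alternatives = "|".join(
        re.escape(w) for w in sorted(words, key=len, reverse=True)
    )
    return re.compile(
        r"\b(" + alternatives + r")(?:s|es|ed|ing)?\b", re.IGNORECASE
    )


PATTERN = build_pattern(SUSPICIOUS_WORDS)


def score(text):
    # Sum the weights of every listed word found in the text.
    return sum(
        SUSPICIOUS_WORDS[m.group(1).lower()] for m in PATTERN.finditer(text)
    )


def passes_validation(text):
    return score(text) < THRESHOLD
```

Word boundaries avoid the clbuttic class of false positives, but this approach still misses deliberate misspellings and has no notion of context.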

So there are two questions:

  1. If the submission fails, how do you handle it from a UI perspective?
  2. What are the pros and cons of these solutions, or any others that you can suggest?

NB – answers like “profanity filters are evil” are irrelevant. In this semi-hypothetical situation, I haven’t decided to implement a profanity filter or been given the choice of whether or not to implement one. I just have to do the best I can with my programming skills (which should be on a LAMP stack if possible).


1 Answer

  1. Editorial Team
    Added an answer on May 21, 2026 at 7:13 pm

    Have you thought about Bayesian filtering? Bayesian filtering is not just for detecting spam; you can train a Bayesian filter for a variety of text recognition tasks. Grab a Bayesian filter, collect a bunch of request texts, and start marking them as containing profanity or not. After some time (how much depends a lot on the amount and type of training data), your filter will be able to distinguish requests containing profanity from those that contain none.

    It’s not foolproof, but it’s much, much better than simple string matching and trying to deal with clbuttic problems. There are several options for Bayesian filtering in PHP.

    bogofilter

    Bogofilter is a stand-alone Bayesian filter that runs on any Unix-like OS. It’s targeted at filtering e-mail, but you can train it on any kind of text. I have successfully used it to implement a custom comment spam filter for my own website (source). You can interface with bogofilter as you would with any other command-line application. See my source code link for an example.
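As a rough sketch of what driving bogofilter from code might look like (this is not the source linked above), the following Python wrapper pipes text to the `bogofilter` binary and reads its exit status. It assumes bogofilter is installed and on the PATH, and relies on the behaviour described in its man page: `-s` registers input as spam, `-n` as non-spam, and classification exits with 0 for spam, 1 for non-spam, 2 for unsure and 3 for errors. The function names and the injectable `run` parameter are illustrative only; in PHP the same calls could be made with `proc_open`.

```python
import subprocess

# Exit statuses as described in the bogofilter man page; 3 signals an error.
VERDICTS = {0: "profane", 1: "clean", 2: "unsure"}


def train(text, is_profane, run=subprocess.run):
    # "Spam" stands in for "contains profanity" in this repurposed setup.
    flag = "-s" if is_profane else "-n"
    result = run(["bogofilter", flag], input=text.encode("utf-8"))
    if result.returncode == 3:
        raise RuntimeError("bogofilter reported an error while training")


def classify(text, run=subprocess.run):
    # The text goes in on stdin; the verdict comes back as the exit status.
    result = run(["bogofilter"], input=text.encode("utf-8"))
    if result.returncode not in VERDICTS:
        raise RuntimeError(f"bogofilter failed with status {result.returncode}")
    return VERDICTS[result.returncode]
```

An "unsure" verdict is a natural hook for sending a request to manual review instead of rejecting it outright.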

    Roll your own

    If you like a challenge, you could implement a Bayesian filter entirely from scratch. Here’s a decent article about implementing a Bayesian filter in PHP.
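To give a feel for the amount of work involved, here is a minimal naive Bayes sketch in Python. It is not taken from the article mentioned above; the class name, the tokenizer and the two labels are assumptions made for illustration, and a PHP port would follow the same structure with associative arrays.

```python
import math
import re
from collections import Counter

LABELS = ("profane", "clean")


class NaiveBayesFilter:
    def __init__(self):
        self.word_counts = {label: Counter() for label in LABELS}
        self.doc_counts = Counter()

    @staticmethod
    def tokenize(text):
        # Deliberately crude: lowercase runs of letters and apostrophes.
        return re.findall(r"[a-z']+", text.lower())

    def train(self, text, label):
        self.doc_counts[label] += 1
        self.word_counts[label].update(self.tokenize(text))

    def log_score(self, text, label):
        counts = self.word_counts[label]
        total = sum(counts.values())
        # Vocabulary size, plus one slot reserved for words never seen.
        vocab = len(set().union(*self.word_counts.values())) + 1
        # Smoothed log prior for the label.
        seen_docs = sum(self.doc_counts.values())
        score = math.log((self.doc_counts[label] + 1) / (seen_docs + len(LABELS)))
        for word in self.tokenize(text):
            # Add-one (Laplace) smoothing keeps unseen words from zeroing out.
            score += math.log((counts[word] + 1) / (total + vocab))
        return score

    def classify(self, text):
        return max(LABELS, key=lambda label: self.log_score(text, label))
```

A short usage example:

```python
f = NaiveBayesFilter()
f.train("a new bike for my son please", "clean")
f.train("you damn idiot", "profane")
print(f.classify("what an idiot"))
```

With so little training data the result means little; the quality depends almost entirely on how many labelled requests you feed it.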

    Existing PHP libraries

    • http://xhtml.net/php/PHPNaiveBayesianFilter
    • http://nasauber.de/opensource/b8/index.php.en

    (Ab)use an existing e-mail filter

    You could use a standard SpamAssassin or DSpam installation and train it to recognise profanity. Make sure you disable the options aimed specifically at e-mail messages (e.g. parsing MIME blocks, reading headers) and enable only the options that deal with Bayesian text processing. DSpam may be easier to adapt. SpamAssassin has the advantage that you can add custom rules on top of the Bayesian filter; if you use it, disable all the default rules, which are targeted at spam e-mail detection, and write your own instead.



© 2021 The Archive Base. All Rights Reserved