The Archive Base

Editorial Team
Asked: May 12, 2026 (2026-05-12T16:13:22+00:00)

For an enterprise application research project that another person and I are working on, we want to remove certain content from posted messages to keep them universal (meaning not offensive and essentially anonymous). Right now we want to take a message that a user has posted to a message board and remove any personal name, any name of a college or institution, and profanity (and, if possible later, business names).

Is there a database we can connect to and scrub our messages against, checking them against its values in order to recognize these terms?

1 Answer
  1. Editorial Team
    Added an answer on May 12, 2026 at 4:13 pm

    The question seems to imply an online database that would be queried while messages are processed. Operational issues (reliability of such services, lag in response time, etc.) as well as a completeness issue (you would need to query multiple databases, because no single one will cover 100% of the project’s lexical needs) render this online/real-time approach impractical. There are, however, many databases available for download, which would allow you to build your own local database of "hot words".
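
    As a minimal sketch of what such a local "hot words" database could look like, the following uses Python’s standard `sqlite3` module with an in-memory table. The table name `hot_words`, the helper names, the mask text and the sample words are all assumptions made for illustration; a real list would be loaded from the downloaded sources discussed below.

```python
import re
import sqlite3


def build_hot_word_db(words):
    """Create an in-memory SQLite table holding lower-cased hot words.

    The table and column names are hypothetical, chosen for this sketch.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE hot_words (word TEXT PRIMARY KEY)")
    conn.executemany(
        "INSERT OR IGNORE INTO hot_words (word) VALUES (?)",
        [(w.lower(),) for w in words],
    )
    conn.commit()
    return conn


def scrub(message, conn, mask="[removed]"):
    """Replace every single-word token found in hot_words with a mask."""

    def replace(match):
        token = match.group(0)
        row = conn.execute(
            "SELECT 1 FROM hot_words WHERE word = ?", (token.lower(),)
        ).fetchone()
        return mask if row else token

    # Tokens are runs of letters, digits and apostrophes; punctuation is kept.
    return re.sub(r"[A-Za-z0-9']+", replace, message)


conn = build_hot_word_db(["MIT", "Celtics", "Obama"])
print(scrub("Obama spoke at MIT before the Celtics game.", conn))
# prints: [removed] spoke at [removed] before the [removed] game.
```

    This only matches single tokens, case-insensitively; multi-word names such as "Red Sox" would need phrase matching on top of it.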

    A good place to start could be WordNet, where you’d likely use all of the "instance" words as words that should typically be removed from messages as you anonymize/cleanse them. (You may also want to keep the "non-instance" words in a separate table/list of words "more likely to be OK".) This list alone could likely support a "0.9" version of your application reasonably well.

    You’ll eventually want to extend this lexical database of "bad words", however, for example to include university acronyms (CMU, UCSD, DU, MIT, UNC and such), sports team names (Celtics, Bruins, Red Sox…) and, depending on the domain of your messages, additional names of public figures. (WordNet has several, such as George Bush or Robert De Niro, but it lacks less famous people and people who came to fame more recently, e.g. Barack Obama.)

    To complement WordNet, two distinct types of sources come to mind:

    • traditional online databases
    • ontologies and folksonomies

    An example of the former is, say, "Cities/State by ZIP code" at the USPS. Examples of the latter are the various "lists" compiled by scholars, organizations or individuals. It is impossible to provide an exhaustive list of either of these source types, but the following should help:

    • DAML.ORG Catalog of ontologies
    • US Regions and States example of an ontology DAML format
    • Open Directory project Open Source directory (attention, gets quickly messy)
    • SourceWatch.org example of a "list of lists : folks in journalism/politics"
    • Search engine keywords: "List Of Lists", or three or four of the words you’d expect to find in the list you seek.

    In simpler cases, one can merely download lists and such, or even "cut-and-paste". The ontologies will be "encumbered" with additional attributes that you’ll need to parse out. (In the future you may actually want these attributes and use the ontologies in a more traditional fashion; for now, grabbing the lexical entities is all that is needed.)
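
    To illustrate the "parse out" step, here is a hedged sketch that pulls only the label text out of an RDF/XML document using the standard library. The sample document, the `ex:` namespace and the assumption that the entities of interest carry an `rdfs:label` are invented for this example; a real ontology may store its names in other elements or attributes.

```python
import xml.etree.ElementTree as ET

# Fully qualified tag name of rdfs:label in ElementTree's {namespace}tag form.
RDFS_LABEL = "{http://www.w3.org/2000/01/rdf-schema#}label"

# Hypothetical sample data; not taken from any real ontology.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:ex="http://example.org/regions#">
  <ex:State rdf:ID="OH">
    <rdfs:label>Ohio</rdfs:label>
    <ex:region>Midwest</ex:region>
  </ex:State>
  <ex:State rdf:ID="VT">
    <rdfs:label>Vermont</rdfs:label>
    <ex:region>Northeast</ex:region>
  </ex:State>
</rdf:RDF>"""


def extract_labels(rdf_xml):
    """Return the sorted, de-duplicated rdfs:label texts in the document."""
    root = ET.fromstring(rdf_xml)
    labels = {el.text.strip() for el in root.iter(RDFS_LABEL) if el.text}
    return sorted(labels)


print(extract_labels(SAMPLE))
# prints: ['Ohio', 'Vermont']
```

    The other attributes (here the invented `ex:region`) are simply ignored, which matches the "lexical entities only" goal for now.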

    This lexical database compilation task may seem daunting, but the 80-20 rule states that 20% of the "hot words" will account for 80% of the citations in the messages; with a relatively small effort, you should therefore be able to produce a system that covers 90%+ of your use cases.

    Looking ahead: Beyond the "hot words" database
    There are many ways of approaching this task, using various techniques and concepts from Natural Language Processing (NLP). As your project gains in sophistication, you may want to learn about some of these concepts and possibly implement them. For example, a simple POS (part-of-speech) tagger comes to mind, as it may help [in part] to discriminate between various usages of the token "SCREW" as your application discards offensive words ("The board of directors wants to screw the students" vs. "The board should be fastened with a minimum of 4 screws per yard").

    Before even needing these formal NLP techniques, you may use a few pattern-based rules to handle common cases associated with the domain(s) relative to the type of messages the project targets. For example, you may consider the following:

    • (word) State University
    • Senator (Word_Starting_with_Capital letter)
    • Words that mix letters and numbers (these are often used to misspell names and circumvent the type of filters your project wishes to implement)
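
    The three rules above could be sketched with regular expressions roughly as follows. The labels, the replacement format and the exact patterns are assumptions for illustration, and they are deliberately naive: the mixed letter/digit rule, for instance, would also flag harmless tokens such as "4th".

```python
import re

# (label, pattern) pairs, applied in order. Labels are hypothetical.
PATTERNS = [
    # "(word) State University", e.g. "Ohio State University"
    ("institution", re.compile(r"\b[A-Z][A-Za-z]+ State University\b")),
    # "Senator" followed by a capitalized word
    ("person", re.compile(r"\bSenator [A-Z][A-Za-z]+\b")),
    # A token containing at least one digit and at least one letter
    ("mixed", re.compile(r"\b(?=[A-Za-z]*\d)(?=\d*[A-Za-z])[A-Za-z\d]+\b")),
]


def apply_rules(message):
    """Replace each pattern match with its bracketed label."""
    for label, pattern in PATTERNS:
        message = pattern.sub(f"[{label}]", message)
    return message


print(apply_rules("Senator Smith visited Ohio State University and met j0hn."))
# prints: [person] visited [institution] and met [mixed].
```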

    Another tool that may be useful, particularly in the beginning, is a system that collects statistical information about the message corpus: word frequency, most common words, most common bigrams (two consecutive words), etc.
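
    A minimal sketch of such a statistics collector, assuming the corpus is available as a list of message strings (the function name and the simple tokenizer are illustrative choices, not a prescribed design):

```python
import re
from collections import Counter


def corpus_stats(messages, top_n=3):
    """Return the top_n most common words and bigrams across all messages."""
    word_counts = Counter()
    bigram_counts = Counter()
    for message in messages:
        # Lower-case and split into runs of letters, digits and apostrophes.
        tokens = re.findall(r"[a-z0-9']+", message.lower())
        word_counts.update(tokens)
        # Bigrams are pairs of consecutive tokens within a single message.
        bigram_counts.update(zip(tokens, tokens[1:]))
    return word_counts.most_common(top_n), bigram_counts.most_common(top_n)


words, bigrams = corpus_stats(["the dean met the board", "the board met again"])
print(words[0])    # prints: ('the', 3)
print(bigrams[0])  # prints: (('the', 'board'), 2)
```

    Frequent words and bigrams that are not yet in the "hot words" database are good candidates for review, which ties back to the 80-20 argument above.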
