Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • SEARCH
  • Home
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 52537
In Process

The Archive Base Latest Questions

Editorial Team
  • 0
Editorial Team
Asked: May 10, 20262026-05-10T16:55:21+00:00 2026-05-10T16:55:21+00:00

I need to take a paragraph of text and extract from it a list

  • 0

I need to take a paragraph of text and extract from it a list of ‘tags’. Most of this is quite straight forward. However I need some help now stemming the resulting word list to avoid duplicates. Example: Community / Communities

I’ve used an implementation of Porter Stemmer algorithm (I’m writing in PHP by the way):

http://tartarus.org/~martin/PorterStemmer/php.txt

This works, up to a point, but doesn’t return ‘real’ words. The example above is stemmed to ‘commun’.

I’ve tried ‘Snowball’ (suggested within another Stack Overflow thread).

http://snowball.tartarus.org/demo.php

For my example (community / communities), Snowball stems to ‘communiti’.

Question

Are there any other stemming algorithms that will do this? Has anyone else solved this problem?

My current thinking is that I could use a stemming algorithm to avoid duplicates and then pick the shortest word I encounter to be the actual word to display.

  • 1 1 Answer
  • 0 Views
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. 2026-05-10T16:55:21+00:00Added an answer on May 10, 2026 at 4:55 pm

    The core issue here is that stemming algorithms operate on a phonetic basis purely based on the language’s spelling rules with no actual understanding of the language they’re working with. To produce real words, you’ll probably have to merge the stemmer’s output with some form of lookup function to convert the stems back to real words. I can basically see two potential ways to do this:

    1. Locate or create a large dictionary which maps each possible stem back to an actual word. (e.g., communiti -> community)
    2. Create a function which compares each stem to a list of the words that were reduced to that stem and attempts to determine which is most similar. (e.g., comparing ‘communiti’ against ‘community’ and ‘communities’ in such a way that ‘community’ will be recognized as the more similar option)

    Personally, I think the way I would do it would be a dynamic form of #1, building up a custom dictionary database by recording every word examined along with what it stemmed to and then assuming that the most common word is the one that should be used. (e.g., If my body of source text uses ‘communities’ more often than ‘community’, then map communiti -> communities.) A dictionary-based approach will be more accurate in general and building it based on the stemmer input will provide results customized to your texts, with the primary drawback being the space required, which is generally not an issue these days.

    • 0
    • Reply
    • Share
      Share
      • Share on Facebook
      • Share on Twitter
      • Share on LinkedIn
      • Share on WhatsApp
      • Report

Sidebar

Related Questions

i need to take some row. They came from sql TARIH (sql column) is
Hopefully this is pretty straight forward, just can't work it out. I need to
I need to take a string that was originally read in from a text
I need to take a string of text var myText = $(this).text(); which looks
I need to take a web page and extract the address information from the
Need to take a SELECT drop down list options and find if any of
I need to take rows from two separate tables, and arrange them in descending
From my previous question for selecting specific html text, I have gone through this
We need to take the actual date from the IPad and save it in
I need to take the output of $one and make an array, however, the

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
  • SEARCH

Footer

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.