The Archive Base

Editorial Team
Asked: May 29, 2026 (2026-05-29T05:44:07+00:00)

I am looking for ideas on building a tool that scans English text and ranks keywords by how often words or phrases occur in it.

This would be very similar to Twitter trends, where Twitter detects and reports the top 10 words across tweets.

I have identified the high-level steps of the algorithm as follows:

  1. Scan the text and remove all the common, frequent words (such as “the”, “is”, “are”, “what”, “at”, etc.).
  2. Add the remaining words to a hashmap. If a word is already in the map, increment its count.
  3. To get the top 10 words, iterate through the hashmap and pick the 10 highest counts.

Steps 2 and 3 are straightforward, but for step 1 I do not know how to detect the important words in a text and separate them from the common words (prepositions, conjunctions, etc.).

Also, if I want to track phrases, what could the approach be?
For example, given the text “This honey is very good”,
I might want to track “honey” and “good”, but I may also want to track the phrases “very good” or “honey is very good”.

Any suggestions would be greatly appreciated.

Thanks in advance

1 Answer

  1. Editorial Team
    Added an answer on May 29, 2026 at 5:44 am (2026-05-29T05:44:08+00:00)

    Actually, your step 1 is quite similar to step 3, in the sense that you first need a reference list of the most common words in the English language. Such lists are easy to find online (Wikipedia even has an article listing the 100 most common words in English). You can store those words in a hashmap and, while scanning your text, simply ignore any token that appears in it.
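
A minimal sketch of that idea in Python, covering all three steps from the question: filter against a stop-word set, count what remains, and take the highest counts. The `STOP_WORDS` set and the `top_keywords` function are hypothetical names introduced for illustration, and the stop-word set is deliberately tiny; a real one would come from a published list of common English words.

```python
import re
from collections import Counter

# Assumption: a tiny, hand-written stop-word set for illustration only.
STOP_WORDS = {"the", "is", "are", "what", "at", "a", "an", "and",
              "of", "to", "in", "this", "very"}

def top_keywords(texts, n=10, stop_words=STOP_WORDS):
    """Return the n most frequent non-stop-words as (word, count) pairs."""
    counts = Counter()
    for text in texts:
        # Lowercase, then treat runs of letters/apostrophes as tokens.
        tokens = re.findall(r"[a-z']+", text.lower())
        # Step 1 (skip common words) and step 2 (increment counts).
        counts.update(t for t in tokens if t not in stop_words)
    # Step 3: pick the n highest counts.
    return counts.most_common(n)

if __name__ == "__main__":
    sample = ["This honey is very good", "The honey is at the market"]
    print(top_keywords(sample, n=3))
```

A set gives constant-time membership checks, which is the same role the hashmap plays in the answer, and `Counter` is a dictionary subclass, so it is the "hashmap with counts" from step 2.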

    If you don’t trust Wikipedia or the existing lists of common words, you can build your own. To do that, scan thousands of tweets (the more the better) and build your own frequency chart; the words at the top of that chart are your common words.
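
Building your own list could look roughly like the following sketch. It assumes the corpus is already available as a list of strings; `build_stop_words` and the `top_k` cutoff are hypothetical names, and the three-sentence corpus stands in for the thousands of tweets a usable list would need.

```python
import re
from collections import Counter

def build_stop_words(corpus, top_k=100):
    """Treat the top_k most frequent words in a corpus as stop words."""
    counts = Counter()
    for text in corpus:
        counts.update(re.findall(r"[a-z']+", text.lower()))
    # The most frequent words across a large, general corpus are
    # overwhelmingly function words ("the", "is", "of", ...).
    return {word for word, _ in counts.most_common(top_k)}

if __name__ == "__main__":
    # Assumption: a toy corpus; real input would be far larger.
    sample_corpus = [
        "the cat is on the mat",
        "the dog is in the garden",
        "what is the time",
    ]
    print(build_stop_words(sample_corpus, top_k=2))
```

The cutoff is a judgment call: too small and common words leak into the ranking, too large and genuinely interesting words get filtered out.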

    For tracking phrases, you’re facing an n-gram-like problem.
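
To make the n-gram idea concrete, here is a rough sketch that counts every run of 2 to 4 consecutive words as a candidate phrase. The `ngrams` and `count_phrases` functions and the 2-to-4 range are assumptions made for illustration, not something the question specifies.

```python
import re
from collections import Counter

def ngrams(tokens, n):
    """Return all runs of n consecutive tokens, joined with spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def count_phrases(texts, min_n=2, max_n=4):
    """Count every n-gram with min_n <= n <= max_n across the texts."""
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z']+", text.lower())
        for n in range(min_n, max_n + 1):
            counts.update(ngrams(tokens, n))
    return counts

if __name__ == "__main__":
    phrases = count_phrases(["This honey is very good",
                             "That tea is very good"])
    print(phrases["very good"], phrases["honey is very good"])
```

Stop words are intentionally not removed here, since a phrase such as “honey is very good” contains them; filtering would more plausibly be applied to whole phrases afterwards, for example by dropping n-grams made up only of common words.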

    Do not reinvent the wheel. What you want to do has been done thousands of times, so use existing libraries or code (check the External Links section of the n-gram Wikipedia page).

