How do I data mine a pile of text to get keywords by usage?

Asked: May 12, 2026

How do I data mine a pile of text to get keywords by usage, such as “Jacob Smith” or “fence”?

Is there software that already does this, even semi-automatically? If it can filter out simple words like “the”, “and”, and “or”, I could get to the topics more quickly.


1 Answer

Editorial Team · Answer 1 · 2026-05-12T16:31:33+00:00

The general algorithm goes like this:

- Obtain the text
- Lowercase it and strip punctuation, special characters, etc.
- Split on whitespace
- Drop "simple" words (stop words such as "the", "and", "or")
- Loop over the remaining words
    - Add the word to an array/hash table/etc. with a count of 1 if it isn't there yet;
       if it is, increment the counter for that word
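
The steps above can be sketched in Python roughly as follows. This is a minimal illustration, not a finished tool: the `STOP_WORDS` set and the `word_counts` name are assumptions made for the example, and a real stop-word list would be much longer. Stop words are filtered after splitting, since matching whole words is simpler than cutting them out of the raw string.

```python
import re
from collections import Counter

# Assumed stop-word list for illustration only; real lists are much longer.
STOP_WORDS = {"the", "and", "or", "a", "an", "of", "to", "in", "is", "it"}

def word_counts(text, stop_words=STOP_WORDS):
    """Return a Counter mapping each non-stop word to its number of occurrences."""
    # Lowercase, then replace anything that is not a letter, digit,
    # apostrophe, or whitespace with a space.
    cleaned = re.sub(r"[^a-z0-9'\s]", " ", text.lower())
    # Split on whitespace and count every word that is not a stop word.
    return Counter(w for w in cleaned.split() if w not in stop_words)

counts = word_counts("The fence and the gate: Jacob Smith painted the fence.")
# Print the words from most to least frequent.
for word, n in counts.most_common():
    print(word, n)
```

`Counter` takes care of the "add if missing, otherwise increment" step from the last bullet. Note that this counts single words only, so a name like “Jacob Smith” shows up as two separate entries.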

The end result is a frequency count of every word in the text. Divide each count by the total number of words to get a relative frequency (multiply by 100 for a percentage). Any further processing is up to you.
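
As a small sketch of that last step, the function below turns raw counts into percentages. The name `relative_frequencies` is made up for the example, and it assumes the total is the number of words that were actually counted (so stop words are excluded if they were dropped earlier); dividing by the size of the original text instead is an equally valid choice.

```python
from collections import Counter

def relative_frequencies(counts):
    """Map each word to its share of all counted words, as a percentage."""
    total = sum(counts.values())
    if total == 0:
        # Nothing was counted, so there is nothing to divide.
        return {}
    return {word: 100.0 * n / total for word, n in counts.items()}

# Hypothetical counts, e.g. produced by an earlier counting step.
counts = Counter({"fence": 2, "gate": 1, "jacob": 1})
for word, pct in relative_frequencies(counts).items():
    print(f"{word}: {pct:.1f}%")
```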

You’ll also want to look into stemming, which reduces words to their root so that variants of the same word are counted together. For example, going => go, cars => car.
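
To make the idea concrete, here is a deliberately naive suffix stripper. It is a toy written for this answer, not the Porter algorithm or any library's stemmer, and the `naive_stem` name and its three rules are assumptions; established stemmers handle far more cases.

```python
from collections import Counter

def naive_stem(word):
    """Strip a trailing "ing" or "s". Toy rule set, not a real stemmer."""
    # Leave very short results alone: at least two characters must remain.
    if word.endswith("ing") and len(word) - 3 >= 2:
        return word[:-3]
    # Strip a plural "s", but not the second "s" of words like "glass".
    if word.endswith("s") and not word.endswith("ss") and len(word) - 1 >= 2:
        return word[:-1]
    return word

# Merge counts for words that share a stem.
words = ["going", "go", "cars", "car", "fence"]
stem_counts = Counter(naive_stem(w) for w in words)
for stem, n in stem_counts.items():
    print(stem, n)
```

Stemming before counting means "car" and "cars" land in the same bucket, which usually gives a better picture of the topics than counting each surface form separately.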

An algorithm like this is common in spam filters, keyword indexing, and the like.
