I’m working on a project that needs to count the occurrence of every word

Question

Asked: May 27, 2026

I’m working on a project that needs to count the occurrences of every word in a text file.
For example, I have a text file like this:

What Silver Lake Looks For in IPO Candidates
3 Companies Crushed by Earnings: Apple, Cirrus Logic, IBM
IBM’s Palmisano: How You Get To Be A 100-Year Old Company

Suppose the file contains the three lines shown above and I want to count how often each word occurs. Here, “Companies” and “Company” should be treated as the same word, “company” (lowercase), so the total count for “company” is 2.

Is there an NLP toolkit for Java that can tell that two words such as “families” and “family” both come from the same word, “family”?

I’ll use the per-word counts for Naive Bayes training, so it’s very important that the count for each word is accurate.

1 Answer

Editorial Team · Answer 1 · 2026-05-27T16:14:38+00:00

Apache Lucene and OpenNLP both provide good implementations of stemming algorithms. You can review them and use the one that suits you best. I’ve been using Lucene for my projects.
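To show where a stemmer would plug into the counting step, here is a minimal sketch that uses only the Java standard library. The class `WordCounts` and the method `toyNormalize` are hypothetical names, and `toyNormalize` is only a crude stand-in that lowercases a token and strips a few English plural endings; it is not the Lucene or OpenNLP algorithm. In a real project you would pass a stemmer or lemmatizer from one of those libraries as the `normalizer` argument instead. Note that a Porter-style stemmer typically reduces both “companies” and “company” to a stem such as “compani” rather than to the dictionary word, which is still fine for counting because both forms end up under the same key.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.UnaryOperator;

final class WordCounts {

    // Toy normalizer (assumption, for illustration only): lowercases the token
    // and strips a few English plural endings. Swap in a real stemmer here.
    static String toyNormalize(String token) {
        String w = token.toLowerCase(Locale.ROOT);
        if (w.length() > 4 && w.endsWith("ies")) {
            return w.substring(0, w.length() - 3) + "y";   // companies -> company
        }
        if (w.length() > 3 && w.endsWith("s") && !w.endsWith("ss") && !w.endsWith("us")) {
            return w.substring(0, w.length() - 1);          // candidates -> candidate
        }
        return w;
    }

    // Counts tokens after passing each one through the given normalizer.
    static Map<String, Integer> count(String text, UnaryOperator<String> normalizer) {
        Map<String, Integer> counts = new TreeMap<>();
        // Drop possessive 's (straight or curly apostrophe) before tokenizing.
        String cleaned = text.replaceAll("[\u2019']s\\b", "");
        // Split on any run of characters that are neither letters nor digits.
        for (String token : cleaned.split("[^\\p{L}\\p{N}]+")) {
            if (token.isEmpty()) {
                continue; // split() yields an empty first token if the text starts with a delimiter
            }
            counts.merge(normalizer.apply(token), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String text = "What Silver Lake Looks For in IPO Candidates\n"
                + "3 Companies Crushed by Earnings: Apple, Cirrus Logic, IBM\n"
                + "IBM\u2019s Palmisano: How You Get To Be A 100-Year Old Company";
        // Print one "word<TAB>count" line per normalized word, in alphabetical order.
        count(text, WordCounts::toyNormalize)
                .forEach((word, n) -> System.out.println(word + "\t" + n));
    }
}
```

With the three sample lines from the question, this sketch counts “company” twice and “ibm” twice. Keeping the normalizer as a parameter means the counting code does not change when you replace the toy rule with a library stemmer.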
