Anyone know of a simple way to use Python and NLTK to get the

Question

Asked: June 9, 2026 (2026-06-09T10:12:35+00:00)

Does anyone know of a simple way to use Python and NLTK to find the article that most closely matches a search query? For example, I would like to take 10 articles from Wikipedia, compute the frequency distribution for each of them (along with another method of classification, if you have any recommendations), and, given a search query, return the articles the query most likely refers to.

Any ideas? I would like a better method than frequency distribution, but I thought I would start there.

1 Answer

Editorial Team · Answer 1 · 2026-06-09T10:12:37+00:00

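TF-IDF (aka TFxIDF aka tf-idf aka tfidf aka even tf/idf) is pretty much the standard solution. Instead of using the bare term frequency, you also count, for the whole document set, how many documents contain each term. You then express the term's weight as its frequency in the document multiplied by the inverse document frequency (IDF), usually the logarithm of the total number of documents divided by the number of documents containing the term. That way, you don't need a stop word list, because the IDF of a word that appears in nearly every document will make its weight nearly zero. Rocchio's algorithm is a related but separate technique that is typically applied on top of tf-idf vectors, for relevance feedback or classification.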
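To make the weighting concrete, here is a minimal sketch of tf-idf ranking using only the standard library. It is an illustration under assumptions, not a reference implementation: the function names (`tokenize`, `build_index`, `cosine`, `rank`), the three toy documents, the regex tokenizer, the unsmoothed `log(N / df)` form of IDF, and the use of cosine similarity for scoring are all choices made for this example. With NLTK installed, a proper tokenizer and `FreqDist` could take the place of the regex and `Counter`.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep runs of letters (a stand-in for a real tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def build_index(docs):
    """Return per-document tf-idf vectors and the idf table.

    docs: dict mapping a document name to its raw text.
    """
    counts = {name: Counter(tokenize(text)) for name, text in docs.items()}
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for c in counts.values() for term in c)
    # Unsmoothed idf: a term present in every document gets log(1) = 0.
    idf = {term: math.log(n_docs / d) for term, d in df.items()}
    vectors = {
        name: {term: tf * idf[term] for term, tf in c.items()}
        for name, c in counts.items()
    }
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query, vectors, idf):
    """Score every document against the query, best match first."""
    q_counts = Counter(tokenize(query))
    # Query terms never seen in the collection get weight 0.
    q_vec = {t: tf * idf.get(t, 0.0) for t, tf in q_counts.items()}
    scores = {name: cosine(q_vec, vec) for name, vec in vectors.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy collection standing in for the Wikipedia articles.
docs = {
    "python": "Python is a programming language. The language is popular.",
    "snake": "The python is a large snake. The snake is not venomous.",
    "coffee": "Coffee is a brewed drink. The drink is made from roasted beans.",
}
vectors, idf = build_index(docs)
results = rank("programming language", vectors, idf)
```

In this toy collection, words such as "the", "is", and "a" occur in every document, so their IDF is zero and they drop out of the scoring without any stop word list. With only a handful of documents the unsmoothed form is quite blunt, so a smoothed IDF may behave better in practice.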
