In NLTK, using a naive bayes classifier, I know from examples its very simply


Asked: June 7, 2026 (2026-06-07T19:40:47+00:00)


In NLTK, using a naive Bayes classifier, I know from examples that it's very simple to use a “bag of words” approach and look for unigrams or bigrams or both. Could you do the same using two completely different sets of features?

For instance, could I use unigrams together with the length of the training set (I know this has been mentioned once on here)? Of more interest to me would be something like word bigrams combined with POS “bigrams”, or other combinations of the POS tags that appear in the document.

Is this beyond the power of the basic NLTK classifier?

Thanks
Alex


1 Answer

Editorial Team · Answer 1 · 2026-06-07T19:40:50+00:00

NLTK classifiers can work with any key-value dictionary. I use {"word": True} for text classification, but you could also use {"contains(word)": 1} to achieve the same effect. You can also combine many features together, so you could have {"word": True, "something something": 1, "something else": "a"}. What matters most is that your features are consistent, so you always have the same kind of keys and a fixed set of possible values. Numeric values can be used, but the classifier isn’t smart about them – it will treat numbers as discrete values, so that 99 and 100 are just as different as 1 and 100. If you want numbers to be handled in a smarter way, then I recommend using scikit-learn classifiers.
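As a rough illustration of combining different feature sets in one dictionary, here is a minimal sketch. It assumes the documents are already POS-tagged as lists of (word, tag) pairs, so no tagger data needs to be downloaded. The helper names `extract_features` and `length_bucket`, the feature-key formats, the bucket thresholds, and the tiny training set are all made up for the example. The length is bucketed into a few labels because, as noted above, the classifier treats raw numbers as unrelated discrete values.

```python
from nltk import NaiveBayesClassifier
from nltk.util import bigrams


def length_bucket(n_tokens):
    """Map a raw token count to a coarse label (thresholds are arbitrary)."""
    if n_tokens < 5:
        return "short"
    if n_tokens < 20:
        return "medium"
    return "long"


def extract_features(tagged_tokens):
    """Build one feature dict from a list of (word, pos_tag) pairs."""
    words = [word.lower() for word, _ in tagged_tokens]
    tags = [tag for _, tag in tagged_tokens]
    features = {}
    # Feature set 1: unigram presence.
    for word in words:
        features["word(%s)" % word] = True
    # Feature set 2: presence of adjacent POS-tag pairs.
    for first, second in bigrams(tags):
        features["pos_bigram(%s %s)" % (first, second)] = True
    # Feature set 3: document length, bucketed into a fixed set of values.
    features["length"] = length_bucket(len(words))
    return features


# Toy, hand-tagged training data (hypothetical, for illustration only).
train = [
    ([("great", "JJ"), ("movie", "NN")], "pos"),
    ([("really", "RB"), ("great", "JJ"), ("acting", "NN")], "pos"),
    ([("terrible", "JJ"), ("plot", "NN")], "neg"),
    ([("really", "RB"), ("terrible", "JJ"), ("movie", "NN")], "neg"),
]

classifier = NaiveBayesClassifier.train(
    [(extract_features(doc), label) for doc, label in train]
)
print(classifier.classify(extract_features([("great", "JJ"), ("plot", "NN")])))
```

Prefixing each key with its feature family (`word(...)`, `pos_bigram(...)`) keeps the sets from colliding, and the same `extract_features` function has to be used for both training and classification so the keys stay consistent.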
