The Archive Base

Editorial Team
Asked: June 1, 2026


I basically have the same question as this guy. The example in the NLTK book for the Naive Bayes classifier uses only whether a word occurs in a document as a feature. It doesn’t use the frequency of each word as a feature (“bag-of-words”).

One of the answers seems to suggest this can’t be done with the built-in NLTK classifiers. Is that the case? How can I do frequency/bag-of-words NB classification with NLTK?
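
To make the distinction concrete, here is a minimal sketch of the two feature styles, using only the standard library. The function names `presence_features` and `count_features` are hypothetical and are not taken from the NLTK book; they only illustrate the shape of the feature dictionaries.

```python
from collections import Counter


def presence_features(words):
    """Binary features: each word that occurs maps to True."""
    return {word: True for word in set(words)}


def count_features(words):
    """Bag-of-words features: each word maps to its number of occurrences."""
    return dict(Counter(words))


doc = "good good plot but bad acting".split()
print(presence_features(doc))  # 'good' appears once as a key, with value True
print(count_features(doc))     # 'good' maps to 2
```

With presence features, the repeated "good" carries no more weight than a single occurrence; with count features, the repetition is kept.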


1 Answer

  1. Editorial Team
    Added an answer on June 1, 2026 at 3:34 pm

    scikit-learn has an implementation of multinomial naive Bayes, which is the right variant of naive Bayes in this situation. A support vector machine (SVM) would probably work better, though.
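
For intuition about why the multinomial variant fits word counts, here is a rough from-scratch sketch in plain Python. It is not scikit-learn's implementation; the function names, the toy training data, and the smoothing value `alpha=1.0` are assumptions made for illustration. Each word's log-likelihood is multiplied by its count in the document, so repeated words shift the score.

```python
import math
from collections import Counter, defaultdict


def train_multinomial_nb(labeled_docs, alpha=1.0):
    """Estimate log priors and smoothed per-class word log-likelihoods.

    labeled_docs is a list of (list_of_words, label) pairs.
    """
    class_doc_counts = Counter()
    class_word_counts = defaultdict(Counter)
    vocab = set()
    for words, label in labeled_docs:
        class_doc_counts[label] += 1
        class_word_counts[label].update(words)
        vocab.update(words)

    total_docs = sum(class_doc_counts.values())
    log_prior = {c: math.log(n / total_docs) for c, n in class_doc_counts.items()}

    log_likelihood = {}
    for c, counts in class_word_counts.items():
        # Additive (Laplace) smoothing over the whole vocabulary
        denom = sum(counts.values()) + alpha * len(vocab)
        log_likelihood[c] = {w: math.log((counts[w] + alpha) / denom) for w in vocab}
    return log_prior, log_likelihood


def classify(words, log_prior, log_likelihood):
    """Return the class with the highest log posterior score."""
    scores = {}
    for c in log_prior:
        score = log_prior[c]
        for word, n in Counter(words).items():
            if word in log_likelihood[c]:  # ignore words unseen in training
                score += n * log_likelihood[c][word]  # the count n is the weight
        scores[c] = score
    return max(scores, key=scores.get)


# Toy training data, invented for this sketch
train = [
    ("great great fun".split(), "pos"),
    ("fun clever great".split(), "pos"),
    ("boring dull boring".split(), "neg"),
    ("dull bad boring".split(), "neg"),
]
log_prior, log_likelihood = train_multinomial_nb(train)
print(classify("great boring boring".split(), log_prior, log_likelihood))
print(classify("great great boring".split(), log_prior, log_likelihood))
```

On this toy data, "great" and "boring" are equally strong indicators of their classes, so a presence-only model would score both test documents as a tie. Only the counts separate them.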

    As Ken pointed out in the comments, NLTK has a nice wrapper for scikit-learn classifiers. Modified from the docs, here’s a somewhat complicated one that does TF-IDF weighting, chooses the 1000 best features based on a chi2 statistic, and then passes that into a multinomial naive Bayes classifier. (I bet this is somewhat clumsy, as I’m not super familiar with either NLTK or scikit-learn.)

    import numpy as np
    from nltk.probability import FreqDist
    from nltk.classify import SklearnClassifier
    from nltk.corpus import movie_reviews
    from sklearn.feature_extraction.text import TfidfTransformer
    from sklearn.feature_selection import SelectKBest, chi2
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import Pipeline

    # TF-IDF weighting -> keep the 1000 best features by chi2 -> multinomial NB
    pipeline = Pipeline([('tfidf', TfidfTransformer()),
                         ('chi2', SelectKBest(chi2, k=1000)),
                         ('nb', MultinomialNB())])
    classif = SklearnClassifier(pipeline)

    # Each review becomes a FreqDist, i.e. a word -> count mapping (bag of words)
    pos = [FreqDist(movie_reviews.words(i)) for i in movie_reviews.fileids('pos')]
    neg = [FreqDist(movie_reviews.words(i)) for i in movie_reviews.fileids('neg')]

    def add_label(lst, lab):
        return [(x, lab) for x in lst]

    # Train on the first 100 reviews of each class
    classif.train(add_label(pos[:100], 'pos') + add_label(neg[:100], 'neg'))

    # Classify the remaining reviews; rows are true labels, columns are predictions
    l_pos = np.array(classif.classify_many(pos[100:]))
    l_neg = np.array(classif.classify_many(neg[100:]))
    print("Confusion matrix:\n%d\t%d\n%d\t%d" % (
        (l_pos == 'pos').sum(), (l_pos == 'neg').sum(),
        (l_neg == 'pos').sum(), (l_neg == 'neg').sum()))
    

    This printed for me:

    Confusion matrix:
    524     376
    202     698
    

    Not perfect, but decent, considering that it’s not a super easy problem and the classifier is trained on only 100 positive and 100 negative reviews.
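
To put a number on "decent", the counts in the confusion matrix printed above can be turned into summary metrics. This is plain arithmetic on those four numbers; the variable names are my own, and the rows are read as true labels (pos, then neg) with columns as predictions, matching the print statement in the code.

```python
# Counts copied from the confusion matrix above.
# Row 1: true 'pos' reviews predicted as pos / neg.
# Row 2: true 'neg' reviews predicted as pos / neg.
tp, fn = 524, 376
fp, tn = 202, 698

total = tp + fn + fp + tn
accuracy = (tp + tn) / total
recall_pos = tp / (tp + fn)      # share of positive reviews found
recall_neg = tn / (tn + fp)      # share of negative reviews found
precision_pos = tp / (tp + fp)   # share of 'pos' predictions that are right

print("accuracy:      %.3f" % accuracy)       # 0.679
print("recall (pos):  %.3f" % recall_pos)     # 0.582
print("recall (neg):  %.3f" % recall_neg)     # 0.776
print("precision pos: %.3f" % precision_pos)  # 0.722
```

So the classifier is right on roughly 68% of the 1800 held-out reviews, and it recognizes negative reviews noticeably better than positive ones.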

