The Archive Base

Editorial Team
Asked: June 14, 2026

Hey I am trying to use a Naive Bayes classifier to classify some text.

Hey, I am trying to use a Naive Bayes classifier to classify some text. I am using NLTK. Whenever I test the classifier using the classify() method, it returns the correct classification for the first item and then the same classification for every other line of text I classify. The following is my code:

from nltk.corpus import movie_reviews
from nltk.tokenize import word_tokenize
import nltk
import random
import nltk.data

documents = [(list(movie_reviews.words(fileid)), category)
             for category in movie_reviews.categories()
             for fileid in movie_reviews.fileids(category)]
random.shuffle(documents)

all_words = nltk.FreqDist(w.lower() for w in movie_reviews.words())
word_features = all_words.keys()[:2000] 

def bag_of_words(words):
    return dict([word,True] for word in words)

def document_features(document): 
    document_words = set(document) 
    features = {}
    for word in word_features:
        features['contains(%s)' % word] = (word in document_words)
    return features

featuresets = [(document_features(d), c) for (d,c) in documents]
train_set, test_set = featuresets[100:], featuresets[:100]
classifier = nltk.NaiveBayesClassifier.train(train_set)

text1="i love this city"
text2="i hate this city"


feats1=bag_of_words(word_tokenize(text1))
feats2=bag_of_words(word_tokenize(text2))


print classifier.classify(feats1)
print classifier.classify(feats2)

This code prints pos twice, whereas if I flip the last two lines of the code it prints neg twice. Can anyone help?

1 Answer

  1. Editorial Team
    Added an answer on June 14, 2026 at 5:08 pm

    Change

    features['contains(%s)' % word] = (word in document_words)
    

    to

    features[word] = (word in document)
    

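    Otherwise the classifier only knows about features named “contains(…)”, while bag_of_words produces features keyed by the bare word. None of the test features match anything seen in training, so the classifier is clueless about the words in "i love this city".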
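
The mismatch can be seen without training anything. The following is a minimal sketch, not the original code: the three-word vocabulary, the sample training document, and the `prefixed` flag are assumptions made for illustration. It compares the feature names produced at training time with the ones `bag_of_words` produces at test time.

```python
def bag_of_words(words):
    # Test-time features from the question: keyed by the bare word.
    return {word: True for word in words}

def document_features(document, word_features, prefixed):
    # Training-time features; `prefixed` (hypothetical flag) switches
    # between the question's 'contains(...)' keys and bare-word keys.
    document_words = set(document)
    features = {}
    for word in word_features:
        key = 'contains(%s)' % word if prefixed else word
        features[key] = word in document_words
    return features

word_features = ['love', 'hate', 'city']      # hypothetical tiny vocabulary
train_doc = ['i', 'love', 'this', 'film']     # hypothetical training document
test_feats = bag_of_words('i love this city'.split())

prefixed_keys = set(document_features(train_doc, word_features, prefixed=True))
plain_keys = set(document_features(train_doc, word_features, prefixed=False))

# Feature names that the test input shares with the training data.
shared_prefixed = prefixed_keys & set(test_feats)
shared_plain = plain_keys & set(test_feats)

print(sorted(shared_prefixed))  # []
print(sorted(shared_plain))     # ['city', 'love']
```

With the prefixed keys the overlap is empty, so no test feature carries any evidence. With bare-word keys the test words line up with the trained features.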


    import nltk.tokenize as tokenize
    import nltk
    import random
    random.seed(3)
    
    def bag_of_words(words):
        return dict([word, True] for word in words)
    
    def document_features(document): 
        features = {}
        for word in word_features:
            features[word] = (word in document)
            # features['contains(%s)' % word] = (word in document_words)
        return features
    
    movie_reviews = nltk.corpus.movie_reviews
    
    documents = [(set(movie_reviews.words(fileid)), category)
                 for category in movie_reviews.categories()
                 for fileid in movie_reviews.fileids(category)]
    random.shuffle(documents)
    
    all_words = nltk.FreqDist(w.lower() for w in movie_reviews.words())
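    # FreqDist.keys() cannot be sliced on Python 3 / NLTK 3, so take the
    # 2000 most frequent words explicitly with most_common().
    word_features = [w for w, _ in all_words.most_common(2000)]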
    
    train_set = [(document_features(d), c) for (d, c) in documents[:200]]
    
    classifier = nltk.NaiveBayesClassifier.train(train_set)
    
    classifier.show_most_informative_features()
    for word in ('love', 'hate'):
        # No hope in passing the tests if word is not in word_features
        assert word in word_features
        print('probability {w!r} is positive: {p:.2%}'.format(
            w = word, p = classifier.prob_classify({word : True}).prob('pos')))
    
    tests = ["i love this city",
             "i hate this city"]
    
    for test in tests:
        words = tokenize.word_tokenize(test)
        feats = bag_of_words(words)
        print('{s} => {c}'.format(s = test, c = classifier.classify(feats)))
    

    yields

    Most Informative Features
                       worst = True              neg : pos    =     15.5 : 1.0
                  ridiculous = True              neg : pos    =     11.5 : 1.0
                      batman = True              neg : pos    =      7.6 : 1.0
                       drive = True              neg : pos    =      7.6 : 1.0
                       blame = True              neg : pos    =      7.6 : 1.0
                    terrible = True              neg : pos    =      6.9 : 1.0
                      rarely = True              pos : neg    =      6.4 : 1.0
                     cliches = True              neg : pos    =      6.0 : 1.0
                           $ = True              pos : neg    =      5.9 : 1.0
                   perfectly = True              pos : neg    =      5.5 : 1.0
    probability 'love' is positive: 61.52%
    probability 'hate' is positive: 36.71%
    i love this city => pos
    i hate this city => neg
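
A possible alternative, sketched here under assumptions rather than taken from the answer, is to keep the 'contains(...)' names and send both training documents and test sentences through one shared feature function. The `make_extractor` helper and the three-word vocabulary are hypothetical names introduced for this illustration.

```python
def make_extractor(word_features, template='contains(%s)'):
    """Return one feature function to use for both training and testing."""
    def extract(tokens):
        present = set(tokens)
        return {template % w: (w in present) for w in word_features}
    return extract

word_features = ['love', 'hate', 'city']  # hypothetical tiny vocabulary
extract = make_extractor(word_features)

train_feats = extract(['i', 'love', 'this', 'film'])
test_feats = extract('i hate this city'.split())

# Both dictionaries now use exactly the same feature names.
print(train_feats.keys() == test_feats.keys())  # True
print(test_feats)
```

In the question's code this would roughly amount to passing document_features(word_tokenize(text1)) to classify() instead of bag_of_words(...). Note that the test features then also include False entries for absent words, which can change the predicted label compared with the bare bag-of-words input.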
    