The Archive Base

The Archive Base Latest Questions

Editorial Team
Asked: May 26, 2026 (2026-05-26T03:20:14+00:00)

I want to classify documents (composed of words) into 3 classes (Positive, Negative, Unknown/Neutral).

I want to classify documents (composed of words) into 3 classes (Positive, Negative, Unknown/Neutral). A subset of the document words become the features.

So far, I have implemented a Naive Bayes classifier that uses information gain and chi-square statistics for feature selection. Now I would like to see what happens if I use odds ratio as the feature selector.

My problem is that I don’t know how to implement odds ratio. Should I:

1) Calculate the odds ratio for every word w and every class:
E.g. for w:

   Prob of word as positive Pw,p = #positive docs with w / #docs
   Prob of word as negative Pw,n = #negative docs with w / #docs
   Prob of word as unknown Pw,u = #unknown docs with w / #docs

   OR(w,P) = log( (Pw,p*(1-Pw,p)) / ((Pw,n + Pw,u)*(1-(Pw,n + Pw,u))) )
   OR(w,N) ...
   OR(w,U) ...

2) How should I decide whether or not to choose a word as a feature?

Thanks in advance…

1 Answer

  1. Editorial Team
    Added an answer on May 26, 2026 at 3:20 am (2026-05-26T03:20:15+00:00)

    Odds ratio is not a good measure for feature selection, because it only reflects what happens when a feature is present and says nothing about when it is absent. It therefore does not work for rare features, and since almost all features are rare, it fails for almost all features. For example, a feature that indicates the positive class with 100% confidence but is present in only 0.0001 of cases is useless for classification. If you still want to use odds ratio, add a threshold on feature frequency, for example requiring the feature to be present in at least 5% of cases. A better approach, though, is to use chi-square or information gain, which handle these problems automatically.
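
For anyone who still wants to try it, here is a minimal Python sketch of odds ratio with a frequency threshold. It is not the asker's formula: it assumes the commonly used one-vs-rest definition OR(w, c) = log( P(w|c)*(1 - P(w|not c)) / ((1 - P(w|c))*P(w|not c)) ), where the probabilities are conditional on the class. The function name `odds_ratio_scores`, the `min_doc_freq` parameter, the additive smoothing `eps`, and the toy data are all illustrative assumptions.

```python
import math
from collections import Counter

def odds_ratio_scores(docs, labels, min_doc_freq=0.05, eps=0.5):
    """Score (word, class) pairs by log odds ratio, skipping rare words.

    docs: list of token lists; labels: one class label per document.
    min_doc_freq: assumed threshold, as a fraction of all documents.
    eps: assumed additive smoothing so the ratio stays finite.
    """
    n_docs = len(docs)
    class_sizes = Counter(labels)
    doc_freq = Counter()    # word -> number of docs containing it
    class_freq = Counter()  # (word, class) -> docs of that class containing it
    for tokens, label in zip(docs, labels):
        for word in set(tokens):
            doc_freq[word] += 1
            class_freq[word, label] += 1

    scores = {}
    for word, df in doc_freq.items():
        if df / n_docs < min_doc_freq:
            continue  # frequency threshold: drop rare words
        for cls, n_cls in class_sizes.items():
            n_rest = n_docs - n_cls
            in_cls = class_freq[word, cls]
            in_rest = df - in_cls
            p_cls = (in_cls + eps) / (n_cls + 2 * eps)     # smoothed P(w | c)
            p_rest = (in_rest + eps) / (n_rest + 2 * eps)  # smoothed P(w | not c)
            scores[word, cls] = math.log(
                p_cls * (1 - p_rest) / ((1 - p_cls) * p_rest)
            )
    return scores

# Toy data, purely illustrative.
docs = [["good", "film"], ["good", "plot"], ["bad", "film"],
        ["bad", "plot"], ["rare", "film"], ["the", "film"]]
labels = ["pos", "pos", "neg", "neg", "unk", "unk"]
scores = odds_ratio_scores(docs, labels, min_doc_freq=0.3)
```

With this toy data, "rare" and "the" each occur in one of six documents and are dropped by the threshold, while "good" gets a positive score for "pos" and a negative one for "neg". One possible selection rule (again an assumption, not something stated above) is to keep the k highest-scoring words per class.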


© 2021 The Archive Base. All Rights Reserved