I want to classify documents (composed of words) into 3 classes (Positive, Negative, Unknown/Neutral).

Question

Asked: May 26, 2026 (2026-05-26T03:20:14+00:00)

I want to classify documents (composed of words) into 3 classes (Positive, Negative, Unknown/Neutral). A subset of the document words become the features.

So far, I have implemented a Naive Bayes classifier that uses information gain and chi-square statistics for feature selection. Now I would like to see what happens if I use the odds ratio as a feature selector.

My problem is that I don’t know how to implement the odds ratio. Should I:

1) Calculate the odds ratio for every word w and every class:
E.g. for w:

   Prob of word as positive Pw,p = #positive docs with w/#docs
   Prob of word as negative Pw,n = #negative docs with w/#docs
   Prob of word as unknown Pw,u = #unknown docs with w/#docs

   OR(w,P) = log( [Pw,p*(1-Pw,p)] / [(Pw,n + Pw,u)*(1-(Pw,n + Pw,u))] )
   OR(w,N) ...
   OR(w,U) ...

2) How should I decide whether or not to choose a word as a feature?

Thanks in advance…

1 Answer

Editorial Team · Answer 1 · 2026-05-26T03:20:15+00:00

The odds ratio is not a good measure for feature selection, because it only shows what happens when a feature is present and says nothing about when it is absent. It therefore does not work for rare features, and since almost all features are rare, it does not work for almost all features. For example, a feature that indicates the positive class with 100% confidence but is present in only 0.0001 of cases is useless for classification. So if you still want to use the odds ratio, add a threshold on feature frequency, e.g. keep only features present in at least 5% of cases. However, I would recommend a better approach: use chi-square or information gain, which handle these problems automatically.
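
If you do want to try the odds ratio with a frequency threshold, here is a minimal sketch of one way it could look. It is an illustration under assumptions, not a reference implementation: it treats each class one-vs-rest, uses the conventional definition log( p(1-q) / ((1-p)q) ) with p = P(w | class) and q = P(w | other classes), and adds a smoothing constant so that words seen in only one class do not cause a division by zero. The function name `odds_ratio_features`, its parameters (`min_df`, `top_k`, `alpha`) and the toy documents are all hypothetical.

```python
import math
from collections import Counter


def odds_ratio_features(docs, labels, target, min_df=0.05, top_k=10, alpha=0.5):
    """Rank words by log odds ratio for `target` vs. all other classes.

    docs   : list of token lists, one per document
    labels : list of class labels, same length as docs
    min_df : minimum fraction of documents a word must appear in (assumed 5%)
    alpha  : additive smoothing constant (an assumption, avoids log(0))
    """
    n_docs = len(docs)
    n_pos = sum(1 for y in labels if y == target)
    n_neg = n_docs - n_pos

    df_all, df_pos = Counter(), Counter()
    for words, y in zip(docs, labels):
        for w in set(words):          # count each word once per document
            df_all[w] += 1
            if y == target:
                df_pos[w] += 1

    scores = {}
    for w, df in df_all.items():
        if df / n_docs < min_df:      # frequency threshold: drop rare words
            continue
        a = df_pos[w]                 # target-class docs containing w
        b = df - a                    # other docs containing w
        p = (a + alpha) / (n_pos + 2 * alpha)   # smoothed P(w | target)
        q = (b + alpha) / (n_neg + 2 * alpha)   # smoothed P(w | not target)
        scores[w] = math.log(p * (1 - q) / ((1 - p) * q))

    # Highest log odds ratio first; keep the top_k words as features.
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


# Toy data: P = positive, N = negative, U = unknown/neutral.
docs = [
    ["good", "film", "rare"],
    ["good", "story"],
    ["bad", "film"],
    ["bad", "story"],
    ["film", "story"],
    ["film"],
]
labels = ["P", "P", "N", "N", "U", "U"]

print(odds_ratio_features(docs, labels, "P", min_df=0.0, top_k=2))
print(odds_ratio_features(docs, labels, "P", min_df=0.3, top_k=2))
```

Running the selection once per class and taking the union of the selected words is one way to get a single feature set for all 3 classes. Comparing the two calls shows the effect of the threshold: without it, a word that occurs in a single document can still rank near the top.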
