I am trying to implement the Naive Bayes algorithm on some real-time data. I know Bayes' rule, but I am not sure how to apply it to my data, which looks like the sample below. There are two labels in my data, ok and fraud, and the testing records are labelled unkn. I need to classify every unkn record as either ok or fraud using Naive Bayes. How do I achieve this? Could someone please help?
1,v1,p1,182,1665,unkn
2,v2,p1,3072,8780,ok
3,v3,p1,20393,76990,ok
4,v4,p1,112,1100,fraud
5,v3,p1,6164,20260,unkn
6,v5,p2,104,1155,ok
7,v6,p2,350,5680,unkn
8,v7,p2,200,4010,ok
9,v8,p2,233,2855,unkn
10,v9,p2,118,1175,unkn
Bayes' rule (up to a normalizing constant that is the same for both classes):
Posterior probability of an unkn record being ok = prior probability of ok * likelihood of the record given ok.
Posterior probability of an unkn record being fraud = prior probability of fraud * likelihood of the record given fraud.
I am assuming the row

1,v1,p1,182,1665,unkn

is interpreted as:

- 1, v1 = some identifiers
- p1, 182, 1665 = features of your data point
- unkn = label, in this case unknown

With that notation in mind, your training data consists of all lines that have the label ok or fraud, and your testing data is the rest. You have to calculate priors and conditional likelihoods:

1. The prior probability of ok is the proportion of ok examples in the training data. The same applies for fraud.
2. For each feature f, such as v1 or p1, its likelihood given ok is the proportion of ok examples in the training data which contain the feature. For instance, p1 is contained in 2 out of 4 ok examples, giving you a probability of 0.5.
3. For each example, multiply together the probabilities you calculated for all of its features in step 2. Multiply the result by the probability in step 1 to obtain the (joint) probability of your example belonging to a particular class. Assign the example to the class with the larger result.
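The steps above can be sketched in Python roughly as follows. This is only an illustration under some assumptions: it treats the second and third columns (v1, p1, ...) as categorical features and ignores the numeric columns, and the names `prior`, `likelihood`, `joint`, `classify` and the `alpha` parameter are made up for this example. The `alpha` smoothing is not part of the steps above; it is added here because, with so little training data, most unkn rows contain values never seen in training and would otherwise get a probability of 0 for both classes. With `alpha=0` the estimates are exactly the plain proportions from steps 1 and 2.

```python
from collections import Counter, defaultdict

ROWS = """
1,v1,p1,182,1665,unkn
2,v2,p1,3072,8780,ok
3,v3,p1,20393,76990,ok
4,v4,p1,112,1100,fraud
5,v3,p1,6164,20260,unkn
6,v5,p2,104,1155,ok
7,v6,p2,350,5680,unkn
8,v7,p2,200,4010,ok
9,v8,p2,233,2855,unkn
10,v9,p2,118,1175,unkn
""".split()


def parse(line):
    """Return ((categorical features), label); numeric columns are ignored here."""
    fields = line.split(",")
    return (fields[1], fields[2]), fields[5]


data = [parse(line) for line in ROWS]
train = [(feats, label) for feats, label in data if label != "unkn"]
test = [feats for feats, label in data if label == "unkn"]

# Step 1 inputs: how many training examples each class has.
class_counts = Counter(label for _, label in train)

# Step 2 inputs: per class, how often each (column, value) pair occurs.
feature_counts = defaultdict(Counter)
for feats, label in train:
    for col, value in enumerate(feats):
        feature_counts[label][(col, value)] += 1


def prior(label):
    """Step 1: proportion of training examples with this label."""
    return class_counts[label] / len(train)


def likelihood(col, value, label, alpha=0.0):
    """Step 2: proportion of examples of `label` that contain the value.

    alpha > 0 applies add-alpha smoothing to the present/absent estimate
    (an assumption of this sketch, not part of the steps above).
    """
    count = feature_counts[label][(col, value)]
    return (count + alpha) / (class_counts[label] + 2 * alpha)


def joint(feats, label, alpha=0.0):
    """Step 3: prior times the product of the per-feature likelihoods."""
    result = prior(label)
    for col, value in enumerate(feats):
        result *= likelihood(col, value, label, alpha)
    return result


def classify(feats, alpha=1.0):
    """Pick the label with the larger joint probability."""
    return max(sorted(class_counts), key=lambda label: joint(feats, label, alpha))


for feats in test:
    print(feats, classify(feats))
```

Calling `likelihood(1, "p1", "ok")` reproduces the 0.5 from step 2, since p1 occurs in 2 of the 4 ok rows.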
Caveats:
- Continuous-valued features (e.g. 182) need to be converted to discrete values (e.g. by binning), or you need to come up with some other way of estimating the conditional probability in step 2. Search for "continuous-valued Naive Bayes".
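As a rough sketch of the two options in that caveat: the first function below maps a number to a discrete bin label, and the second evaluates a normal density that could stand in for the conditional probability in step 2 (the usual "Gaussian Naive Bayes" approach). The bin edges and the names `EDGES`, `to_bin` and `gaussian_pdf` are assumptions chosen for illustration, not values derived from your data.

```python
import bisect
import math

# Hypothetical bin edges; in practice, choose them from the training data.
EDGES = [200, 1000, 10000]


def to_bin(value, edges=EDGES):
    """Map a number to a discrete bin label such as 'bin0' or 'bin2'."""
    return "bin%d" % bisect.bisect_right(edges, value)


def gaussian_pdf(x, mean, std):
    """Normal density at x, using a per-class mean and standard deviation.

    std must be > 0, so a class with a single training example (like fraud
    in the sample) needs a fallback such as a pooled standard deviation.
    """
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2 * math.pi))


print(to_bin(182), to_bin(3072), to_bin(20393))
```

Once a numeric column is binned, the bin label can be counted like any other categorical feature in step 2.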