I was discussing the Naive Bayes classifier with someone from a mathematical point of view. We talked about how, if you multiply a lot of probabilities together, the result eventually becomes too small for a primitive type like a float or a double to represent, and it underflows to zero.
P(x1) * P(x2) * … = a number too small for a computer = 0
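To make the underflow concrete, here is a minimal Python sketch. The numbers are made up for illustration (1000 features, each with probability 0.01); they are not from any real classifier.

```python
# Hypothetical likelihoods: 1000 features, each with probability 0.01.
probs = [0.01] * 1000

# Multiply them together the "obvious" way.
product = 1.0
for p in probs:
    product *= p

# The true value is 1e-2000, far below the smallest positive double
# (roughly 5e-324), so the running product collapses to exactly zero.
print(product)  # prints 0.0
```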
The person I spoke with said that a “workaround” would be to take the log of each probability and add the logs together, like so:
log( P(x1) ) + log( P(x2) ) + …
I understand the advantage of taking the log of a probability: it increases the magnitude of the number so that it doesn’t “fall off”. But how is it that you can just add them together after that? Is the idea that, as long as you do this consistently for every classification “bucket” when you run Naive Bayes, you can still find the greatest one at the end of the day?
Any explanations are appreciated. Thanks,
mj
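Because log(a*b) = log(a) + log(b). It’s a property of logarithms. So log( P(x1) ) + log( P(x2) ) + … is exactly the log of the product P(x1) * P(x2) * …, not an approximation of it. And since log is strictly increasing, the class with the largest product is also the class with the largest sum of logs, so comparing the sums across all classes picks the same winner.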
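Here is a rough sketch of how that comparison might look in Python. The class names, priors, and likelihoods are invented for illustration, and `log_score` is a hypothetical helper, not part of any library. It also assumes every probability is strictly greater than zero, since `math.log(0)` raises an error.

```python
import math

def log_score(prior, likelihoods):
    """Return log(prior) + sum of log(P(x_i | class)) for one class."""
    return math.log(prior) + sum(math.log(p) for p in likelihoods)

# Hypothetical data: class name -> (prior, per-feature likelihoods).
classes = {
    "spam": (0.4, [0.02] * 500),
    "ham": (0.6, [0.01] * 500),
}

# Score each class by its sum of logs instead of its raw product.
scores = {name: log_score(prior, liks) for name, (prior, liks) in classes.items()}

# Pick the class with the largest log score.
best = max(scores, key=scores.get)
print(best)  # prints spam
```

With these numbers the raw products for both classes would underflow to 0.0 and be indistinguishable, while the log scores (roughly -1957 for "spam" and -2303 for "ham") still rank the classes correctly.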