Currently I’m working on an implementation of logistic regression. Nothing really complex, just working with a simple dataset (Andrew Ng’s house buying prediction). Here is what I’m doing:
My Cost function:
def Cost(theta, X, Y):
    m = Y.size
    h = Sigmoid(X.dot(theta.T))
    J = (1.0/m) * ((-Y.T.dot(log(h))) - ((1.0 - Y.T).dot(log(1.0-h))))
    return J.sum()
Invoking fmin:
initial_theta = zeros(shape = (X.shape[1],1))
theta = fmin(Cost2,initial_theta, args = (X,Y))
When using fmin, the final theta I get is way too big for predictions. When predicting, I always get values between roughly 0.62 and 0.73, which will always predict true. Maybe with more iterations I could get a better result, but I’m not sure about it.
When using fmin_bfgs, the cost converges to NaN, making it unusable.
Here is some additional data:
Final theta:
[ 0.00126059 0.01033406]
Final Cost:
[ 0.62079972]
Predictions:
[ 0.63422573 0.6727308 0.62957501 0.66757524 0.64503653 0.62245727
0.67765315 0.68966732 0.72525886 0.73487524 0.67716454 0.70974059
0.7142225 0.70415933 0.62892863 0.69232142 0.70645758 0.64152605
0.62052863 0.69538731]
Real ratings (if 1, the prediction should be >= 0.5; if 0, the prediction should be < 0.5). This is what I should’ve been getting:
[0 0 0 1 0 0 0 0 1 1 1 1 1 1 0 1 1 1 0 1]
Any ideas on how to make it better?
So, after some research and testing, I found the reason why my code was not working.
Since fmin_bfgs was converging to NaN, I decided to look into why, and what I could do to resolve it. What I did IS NOT the best way, but it solved the issue and now my code works.
So, basically, fmin_bfgs was generating numbers that were far too small, which caused an overflow and resulted in NaN. What I did was (once again, not the ideal way to solve it, but it did the trick):
First, I split the cost function into three parts: the single expression for J in the original Cost function was replaced by the factors a, b and h, where h is the sigmoid function applied to the vector.
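The split code itself is not shown here, so the following is only a minimal sketch of what it might look like, assuming NumPy, a 1-D theta (which is what fmin passes in) and a 1-D Y. The names sigmoid and cost_split are illustrative, and a and b are a guess at how the two terms of the original J expression were separated.

```python
import numpy as np

def sigmoid(z):
    # Logistic function; stands in for the Sigmoid helper used in Cost.
    return 1.0 / (1.0 + np.exp(-z))

def cost_split(theta, X, Y):
    # Same cost as Cost, broken into h, a and b so each piece can be inspected.
    m = Y.size
    h = sigmoid(X.dot(theta.T))             # predicted probabilities
    a = -Y.T.dot(np.log(h))                 # term weighted by Y
    b = (1.0 - Y.T).dot(np.log(1.0 - h))    # term weighted by (1 - Y)
    J = (1.0 / m) * (a - b)
    return np.sum(J)
```

Printing a and b separately at each call is then enough to see which of the two terms stops being finite.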
After some testing, I found that the issue was with the b term. The log calculation was generating -infinity, since h was calculated as 1 for every term, which produces log(0.0), and that, for those of you who know some basic math, is -infinity. So, to solve the problem, I replaced that -infinity with a finite number.
Basically, what I thought was: “Well, I’m receiving a -infinity here. It is a really small number, but it is causing an overflow. So, let’s replace it with a really small number that will not cause an overflow!”
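The replacement described above could be sketched roughly as follows. This is an assumption about the shape of the fix, not the exact code used: safe_log is a hypothetical helper, and the floor value of -1e10 is a placeholder, since the actual number is not stated.

```python
import numpy as np

def safe_log(x, floor=-1e10):
    # floor is a placeholder: any large negative but finite number would do.
    with np.errstate(divide="ignore"):       # silence the log(0) warning
        out = np.log(x)
    # Swap every -inf (produced by log(0.0)) for the finite floor value.
    return np.where(np.isneginf(out), floor, out)

h = np.array([1.0, 0.5])
print(safe_log(1.0 - h))   # first entry is the floor instead of -inf
```

In the b term, log(1.0 - h) would then become safe_log(1.0 - h). One way the -infinity can turn into NaN is that the dot product multiplies it by the zeros in (1.0 - Y), and 0 times -infinity is NaN in floating point.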
One more time, probably not the best approach, but it did the trick for me.
After this, my code ran smoothly, and with really good results, actually.
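For anyone who would rather avoid the replacement trick, one commonly used alternative is to compute the two logs directly from the raw scores with np.logaddexp, so that log(0.0) never appears. The sketch below is not the code from this post; stable_cost is a hypothetical name, and it assumes theta and Y are 1-D arrays.

```python
import numpy as np

def stable_cost(theta, X, Y):
    # z holds the raw scores before the sigmoid is applied.
    z = X.dot(theta)
    # log(h)     = -log(1 + exp(-z))
    # log(1 - h) = -log(1 + exp(z))
    log_h = -np.logaddexp(0.0, -z)
    log_one_minus_h = -np.logaddexp(0.0, z)
    return -np.mean(Y * log_h + (1.0 - Y) * log_one_minus_h)
```

It can be passed to the optimizer in the same way, for example fmin_bfgs(stable_cost, initial_theta.ravel(), args=(X, Y.ravel())), assuming X and Y are the arrays from the question.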