I’m sure this is a newbie question, so sorry in advance. I’ve read a few books on basic Python usage and am now working through the tutorials in ‘Programming Collective Intelligence’, but I’m getting an error with my block of code:
ValueError: math domain error
I see this error a lot on Google, but I’m not sure why it’s happening. The program basically takes a dict of dicts of movie critics and their reviews; the first function (sim_pearson) tells you how similar two critics are. This part works on its own. The problem appears when I try to compare a single user against everyone else so the critics can be ranked.
Here’s the code:
# A dictionary of movie critics and their ratings of a small
# set of movies
critics={'Lisa Rose': {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.5,
'Just My Luck': 3.0, 'Superman Returns': 3.5, 'You, Me and Dupree': 2.5,
'The Night Listener': 3.0},
'Gene Seymour': {'Lady in the Water': 3.0, 'Snakes on a Plane': 3.5,
'Just My Luck': 1.5, 'Superman Returns': 5.0, 'The Night Listener': 3.0,
'You, Me and Dupree': 3.5},
'Michael Phillips': {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.0,
'Superman Returns': 3.5, 'The Night Listener': 4.0},
'Claudia Puig': {'Snakes on a Plane': 3.5, 'Just My Luck': 3.0,
'The Night Listener': 4.5, 'Superman Returns': 4.0,
'You, Me and Dupree': 2.5},
'Mick LaSalle': {'Lady in the Water': 3.0, 'Snakes on a Plane': 4.0,
'Just My Luck': 2.0, 'Superman Returns': 3.0, 'The Night Listener': 3.0,
'You, Me and Dupree': 2.0},
'Jack Matthews': {'Lady in the Water': 3.0, 'Snakes on a Plane': 4.0,
'The Night Listener': 3.0, 'Superman Returns': 5.0, 'You, Me and Dupree': 3.5},
'Toby': {'Snakes on a Plane':4.5,'You, Me and Dupree':1.0,'Superman Returns':4.0}}
from math import sqrt
#now lets use the pearson correlation score to see if we can get better results
#Returns the pearson correlation coefficient for p1 and p2
def sim_pearson(prefs, p1, p2):
    #get the list of mutually rated items
    si={}
    for item in prefs[p1]:
        if item in prefs[p2]:
            si[item]=1
    #find the number of elements
    n=len(si)
    # if they have no rating in common, return 0
    if n == 0: return 0
    #add up all the preferences
    sum1=sum([prefs[p1][it] for it in si])
    sum2=sum([prefs[p2][it] for it in si])
    #sum up the squares
    sum1Sq=sum([pow(prefs[p1][it],2) for it in si])
    sum2Sq=sum([pow(prefs[p2][it],2) for it in si])
    #sum up the products
    pSum=sum([prefs[p1][it]*prefs[p2][it] for it in si])
    #Calculate Pearson score
    num=pSum-(sum1*sum2/n)
    den=sqrt((sum1Sq-pow(sum1,2)/n)* sum2Sq-pow(sum2,2)/n)
    if den == 0: return 0
    r=num/den
    return r
#lets check out the results
print 'here are the results from the Pearson algo:'
print sim_pearson(critics, 'Lisa Rose', 'Gene Seymour')
print sim_pearson(critics, 'Mick LaSalle', 'Jack Matthews')
#now lets rank critics to see who is the most similar
#Returns the best matches for person from prefs dictionary
#Number of results and similarity functions are optional params.
def topMatches(prefs, person, n=5, similarity=sim_pearson):
    scores=[(similarity(prefs, person, other),other)
            for other in prefs if other != person]
    #sort the list so the highest scores appear at the top
    scores.sort()
    scores.reverse()
    return scores[0:n]
#lets see the ranking
print 'here are the top matches:'
print topMatches(critics, 'Toby', n=3)
The traceback says there are problems with 3 lines:
Traceback (most recent call last):
print topMatches(critics, 'Toby', n=3)
for other in prefs if other != person]
den=sqrt((sum1Sq-pow(sum1,2)/n)* sum2Sq-pow(sum2,2)/n)
I’m not sure what’s going on here. Here is the expected answer (from the book):
[(0.9912, 'Lisa Rose'), (0.9244, 'Mick LaSalle'), (0.8934, 'Claudia Puig')]
Thanks in advance
(Full disclosure: I suck at math and am learning Python, so there may be some major fundamental issue I’m missing, but I double-checked the code from the book and I don’t think I made a typo or anything.)
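I think you’re missing a pair of parentheses from this line:
    den=sqrt((sum1Sq-pow(sum1,2)/n)* sum2Sq-pow(sum2,2)/n)
I think it should be
    den=sqrt((sum1Sq-pow(sum1,2)/n)*(sum2Sq-pow(sum2,2)/n))
Without them, only sum2Sq is multiplied by the first term before pow(sum2,2)/n is subtracted, so the value passed to sqrt can go negative, and sqrt of a negative number is what raises the math domain error.
I took your code, made this change, ran it and it gave the expected answers.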
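To make the effect of the missing parentheses concrete, here is a minimal sketch (my own rewrite, not the book's listing). It copies three critics from the question's data. The helper name radicands and the snake_case variable names are assumptions made for illustration, and the code avoids print so it should run on either Python 2 or 3.

```python
from math import sqrt

# Ratings copied from the question, limited to the critics used below.
critics = {
    'Toby': {'Snakes on a Plane': 4.5, 'You, Me and Dupree': 1.0,
             'Superman Returns': 4.0},
    'Michael Phillips': {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.0,
                         'Superman Returns': 3.5, 'The Night Listener': 4.0},
    'Lisa Rose': {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.5,
                  'Just My Luck': 3.0, 'Superman Returns': 3.5,
                  'You, Me and Dupree': 2.5, 'The Night Listener': 3.0},
}


def shared_sums(prefs, p1, p2):
    """Return n and the sums needed for Pearson over mutually rated items."""
    shared = [item for item in prefs[p1] if item in prefs[p2]]
    n = float(len(shared))
    sum1 = sum(prefs[p1][it] for it in shared)
    sum2 = sum(prefs[p2][it] for it in shared)
    sum1_sq = sum(prefs[p1][it] ** 2 for it in shared)
    sum2_sq = sum(prefs[p2][it] ** 2 for it in shared)
    p_sum = sum(prefs[p1][it] * prefs[p2][it] for it in shared)
    return n, sum1, sum2, sum1_sq, sum2_sq, p_sum


def radicands(prefs, p1, p2):
    """Hypothetical helper: the value handed to sqrt, without and with
    the extra parentheses. Assumes p1 and p2 share at least one item."""
    n, sum1, sum2, sum1_sq, sum2_sq, _ = shared_sums(prefs, p1, p2)
    buggy = (sum1_sq - sum1 ** 2 / n) * sum2_sq - sum2 ** 2 / n
    fixed = (sum1_sq - sum1 ** 2 / n) * (sum2_sq - sum2 ** 2 / n)
    return buggy, fixed


def sim_pearson(prefs, p1, p2):
    """Pearson correlation with the denominator fully parenthesized."""
    n, sum1, sum2, sum1_sq, sum2_sq, p_sum = shared_sums(prefs, p1, p2)
    if n == 0:
        return 0
    num = p_sum - sum1 * sum2 / n
    den = sqrt((sum1_sq - sum1 ** 2 / n) * (sum2_sq - sum2 ** 2 / n))
    if den == 0:
        return 0
    return num / den


buggy, fixed = radicands(critics, 'Toby', 'Michael Phillips')
toby_lisa = sim_pearson(critics, 'Toby', 'Lisa Rose')
```

For Toby and Michael Phillips (who share only two movies), the original expression evaluates to -18.46875, which sqrt rejects, while the parenthesized version gives 0.015625. That would also explain why the two standalone sim_pearson calls in the question ran without error: the unparenthesized value happens to stay positive for those pairs, so the bug only surfaces once topMatches compares Toby against every critic. With the parentheses in place, the Toby / Lisa Rose score comes out at about 0.9912, matching the book.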