This should be a simple problem but I just can't wrap my head around it. I have a dictionary called TD. It is nested as {key1: {key2: value}}, for example {1:{u'word':3, u'next':2, u'the':2}, 2:{...}, ...}, where key1 is a document, key2 is a word in that document, and value is the number of times that word appears in the document, obtained using collections.Counter.
I have a large number of documents so each document has an entry in TD:
TD = {1:{u'word':2, u'next':1, u'the':5,...},
2:{u'my':4, u'you':1, u'other':2,...},
...
168:{u'word':1, u'person':1, u'and':8,...}}
What I now want to do is check each word in document 1 (TD[1]) to see whether it appears in the other documents, and repeat this process for each document. For each document a word appears in, freq is increased by 1. I have a new dictionary called Score that should look like this:
{1:{u'word':score, u'next':score,...}, 2:{u'my':score, u'you':score,...}...}
To obtain this dictionary:
Score={}
count = 0
for x,i in TD[count].iteritems():
    freq=1
    num=1
    for y in TD[num].keys():
        if word in TF[num].keys():
            freq+=1
        num+=1
    Score[num]={x:(i*freq)}
    num+=1
This is giving me the following output:
{1:{u'word':score}, 2:{u'next':score}, 3:{u'the':score}...}
But it should be:
{1:{u'word':score, u'next':score, u'the':score,...}...}
I think the problem is with the line Score[num]={x:(i*freq)}
Use dict views to find the intersection between documents, then a Counter to count their frequencies:
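A minimal sketch of that approach, assuming Python 3, where dict.keys() already returns a set-like view (on Python 2, viewkeys() and iteritems() would play the same roles). The three-document TD below is made-up sample data, and the loop names id and otherid are only illustrative:

```python
from collections import Counter

# Small stand-in for the real TD (document id -> {word: count}); sample data only.
TD = {
    1: {u'word': 2, u'next': 1, u'the': 5},
    2: {u'my': 4, u'you': 1, u'the': 2},
    3: {u'word': 1, u'person': 1, u'and': 8},
}

Score = {}
for id, document in TD.items():  # note: `id` shadows the builtin of the same name
    counts = Counter()
    for otherid, otherdocument in TD.items():
        if otherid == id:
            continue  # do not compare a document with itself
        # Key views support set intersection: the words both documents share.
        counts.update(document.keys() & otherdocument.keys())
    Score[id] = counts
```

A word that appears in no other document is simply absent from its Counter, so looking it up (for example Score[1][u'next']) gives 0 rather than raising a KeyError.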
Each entry in Score will be a count of how often each word in a document appears in the other documents.
If you need to include the word count in the current document as well (count + 1), simply remove the if otherid == id test.
In your own code you confused num and count, but in Python you usually don't need to manually increment a loop counter in any case.
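If the end goal is the i*freq product from the question, with freq counting every document that contains the word (the current one included), one possible sketch is to compute the document frequency once and then multiply. This swaps the pairwise comparison for a single counting pass; it assumes Python 3, and doc_freq, docid and the sample TD are hypothetical names and data:

```python
from collections import Counter

# Made-up sample data in the same shape as TD (document id -> {word: count}).
TD = {
    1: {u'word': 2, u'next': 1, u'the': 5},
    2: {u'my': 4, u'you': 1, u'the': 2},
    3: {u'word': 1, u'person': 1, u'and': 8},
}

# Number of documents each word appears in, current document included.
doc_freq = Counter()
for document in TD.values():
    doc_freq.update(document.keys())

# Score = (count of the word in this document) * (documents containing the word).
Score = {
    docid: {word: count * doc_freq[word] for word, count in document.items()}
    for docid, document in TD.items()
}
```

Every word of a document ends up under that document's key, which is the nested shape the question asks for.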