I am trying to identify a particular word and then count it. I need to save the count for each identifier.
For example, a document may contain the following:
risk risk risk free interest rate
asterisk risk risk
market risk risk [risk
I need to count ‘risk’, not ‘asterisk’. There could be other risk-related words, so don’t stick to the above example. What I need to find is the word ‘risk’. If ‘risk’ starts or ends with punctuation such as < [ ( . ! * > ] ), I need to count it as well. But if ‘risk’ is only part of a longer word, like ‘asterisk’, I should not count it.
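One way to express this rule, offered as a sketch rather than the only approach, is a regular expression with word boundaries. `\b` matches only where a word character (letter, digit, or underscore) meets a non-word character or the start/end of the string, so `\brisk\b` finds ‘risk’ inside ‘[risk’ or ‘risk.’ but not inside ‘asterisk’. The helper name `count_risk` is hypothetical.

```python
import re

# \b marks a word boundary; re.IGNORECASE also catches 'Risk' and 'RISK'.
RISK_PATTERN = re.compile(r"\brisk\b", re.IGNORECASE)

def count_risk(text):
    """Hypothetical helper: count 'risk' occurring as a standalone word in text."""
    return len(RISK_PATTERN.findall(text))

document = """risk risk risk free interest rate
asterisk risk risk
market risk risk [risk"""

print(count_risk(document))  # → 8 (3 + 2 + 3; 'asterisk' is skipped)
```

Note that `\b` treats a hyphen as a boundary, so ‘risk-free’ is counted, while ‘risks’ and ‘risky’ are not.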
Here is what I have so far. However, it returns counts for ‘asterisk’ and ‘[risk’ as separate entries, as well as for ‘risk’. I tried to use a regular expression but kept getting errors. Also, I am a beginner in Python. If anyone has any ideas, please help! Thanks.
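from collections import defaultdict

# sample input taken from the example above: one string per line
mylist = ["risk risk risk free interest rate",
          "asterisk risk risk",
          "market risk risk [risk"]

word_dict = defaultdict(int)
for line in mylist:
    words = line.lower().split()  # convert all words to lower case
    for word in words:
        word_dict[word] += 1

# substring test: this also matches 'asterisk' and '[risk' (the problem described above)
for word in word_dict:
    if 'risk' in word:
        print(word, word_dict[word])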
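A minimal fix to the loop above, assuming (hypothetically) that the documents are held in a dict `docs` mapping each identifier to its lines: instead of the substring test `'risk' in word`, strip punctuation from both ends of each token with `str.strip(string.punctuation)` and compare the result to ‘risk’ exactly. The names `docs` and `risk_counts` are illustrative, not from the original code.

```python
import string

# Hypothetical input: document identifier -> list of lines in that document.
docs = {
    "doc1": ["risk risk risk free interest rate",
             "asterisk risk risk",
             "market risk risk [risk"],
    "doc2": ["Market (risk) is low."],
    "doc3": ["No asterisks here!"],
}

risk_counts = {}  # identifier -> number of standalone 'risk' tokens
for doc_id, lines in docs.items():
    count = 0
    for line in lines:
        for word in line.lower().split():
            # strip() only removes punctuation at the ends, so '[risk' -> 'risk',
            # while 'asterisk' is unchanged and fails the equality test.
            if word.strip(string.punctuation) == "risk":
                count += 1
    risk_counts[doc_id] = count  # store the count even when it is zero

print(risk_counts)  # → {'doc1': 8, 'doc2': 1, 'doc3': 0}
```

Unlike the `\b` regex, this keeps ‘risk-free’ as a single token and does not count it, because `strip()` never touches inner characters; choose whichever behaviour matches your definition of a word.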