I have a Python defaultdict that looks like this:
defaultdict(<type 'int'>, {u'RT': 1, u'be': 1, u'uniforms': 1, u'@ProFootballWkly:': 1, u'in': 1, u'Nike': 1, u'Brooklyn.': 1, u'ET': 1, u"NFL's": 1, u'will': 1, u'a.m.': 1, u'at': 1, u'unveiled': 1, u'Jimmy': 3, u'11': 1, u'new': 1, u'The': 2, u'today': 1})
I’m processing it with:
freq_distribution = nltk.FreqDist(filtered_words)
top_words = freq_distribution.keys()[:4]
print top_words
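The snippet above relies on Python 2 and NLTK 2, where FreqDist.keys() returned samples sorted by decreasing frequency. In NLTK 3, FreqDist is built on collections.Counter, and in Python 3 keys() cannot be sliced, so the usual way to get the top four is most_common(). Also, because filtered_words lists each distinct key of word_count once, a FreqDist built from it sees every word with a count of 1. The minimal Python 3 sketch below uses collections.Counter as a stand-in for FreqDist and feeds it the existing counts, assuming those counts are what should be ranked. It uses a subset of the question's data.

```python
from collections import Counter

# A subset of the word counts from the question (Python 3 str keys).
word_count = {"RT": 1, "be": 1, "uniforms": 1, "in": 1, "Nike": 1,
              "Jimmy": 3, "new": 1, "The": 2, "today": 1}

# Build the frequency table from the existing counts, not just the keys,
# so that "Jimmy" keeps its count of 3.
freq_distribution = Counter(word_count)

# most_common(n) returns (word, count) pairs, highest count first.
top_words = [word for word, count in freq_distribution.most_common(4)]
print(top_words)
```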
This outputs the top four words, which include the word “The”. I am trying to remove Dolch (“commonly used”) words before this step with:
filtered_words = [w for w in word_count
                  if w not in stopwords.words('english')]
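String membership tests compare exactly, so a capitalized word does not match its lowercase entry in the stopword list. A small sketch of that behaviour in Python 3, using a hypothetical six-word STOPWORDS list in place of stopwords.words('english'):

```python
# Hypothetical stand-in for stopwords.words('english'); NLTK's list is lowercase.
STOPWORDS = ["the", "in", "at", "be", "will", "a"]

words = ["The", "Jimmy", "in", "Nike"]

# The membership test compares strings exactly, so "The" != "the".
filtered_words = [w for w in words if w not in STOPWORDS]
print(filtered_words)  # "The" survives; "in" is removed
```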
The problem is that I still end up with the word “The”, because all of the NLTK stopwords are lowercase. I need a way to convert the words in word_count to lowercase before they are checked. I have tried adding lower() in various places, such as:
freq_distribution = nltk.FreqDist(word_count.lower())
So far nothing has worked; I keep getting the following error:
AttributeError: 'list' object has no attribute 'lower'
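Calling lower() on the whole collection fails because lower() is a method of individual strings, not of the list or dict that holds them. A minimal sketch of the fix applies it to each word w inside the comprehension. It assumes Python 3 and uses a small hypothetical STOPWORDS set in place of stopwords.words('english'); building the set once also avoids re-reading the stopword list for every word.

```python
# Hypothetical stand-in for set(stopwords.words('english')).
STOPWORDS = {"the", "in", "at", "be", "will", "a"}

word_count = {"The": 2, "Jimmy": 3, "in": 1, "Nike": 1, "will": 1}

# Lowercase each word only for the comparison; the original spelling is kept.
filtered_words = [w for w in word_count if w.lower() not in STOPWORDS]
print(filtered_words)
```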
This lowercases w before checking whether it’s in the stopwords list. So if w is “The”, it will be transformed to “the” before checking. Since “the” is in the list, it will get filtered out.
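Putting the pieces together, a sketch of the whole pipeline in Python 3 might filter case-insensitively, keep each word's count, and then take the most frequent words. The top_words helper and the STOPWORDS set are hypothetical illustrations (STOPWORDS stands in for NLTK's list), and collections.Counter stands in for FreqDist.

```python
from collections import Counter

# Hypothetical stand-in for set(stopwords.words('english')).
STOPWORDS = {"the", "in", "at", "be", "will", "a"}

# A subset of the word counts from the question.
word_count = {"RT": 1, "be": 1, "uniforms": 1, "in": 1, "Nike": 1,
              "Jimmy": 3, "new": 1, "The": 2, "today": 1}


def top_words(counts, stopwords, n):
    """Return the n most frequent words whose lowercase form is not a stopword."""
    # Keep the original spelling and count; lowercase only for the check.
    kept = Counter({w: c for w, c in counts.items() if w.lower() not in stopwords})
    return [w for w, _ in kept.most_common(n)]


print(top_words(word_count, STOPWORDS, 4))
```

Converting the stopword list to a set makes each membership check a hash lookup instead of a scan of the whole list.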