I am using pandas to count the values in the text (object dtype) columns of my data and to find the 5 most frequent values in each.
Input file is as follows:
Gears of war 3
Gears of war
Assassin creed
.......
.......
Crysis 2
Gears of war3
Sims
My output is as follows:
{
'Gears of War 3': 6,
'Batman': 5,
'gears of war 3': 4,
'Rocksmith': 5,
'nan': 32870
}
I want my code to skip the NaN values from my CSV file when counting.
My code is as follows:
import pandas
from collections import Counter

data = pandas.read_csv(r'D:\my_file.csv')
for colname, dtype in data.dtypes.items():
    if dtype == 'object':
        print(colname)
        # Counter tallies every value in the column, NaN included
        count = Counter(data[colname])
        d = dict((str(k), v) for k, v in count.items())
        # keep the 5 most frequent entries
        f = dict(sorted(d.items(), key=lambda item: item[1], reverse=True)[:5])
Use value_counts() to count the non-NaN values:
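A minimal sketch of that approach, assuming Python 3 and a recent pandas. The helper name `top_text_counts` and the sample DataFrame are made up for illustration and stand in for the CSV from the question. It relies on `Series.value_counts()` dropping NaN by default (`dropna=True`) and returning counts sorted in descending order.

```python
import numpy as np
import pandas as pd


def top_text_counts(data, n=5):
    """Return {column: {value: count}} with the n most frequent
    non-NaN values of every object-dtype column (hypothetical helper)."""
    result = {}
    for colname, dtype in data.dtypes.items():
        if dtype == object:
            # value_counts() skips NaN by default and sorts by count,
            # so head(n) keeps the n most frequent values.
            result[colname] = data[colname].value_counts().head(n).to_dict()
    return result


# Made-up sample data; the real input would come from pd.read_csv(...).
sample = pd.DataFrame({
    "game": pd.Series(
        ["Gears of War 3", "Batman", np.nan, "Gears of War 3", np.nan, "Sims"],
        dtype=object,
    ),
    "score": [9, 8, 7, 9, 6, 5],
})

top = top_text_counts(sample)
print(top)
```

With the real file, the same helper could be called on the DataFrame returned by `pd.read_csv`. Note that values differing only in case, such as 'Gears of War 3' and 'gears of war 3', are still counted separately.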