I am attempting to create a bar plot from a large CSV file of data that looks like the following:
#DowntonPBS, 23
#DowntonAbbey, 12
#Download, 8
#Download:, 2
#Downloads, 2
#DownstairsMixtape, 1
#DownWithAssad, 1
#DownYoTLParty, 1
#DowntonAbbey?, 1
#Downtonabbey, 1
#DowntownAbbey, 1
The following code is where I’m at, and while this method has worked in the past for different plotting scripts, I’ve done something wrong here that I just can’t seem to find. Instead of plotting all of the data, I only seem to be getting three records.
import pylab as p
import sys
from matplotlib.mlab import csv2rec
y = []
fig = p.figure()
ax = fig.add_subplot(1,1,1)
input = open(sys.argv[1], 'r')
data = csv2rec(input, names=['tag', 'count'])
for item in data['count']:
    y.append(item)
N = len(y)
ind = range(N)
ax.bar(ind, y, align='center')
ax.set_ylabel('Counts')
ax.set_title('HashTag Diversity')
ax.set_xticks(ind)
group_labels = data['tag']
ax.set_xticklabels(group_labels)
fig.autofmt_xdate()
p.show()
If I add print statements for y and N and run the script against my larger dataset, I end up with:
[45, 37, 36]
3
y should be a very large array of about 1000 values, and the length (N) should be 1000. I’m not sure what’s going on here.
csv2rec() ignores lines that start with “#” by default because it treats them as comments, so rows that begin with a hashtag are skipped. You can change this by:
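As far as I know, csv2rec() takes a comments argument (default '#'), so passing a character that never appears in your data should stop the hashtag rows from being dropped; I'm assuming that signature from the matplotlib.mlab versions that still ship csv2rec, and I believe newer matplotlib releases no longer include it. If csv2rec isn't available to you, here is a minimal sketch that reads the same file with only the standard csv module, which has no comment handling at all. The helper name read_tag_counts and the sample data are just for illustration.

```python
import csv
import io


def read_tag_counts(source):
    """Return (tags, counts) from 'tag, count' rows, keeping rows that start with '#'."""
    tags, counts = [], []
    # skipinitialspace drops the blank after each comma, e.g. ", 23" -> "23"
    for row in csv.reader(source, skipinitialspace=True):
        if len(row) != 2:
            continue  # skip blank or malformed lines
        tags.append(row[0].strip())
        counts.append(int(row[1]))
    return tags, counts


# Stand-in for open(sys.argv[1], 'r'); any iterable of text lines works.
sample = io.StringIO("#DowntonPBS, 23\n#DowntonAbbey, 12\n#Download, 8\n")
tags, counts = read_tag_counts(sample)
print(tags, counts)
```

The two lists can then replace data['tag'] and y in the plotting code from the question, i.e. ax.bar(range(len(counts)), counts, align='center') and ax.set_xticklabels(tags).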