I have a CSV file with many columns. I am trying to sort the rows by the value in one of the columns (in descending numerical order), and I only want to output the top 10. However, when I use the following code, I get incorrect output.
import csv
f = open('SNPs.csv', "rU")
reader = csv.reader(f)
output = [row for row in reader]
output.sort(key=lambda x: x[32], reverse=True)
print dict((row[10], (row[11], row[8], row[32])) for row in output[:10])
The output looks something like:
'XRgroup8': ('38', '2', '0.47'), '2': ('30', '13', '0.37'), 'Chromosome': ('Position', 'Distance', 'GC'), 'XRgroup5': ('54', '1', '0.45')
So clearly it isn’t returning 10 values and they aren’t in order. Any ideas?
The first thing you need to know:
Python’s dicts are unordered (in Python 2, and in Python 3 before 3.7), and therefore cannot be sorted.
If you need a dict that maintains the order, check out http://docs.python.org/2/library/collections.html#collections.OrderedDict
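As a rough illustration of how OrderedDict could be used here, the sketch below sorts a few made-up rows and then inserts them in that order. The `rows` data and the two-column layout are assumptions for the example, not taken from your file, and the code is written for Python 3.

```python
from collections import OrderedDict

# Hypothetical rows of (name, score). The scores are strings,
# which is how csv.reader would return them.
rows = [("a", "0.37"), ("b", "0.47"), ("c", "0.45")]

# Sort numerically, highest first.
rows.sort(key=lambda r: float(r[1]), reverse=True)

# An OrderedDict remembers the order in which keys were inserted,
# so iterating over it gives the rows back in sorted order.
ordered = OrderedDict(rows)

for name, score in ordered.items():
    print(name, score)
```

Note that an OrderedDict still requires unique keys, so it only helps with the ordering problem.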
The second thing:
A dict’s keys are unique.
If you try to add a key that is already in the dict, the value will be overwritten.
This is the most likely reason you’re not getting all the elements you were hoping for.
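Here is a minimal sketch of the overwriting effect, and of one way around it: keep the results in a list of tuples instead of a dict. The inline sample data and its three columns are stand-ins I made up for SNPs.csv, so the column indexes differ from yours. The sketch also skips the header row (your output shows 'Chromosome' being sorted along with the data) and converts the sort column with float(), on the assumption that it is numeric; csv.reader returns every field as a string. It is written for Python 3.

```python
import csv
import io

# Hypothetical stand-in for SNPs.csv: a header row plus a few data rows.
# The columns here are Chromosome, Position, GC; the real file has many more.
sample = io.StringIO(
    "Chromosome,Position,GC\n"
    "XRgroup8,38,0.47\n"
    "XRgroup5,54,0.45\n"
    "XRgroup8,12,0.51\n"
    "XRgroup5,70,0.37\n"
)

reader = csv.reader(sample)
header = next(reader)  # drop the header so it is not sorted with the data
rows = list(reader)

# Fields are strings, so convert to float before comparing.
rows.sort(key=lambda row: float(row[2]), reverse=True)
top = rows[:3]

# A dict keyed on Chromosome keeps only one row per key:
# later rows overwrite earlier ones that share the same key.
as_dict = dict((row[0], (row[1], row[2])) for row in top)

# A list of tuples keeps every row, in sorted order.
as_list = [(row[0], row[1], row[2]) for row in top]

print(len(as_dict), len(as_list))
```

With this sample data the dict ends up with fewer entries than the list, because two of the top three rows share the key 'XRgroup8'.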