I am a Python beginner trying to get multiple random lines for each category in a file. The original file has three columns, but I am only interested in grouping by one of them. The file (CSV) looks like this:
No,Size,Name
10,1346,Cat
24,423,Dog
289,590,Cat
12,302,Dog
351,33,Cat
51,812,Dog
91,778,Cat
1193,465,Cat
44,178,Dog
None of the lines are identical, and I want to get 3 random lines for each ‘Name’. This is what I have so far:
import random
with open('C:\Users\Owl\file.csv') as f:
    lines = f.readlines()[1:] #Skip heading
    for line in lines:
        try:
            name = line[2]
        except:
            continue
    for name in lines:
        for lines in random.sample(lines,3):
            print lines
f.close()
But I get something like this:
12,302,Dog
1193,465,Cat
10,1346,Cat
2
3
D
instead of something like this:
1193,465,Cat
10,1346,Cat
91,778,Cat
51,812,Dog
44,178,Dog
12,302,Dog
In the output I get now, the lines are not grouped by ‘Name’, and after the first three lines I somehow just get single letters/numbers. Then I get “ValueError: sample larger than population” and the script terminates (the actual file is much larger than the example here).
Also, if possible, is there an easy way to sort by “Name” in the output?
I have been struggling with this for hours, looking it up on the Internet, but have not been able to solve it. Could anybody please help me? Thank you all!
You can do this much more easily by using itertools.groupby() and the csv module. We first make a csv.DictReader to give us easy access to the values, then sort and group the rows by the "Name" column, and then select three random rows from each group. Each selected row comes back as a dictionary keyed by column name.
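A minimal sketch of that approach, written for Python 3 (the code in the question is Python 2). The helper name sample_rows_per_name and the in-memory SAMPLE_CSV are assumptions made so the example runs on its own; with a real file you would pass the object returned by open() instead. Capping the sample size at the group size is an addition here, intended to avoid the "sample larger than population" error when a name has fewer than three rows.

```python
import csv
import io
import itertools
import operator
import random

# Sample data copied from the question; a real script would open the CSV file.
SAMPLE_CSV = """No,Size,Name
10,1346,Cat
24,423,Dog
289,590,Cat
12,302,Dog
351,33,Cat
51,812,Dog
91,778,Cat
1193,465,Cat
44,178,Dog
"""


def sample_rows_per_name(csv_file, k=3):
    """Return {name: up to k randomly chosen rows} (hypothetical helper)."""
    get_name = operator.itemgetter("Name")
    # groupby() only groups adjacent items, so sort by the same key first.
    rows = sorted(csv.DictReader(csv_file), key=get_name)
    picked = {}
    for name, group in itertools.groupby(rows, key=get_name):
        group = list(group)
        # Never ask random.sample() for more rows than the group contains.
        picked[name] = random.sample(group, min(k, len(group)))
    return picked


picked = sample_rows_per_name(io.StringIO(SAMPLE_CSV))
for name, rows in picked.items():
    for row in rows:
        print(row)
```

Because the rows are sorted before grouping, the result also comes out ordered by Name (Cat first, then Dog), assuming Python 3.7+ where dictionaries keep insertion order.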
If you would rather have lists than dictionaries, that is easy to do with a simple list comprehension over each selected row.
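As a rough sketch of that conversion: the sampled list below is a hypothetical stand-in for one group of selected rows (the values are taken from the question's file), and FIELDS is an assumed name for the column order from the header.

```python
# Column order taken from the header line of the question's file.
FIELDS = ["No", "Size", "Name"]

# Hypothetical stand-in for one group of rows as csv.DictReader yields them.
sampled = [
    {"No": "10", "Size": "1346", "Name": "Cat"},
    {"No": "91", "Size": "778", "Name": "Cat"},
]

# Turn each dictionary into a list of its values, in column order.
as_lists = [[row[field] for field in FIELDS] for row in sampled]

# Joining the values with commas reproduces the original CSV lines.
for values in as_lists:
    print(",".join(values))
```

Indexing by FIELDS rather than calling row.values() keeps the column order explicit, which matters on older Python versions where plain dictionaries are unordered.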