I wrote some code to perform simple CSV formatting, but I know it’s not as good as it could be.
Here’s the input:
1,a
1,b
1,c
2,d
2,e
3,a
3,d
3,e
3,f
Here’s the output I want:
['1','a','b','c']
['2','d','e']
['3','a','d','e','f']
This is the code I wrote:
import csv
input = csv.reader(open('book1.csv'))
output = open('output.csv', 'w')
job=[0,0]
for row in input:
    if row[0] == job[1]:
        job.append(row[1])
    else:
        print(job)
        #output.write(",".join(job))
        job[1] = row[0]
        job = [job[0], job[1]]
        job.append(row[1])
This is the output:
[0, 0]
[0, '1', 'a', 'b', 'c']
[0, '2', 'd', 'e']
The questions I have are as follows:
How can I finish the else branch so that it writes each line to the output file? Also, how can I avoid adding 0 as the zeroth element of the list? I would also like the code to output the last “job” list. Lastly, does anyone have suggestions for improving this code?
I ask because I would like to get much better at writing code, instead of just hacking it together. Any responses would be greatly appreciated!
Thanks in advance
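For comparison, here is a minimal sketch that addresses the three questions while keeping the original loop: None replaces the 0 sentinel, the key is stored as job[0], each finished group is collected when the key changes, and the last group is added after the loop ends. Like the question's data, it assumes rows are ordered by their first column; group_rows and write_groups are hypothetical names chosen for this sketch.

```python
import csv
import io


def group_rows(rows):
    """Group the second column by the first, keeping the original loop idea.

    Assumes rows arrive already ordered by their first column.
    """
    groups = []
    job = None  # None means "no group started yet", so no sentinel 0 is needed
    for row in rows:
        if job is not None and row[0] == job[0]:
            job.append(row[1])  # same key: extend the current group
        else:
            if job is not None:
                groups.append(job)  # key changed: finish the previous group
            job = [row[0], row[1]]  # start a new group with its key
    if job is not None:
        groups.append(job)  # the last group never reaches the else branch
    return groups


def write_groups(in_file, out_file):
    # Write each group as one CSV row, e.g. 1,a,b,c
    writer = csv.writer(out_file)
    for job in group_rows(csv.reader(in_file)):
        writer.writerow(job)


if __name__ == "__main__":
    # In-memory stand-in for book1.csv (hypothetical sample data from the question).
    sample = io.StringIO("1,a\n1,b\n1,c\n2,d\n2,e\n3,a\n3,d\n3,e\n3,f\n")
    print(group_rows(csv.reader(sample)))
    # → [['1', 'a', 'b', 'c'], ['2', 'd', 'e'], ['3', 'a', 'd', 'e', 'f']]
```

With real files, opening book1.csv for reading and output.csv for writing (both with newline='', as the csv module documentation recommends) in a with statement and passing them to write_groups should produce the desired rows without the leading 0.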
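What you’re trying to do is group the second column by the first column. Python has a tool for that: itertools.groupby. Called as groupby(rows, operator.itemgetter(0)), it is an iterator yielding (key, group) tuples, where the key is the first item in the rows and each group is an iterator over the rows in that group. Note that groupby only merges consecutive rows that share a key, which is fine here because your input is already ordered by the first column.
operator.itemgetter does the same thing as the [] syntax: it gets the item specified. operator.itemgetter(0) is the same as lambda row: row[0].
To extract the values and create lists, you can use [key] + [row[1] for row in group], which starts each list with the key and then extracts the second item from each row and adds it to the list. For your example input, the output will be exactly the lists you wanted: ['1', 'a', 'b', 'c'], ['2', 'd', 'e'] and ['3', 'a', 'd', 'e', 'f'].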
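Putting the pieces together, here is a minimal sketch of the complete groupby approach that also writes the result with csv.writer. It assumes the input is ordered by the first column; the function names group_second_column and convert are illustrative, not part of any library.

```python
import csv
import io
from itertools import groupby
from operator import itemgetter


def group_second_column(rows):
    """Yield [key, value, value, ...] lists, one per run of equal first columns.

    groupby only merges *consecutive* rows with the same key, so the rows
    must already be ordered by their first column.
    """
    for key, group in groupby(rows, itemgetter(0)):
        # Start with the key, then take the second item of every row in the group.
        yield [key] + [row[1] for row in group]


def convert(in_file, out_file):
    # Read rows, group them, and write each group as one CSV line.
    writer = csv.writer(out_file)
    writer.writerows(group_second_column(csv.reader(in_file)))


if __name__ == "__main__":
    # In-memory stand-in for book1.csv, using the sample data from the question.
    sample = io.StringIO("1,a\n1,b\n1,c\n2,d\n2,e\n3,a\n3,d\n3,e\n3,f\n")
    for line in group_second_column(csv.reader(sample)):
        print(line)
```

If the real file were not ordered, wrapping the reader in sorted(reader, key=itemgetter(0)) before grouping would bring equal keys together, keeping in mind that the keys sort as strings, so '10' comes before '2'.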