I have a CSV file with the following format:
ID | STUFF | Custom | Custom Value
1 | string1 | name1 | val1
1 | string1 | name2 | val2
1 | string1 | name3 | val3
2 | string2 | name1 | val4
2 | string2 | name3 | val5
3 | string3 | name2 | val6
etc…
The important part about the CSV is that the current Custom column has various “fields” in it that I need moved out into their own columns, each paired with its value from the next column. The Custom column contains somewhat unknown values: each ID, for example, may have a different subset of Custom “names”. I do, however, know the complete set of possible “Custom” names.
Desired output: (NOTE: I realized I goofed on what I needed for the output, so now it’s corrected)
ID | STUFF | name1 | name2 | name3
1 | SomeText | name1_Value | name2_Value | name3_Value
2 | SomeText | name1_Value | name2_Value | name3_Value
I am relatively new to Python and am having trouble seeing an elegant way of doing this without a serious amount of iteration/looping. I figured that using the csv module and DictReader with tuples will probably end up being the right way of going about this, but I’m struggling with it at the moment. I have roughly 1200 rows in this file, and it only needs to work once, but I’d like to learn the best way to do things in Python.
You could do something like this (assumes the rows in the csv are sorted by id):
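A minimal sketch of that grouping step, under a few assumptions: the file is comma-delimited with the header names shown in the question, the rows are already sorted by ID, and the sample data and the helper name pivot_rows are made up for illustration.

```python
import csv
import io
from itertools import groupby

# Hypothetical sample mirroring the question's layout (comma-delimited is an assumption).
SAMPLE = """\
ID,STUFF,Custom,Custom Value
1,string1,name1,val1
1,string1,name2,val2
1,string1,name3,val3
2,string2,name1,val4
2,string2,name3,val5
3,string3,name2,val6
"""


def pivot_rows(lines, delimiter=","):
    """Collapse consecutive rows that share an ID into one dict per ID."""
    reader = csv.DictReader(lines, delimiter=delimiter)
    results = []
    # groupby only merges adjacent rows, so the input must be sorted by ID.
    for (row_id, stuff), group in groupby(reader, key=lambda r: (r["ID"], r["STUFF"])):
        record = {"ID": row_id, "STUFF": stuff}
        for row in group:
            # Each custom name becomes a key holding its custom value.
            record[row["Custom"]] = row["Custom Value"]
        results.append(record)
    return results


results = pivot_rows(io.StringIO(SAMPLE))
for record in results:
    print(record)
```

For a real file, the same function should accept an open file object, e.g. one opened with open(path, newline=""), in place of the io.StringIO wrapper.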
itertools.groupby is really handy in situations like this.
Then results will be a list of dictionaries, which you can write out as csv using something like this:
Replace <no value> with whatever the value should be when there was no row with that custom name.
Edit: actually, the output I’ve given isn’t quite what you asked for (although I think it might be more useful). To get exactly what you asked for, you’d change the second part to be:
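A minimal sketch of the writing step for the ID | STUFF | name1 | name2 | name3 layout, offered as one possible approach rather than the only one. It assumes results is a list of dictionaries with one entry per ID; the sample list, the CUSTOM_NAMES constant, and the helper name write_wide are hypothetical stand-ins.

```python
import csv
import io

# Hypothetical stand-in for the list of dictionaries built by the grouping step.
results = [
    {"ID": "1", "STUFF": "string1", "name1": "val1", "name2": "val2", "name3": "val3"},
    {"ID": "2", "STUFF": "string2", "name1": "val4", "name3": "val5"},
    {"ID": "3", "STUFF": "string3", "name2": "val6"},
]

# The question states the complete set of custom names is known in advance.
CUSTOM_NAMES = ["name1", "name2", "name3"]


def write_wide(records, out, missing="<no value>"):
    """Write one row per ID with a column for every known custom name."""
    writer = csv.DictWriter(
        out,
        fieldnames=["ID", "STUFF"] + CUSTOM_NAMES,
        restval=missing,  # used for any custom name an ID has no row for
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(records)


buffer = io.StringIO()
write_wide(results, buffer)
print(buffer.getvalue())
```

When writing to disk instead of an in-memory buffer, open the output with open(path, "w", newline="") so the csv module controls the line endings. Note that DictWriter raises ValueError by default if a record contains a key that is not in fieldnames, which is a useful check that the list of known custom names really is complete.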