I want to perform multiple edits to most rows in a CSV file without writing to the output CSV file multiple times.
I have a CSV file that I need to convert and clean up into a specific format for another program to use. For example, I’d like to:
- remove all blank rows
- remove all rows where the value of column “B” is not a number
- using the filtered data, create a new column and fill it with the first part of each value in column B (the part before the `_`)
Here’s an example of the data:
"A","B","C","D","E"
"apple","blah","1","","0.00"
"ape","12_fun","53","25","1.00"
"aloe","15_001","51","28","2.00"
I can figure out the logic behind each step, but I can’t figure out how to perform them without reading and writing a file for every step. I’m using the `csv` module. Is there a way to perform all of these steps in one pass and write the final CSV only once?
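One way to avoid repeated file I/O is to read the input once, keep each row in memory while it goes through every check and transformation, and write the output once at the end. Below is a minimal sketch using only the standard `csv` module. It assumes that the first row is a header, and that "not a number" refers to the part of column B before the first `_`, so `blah` is dropped and `12_fun` is kept. The new column name `B_prefix` and the helper names are hypothetical.

```python
import csv
import io

# Hypothetical name for the new column; use whatever the target program expects.
NEW_COLUMN = "B_prefix"


def not_blank(row, b):
    """Test: keep rows that contain at least one non-empty cell (b is unused)."""
    return any(cell.strip() for cell in row)


def b_prefix_is_number(row, b):
    """Test: keep rows whose column-B value, up to the first "_", parses as a float.

    Assumption: "not a number" refers to this prefix, so "12_fun" passes
    and "blah" fails.
    """
    if len(row) <= b:
        return False
    try:
        float(row[b].split("_", 1)[0])
    except ValueError:
        return False
    return True


def add_b_prefix(row, b):
    """Process: append the part of column B before the first "_" as a new column."""
    return row + [row[b].split("_", 1)[0]]


# Every test must pass before any process is applied to a row.
TESTS = [not_blank, b_prefix_is_number]
PROCESSES = [add_b_prefix]


def clean_csv(src, dst):
    """Read src once, filter and transform rows in memory, then write dst once."""
    reader = csv.reader(src)
    header = next(reader)  # assumes the first row is the header
    b = header.index("B")

    cleaned = [header + [NEW_COLUMN]]
    for row in reader:
        # all() stops at the first failing test, so blank rows never reach
        # the column-B check.
        if all(test(row, b) for test in TESTS):
            for process in PROCESSES:
                row = process(row, b)
            cleaned.append(row)

    # One write of all surviving rows; QUOTE_ALL mirrors the input's quoting style.
    csv.writer(dst, quoting=csv.QUOTE_ALL).writerows(cleaned)
```

With real files, open both with `newline=""` as the `csv` documentation recommends, e.g. `with open("input.csv", newline="") as src, open("output.csv", "w", newline="") as dst: clean_csv(src, dst)` (the file names are placeholders). Adding another check or transformation then means appending a function to `TESTS` or `PROCESSES`, not adding another read/write pass.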
I would define a set of tests and a set of processes.
If a row passes all tests, all processes are applied to it, and the final result is written to the output: