My first CSV file looks like this with header included (header is included only at the top not after every entry):
NAME,SURNAME,AGE
Fred,Krueger,Unknown
.... n records
My second file might look like this:
NAME,MIDDLENAME,SURNAME,AGE
Jason,Noname,Scarry,16
.... n records with this header template
The merged file should look like this:
NAME,SURNAME,AGE,MIDDLENAME
Fred,Krueger,Unknown,
Jason,Scarry,16,Noname
....
Basically if headers don’t match, all new header titles (columns) should be added after original header and their values according to that order.
Update
Above CSV were made smaller so I can illustrate what I want to achieve, in reality CSV files are generated one step before this (merge) and can be up to 100 columns
How can I do this?
I’d create a model for the ‘bigger’ format (a simple class with four fields and a collection for instances of this class) and implemented two parsers, one for the first, one for the second model. Create records for all rows of both csv files and implement a writer to output the csv in the correct format. IN brief:
From your comment I see, that you a lot of different csv formats with some common columns.
You could define the model for any row in the various csv files like this:
For parsing any file you would first read the header and add the column headers to a global keystore (will be needed later on for outputting), then create records for all rows, like:
Now the keystore has all used column header names and we can iterate over the collection of all records, get all values for all keys (and get
nullif the file for this record didn’t use the key), assemble the csv lines and write everything to a new file.