How can I merge multiple CSV files in Perl?
For example, I have a first file, Packet1.csv, that looks like this:
#type, number, info, availability
computer, t.100, pentium 2, yes
computer, t.1000, pentium 3, yes
computer, t.2000, pentium 4, no
computer, t.3000, pentium 5, yes
and a second file, Packet2.csv, that looks like this:
#type, number, info, availability
computer, t.100, pentium 2, yes
computer, t.1000, pentium 3, no
computer, t.2000, pentium 4, no
computer, t.4000, pentium 6, no
The output I want is a single file like the one below (the number of Packet files is not fixed):
#type, number, info, **Packet1** availability, **Packet2** availability
computer, t.100, pentium 2, yes, yes
computer, t.1000, pentium 3, yes, no
computer, t.2000, pentium 4, no, no
computer, t.3000, pentium 5, yes
computer, t.4000, pentium 6, no
Going back to your earlier attempt at multidimensional hashing (Hash of hashes perl), you will need to change the data structure you are using so that it can store multiple entries for a particular element.
A CSV can be read intuitively into a two-level hash. The rows can be keyed by their IDs (in this case I guess the IDs are the numbers ‘t.100’, ‘t.1000’, etc.), and the values of each row can be stored in a second-level hash that uses the header strings as its keys. It will look something like this if you view the structure with Data::Dumper:
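As a minimal sketch of that two-level hash, assuming the 'number' column is the row ID and that fields never contain quoted or embedded commas (for real data a CSV parser such as Text::CSV is safer than a plain split), the structure could be built and dumped like this. The helper name split_row and the in-memory sample lines are illustrative only:

```perl
use strict;
use warnings;
use Data::Dumper;

# Sample lines copied from Packet1.csv in the question (in memory for brevity).
my @lines = (
    '#type, number, info, availability',
    'computer, t.100, pentium 2, yes',
    'computer, t.1000, pentium 3, yes',
);

# Hypothetical helper: split a simple comma-separated line and trim whitespace.
# Assumes no quoted fields and no commas inside values.
sub split_row {
    my ($line) = @_;
    $line =~ s/^\s+|\s+$//g;
    return split /\s*,\s*/, $line;
}

# The first line is the header; drop the leading '#'.
my @header = split_row(shift @lines);
$header[0] =~ s/^#//;

# Outer hash: row ID => inner hash of header => value.
my %rows;
for my $line (@lines) {
    my %row;
    @row{@header} = split_row($line);
    $rows{ $row{number} } = \%row;
}

$Data::Dumper::Sortkeys = 1;    # stable key order in the dump
print Dumper(\%rows);
```

A single value is then reachable as, for example, $rows{'t.100'}{info}.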
Whether ‘number’ is also a key for each ‘row hash’ is up to you depending on how useful that might be (usually you already know the key for the row in order to access it).
This data structure would be fine for storing one CSV file. However, we need to add an extra layer of complexity to cope with merging multiple CSVs in the way that you describe. For example, to keep track of the files that a particular ID appears in, we can store a third hash as the value of the ‘availability’ key, since that is the value that changes between entries with the same ‘number’:
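A sketch of that three-level structure, under the same assumptions as before (simple comma-separated fields, 'number' as the row ID). The %packets hash is a hypothetical in-memory stand-in for reading Packet1.csv and Packet2.csv from disk, and the packet names are used as keys of the innermost hash:

```perl
use strict;
use warnings;

# Hypothetical in-memory stand-ins for the two files in the question.
my %packets = (
    Packet1 => [
        '#type, number, info, availability',
        'computer, t.100, pentium 2, yes',
        'computer, t.3000, pentium 5, yes',
    ],
    Packet2 => [
        '#type, number, info, availability',
        'computer, t.100, pentium 2, yes',
        'computer, t.4000, pentium 6, no',
    ],
);

# Hypothetical helper: split a simple comma-separated line and trim whitespace.
sub split_row {
    my ($line) = @_;
    $line =~ s/^\s+|\s+$//g;
    return split /\s*,\s*/, $line;
}

my %rows;
for my $packet (sort keys %packets) {
    my ($head, @data) = @{ $packets{$packet} };
    my @header = split_row($head);
    $header[0] =~ s/^#//;

    for my $line (@data) {
        my %row;
        @row{@header} = split_row($line);
        my $id = $row{number};

        # Shared columns are stored once per ID; a later file simply rewrites them.
        $rows{$id}{$_} = $row{$_} for grep { $_ ne 'availability' } @header;

        # Third level: availability is keyed by the packet it came from.
        $rows{$id}{availability}{$packet} = $row{availability};
    }
}
```

An ID that is missing from a file simply has no entry for that packet, so exists $rows{'t.3000'}{availability}{Packet2} is false here.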
Once all files have been read into this structure, printing the final CSV is a matter of looping over the keys of the outer hash and, for each row, joining the row’s values in the correct column order. The ‘Packet’ hash can then be looped over to retrieve all ‘availability’ values, which are appended to the end of each row.
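The output step could be sketched as follows. The %rows literal is a hand-written example of the three-level structure, and two choices here are assumptions rather than anything the question specifies: rows are ordered by a plain string sort of the IDs, and a packet with no entry for an ID produces an empty field so that the columns stay aligned:

```perl
use strict;
use warnings;

# Packet names in column order, and the columns shared by every file.
my @packets = ('Packet1', 'Packet2');
my @fixed   = qw(type number info);

# Hand-written sample of the merged structure (normally built while reading the files).
my %rows = (
    't.100' => {
        type => 'computer', number => 't.100', info => 'pentium 2',
        availability => { Packet1 => 'yes', Packet2 => 'yes' },
    },
    't.3000' => {
        type => 'computer', number => 't.3000', info => 'pentium 5',
        availability => { Packet1 => 'yes' },
    },
    't.4000' => {
        type => 'computer', number => 't.4000', info => 'pentium 6',
        availability => { Packet2 => 'no' },
    },
);

# Header line: fixed columns, then one availability column per packet.
my @out = ( '#' . join(', ', @fixed, map { "$_ availability" } @packets) );

for my $id (sort keys %rows) {
    my $row = $rows{$id};

    # Missing packets become empty fields (assumption, see above).
    my @avail = map { $row->{availability}{$_} // '' } @packets;

    push @out, join(', ', @{$row}{@fixed}, @avail);
}

print "$_\n" for @out;
```

Because the packet list is just an array, the same loop works for any number of Packet files.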
I hope that helps you understand one possible way of dealing with this kind of data. You can ask about specific parts of the implementation if you are finding them difficult, and I will be happy to elaborate.