I have a set of really complicated data files. As each file is processed, I use an OrderedDict to capture its keys and values, and each OrderedDict is appended to a list, so my final result is a list of dictionaries. Because of the diversity of the data captured in these files, they have many keys in common, but there are enough uncommon keys to make exporting the data to Excel more complicated than I was hoping for, because I really need to push out the data in a consistent structure.
Each key has a structure like
Q_#_SUB_A_COLUMN_#_NUMB_#
So, for example, I have
Q_123_SUB_D_COLUMN_C_NUMB_17
We can translate the key as follows:
Question 123
SubItem D
Column C
Instance 17
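To make that translation concrete, here is a minimal sketch of splitting one key into its four parts. The names ParsedKey, KEY_RE and parse_key are made up for illustration, and the regular expression assumes the question and instance are digits and the sub-item and column are upper-case letters, as in the examples here.

```python
import re
from collections import namedtuple

# Hypothetical container for the four parts of a key.
ParsedKey = namedtuple("ParsedKey", "question sub_item column instance")

# Assumed layout: Q_<digits>_SUB_<letters>_COLUMN_<letters>_NUMB_<digits>
KEY_RE = re.compile(r"Q_(\d+)_SUB_([A-Z]+)_COLUMN_([A-Z]+)_NUMB_(\d+)")


def parse_key(key):
    """Split a key into (question, sub_item, column, instance)."""
    match = KEY_RE.fullmatch(key)
    if match is None:
        raise ValueError("unexpected key format: %r" % key)
    question, sub_item, column, instance = match.groups()
    # Convert the numeric parts so they compare as numbers, not as text.
    return ParsedKey(int(question), sub_item, column, int(instance))


print(parse_key("Q_123_SUB_D_COLUMN_C_NUMB_17"))
# ParsedKey(question=123, sub_item='D', column='C', instance=17)
```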
Because there is a SubItem D, Column C and Instance 17, there must also be a SubItem A, a Column B and an Instance 16.
However, one of the source files might be populated with data values (and keys) that range up to the example above, while some other source file might terminate with
Q_123_SUB_D_COLUMN_C_NUMB_13
So I iterate through the list of dictionaries to pull out all of the unique keys, so that I can use them as the column headings in csv.DictWriter. My plan was to sort the resulting list of unique column headings, but I can’t seem to make the sort work.
Specifically, I need it to sort so that the results look like
Q_122_SUB_A_COLUMN_C_NUMB_1
Q_122_SUB_B_COLUMN_C_NUMB_1
Q_123_SUB_A_COLUMN_C_NUMB_1
Q_123_SUB_B_COLUMN_C_NUMB_1
Q_123_SUB_C_COLUMN_C_NUMB_1
Q_123_SUB_D_COLUMN_C_NUMB_1
...
Q_123_SUB_A_COLUMN_C_NUMB_17
Q_123_SUB_B_COLUMN_C_NUMB_17
Q_123_SUB_C_COLUMN_C_NUMB_17
Q_123_SUB_D_COLUMN_C_NUMB_17
The big issue is that, before I open any particular set of these files, I do not know how many questions are answered, how many sub-questions are answered, how many columns are associated with each question or sub-question, or how many instances exist of any particular combination of questions, sub-questions or columns, and I don’t want to have to know. Using Python I was able to reduce over 1,200 lines of SAS code to 95, but I can’t seem to figure out this last little bit before I start writing the data out to a CSV file.
Any observations would be appreciated.
My plan is to find all of the unique keys by iterating through the list of dictionaries, and then sort these keys correctly so that I can create a CSV file using the keys as column headings. I know that I could find the unique keys, write them out, sort them manually and then read the sorted file back in, but that seems clumsy.
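A rough sketch of that plan, for reference. The function names unique_keys and write_records, and the sort_key parameter, are hypothetical; the only library pieces used are OrderedDict and csv.DictWriter with its restval argument, which fills in columns that a given dictionary lacks. With sort_key left as None the headings are sorted as plain strings, which is exactly where the ordering goes wrong (NUMB_17 lands before NUMB_2).

```python
import csv
import io
from collections import OrderedDict


def unique_keys(records):
    """Collect every key that appears in any record, in first-seen order."""
    seen = OrderedDict()
    for record in records:
        for key in record:
            seen.setdefault(key, None)
    return list(seen)


def write_records(records, out_file, sort_key=None):
    """Write records as CSV with one column per unique key.

    sort_key is a hypothetical hook for a custom ordering function;
    None falls back to ordinary string sorting.
    """
    fieldnames = sorted(unique_keys(records), key=sort_key)
    # restval supplies the value for keys missing from a given record.
    writer = csv.DictWriter(out_file, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(records)
    return fieldnames


# Small made-up example: the second record has a key the first one lacks.
records = [
    OrderedDict([("Q_1_SUB_A_COLUMN_A_NUMB_1", "x")]),
    OrderedDict([("Q_1_SUB_A_COLUMN_A_NUMB_2", "y"),
                 ("Q_1_SUB_A_COLUMN_A_NUMB_1", "z")]),
]
buffer = io.StringIO()
write_records(records, buffer)
print(buffer.getvalue())
```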
Just supply a sufficiently clever function as the key when sorting.
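One possible shape for such a key function, as a sketch rather than the definitive answer. The names HEADING_RE and heading_sort_key are invented here. The ordering (question, then instance, then sub-item, then column) is an assumption read off the sample output in the question, which does not show enough rows to pin down where the question number and the column belong. It also assumes sub-item and column labels are single upper-case letters; multi-letter labels would need their own comparison rule.

```python
import re

# Assumed layout: Q_<digits>_SUB_<letters>_COLUMN_<letters>_NUMB_<digits>
HEADING_RE = re.compile(r"Q_(\d+)_SUB_([A-Z]+)_COLUMN_([A-Z]+)_NUMB_(\d+)")


def heading_sort_key(heading):
    """Return a tuple that sorts headings numerically where it matters."""
    match = HEADING_RE.fullmatch(heading)
    if match is None:
        # Headings that do not fit the pattern sort after all the others,
        # alphabetically among themselves.
        return (1, heading)
    question, sub_item, column, instance = match.groups()
    # int() makes instance 2 sort before instance 17.
    return (0, int(question), int(instance), sub_item, column)


headings = [
    "Q_123_SUB_A_COLUMN_C_NUMB_17",
    "Q_123_SUB_D_COLUMN_C_NUMB_1",
    "Q_122_SUB_B_COLUMN_C_NUMB_1",
    "Q_123_SUB_A_COLUMN_C_NUMB_2",
    "Q_122_SUB_A_COLUMN_C_NUMB_1",
]
for heading in sorted(headings, key=heading_sort_key):
    print(heading)
# Q_122_SUB_A_COLUMN_C_NUMB_1
# Q_122_SUB_B_COLUMN_C_NUMB_1
# Q_123_SUB_D_COLUMN_C_NUMB_1
# Q_123_SUB_A_COLUMN_C_NUMB_2
# Q_123_SUB_A_COLUMN_C_NUMB_17
```

The sorted list can then be passed as fieldnames to csv.DictWriter. If the instance should take priority over the question number, swap the second and third elements of the returned tuple.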