I have these really complicated data files that I have processed and as each

Question

Asked: May 26, 2026 (2026-05-26T04:43:20+00:00)

I have some really complicated data files to process. As each file is processed, I use an OrderedDict to capture its keys and values. Each OrderedDict is appended to a list, so my final result is a list of dictionaries. Because the data captured in these files is so diverse, the dictionaries have many keys in common, but there are enough uncommon keys to make exporting the data to Excel more complicated than I was hoping for, because I really need to push out the data in a consistent structure.

Each key has a structure like

Q_#_SUB_A_COLUMN_#_NUMB_#

So, for example, I have

 Q_123_SUB_D_COLUMN_C_NUMB_17

We can translate the key as follows

 Question 123
 SubItem D
 Column C
 Instance 17

Because there is a SubItem D, Column C and Instance 17, there must also be a SubItem A, a Column B and an Instance 16.

However, one of the source files might be populated with data values (and keys) that range up to the example above, while some other source file might terminate with

Q_123_SUB_D_COLUMN_C_NUMB_13

I iterate through the list of dictionaries to pull out all of the unique keys, so that I can use them as the column headings in csv.DictWriter. My plan was to sort the resulting list of unique column headings, but I can’t seem to make the sort work.

Specifically, I need it to sort so that the results look like

 Q_122_SUB_A_COLUMN_C_NUMB_1
 Q_122_SUB_B_COLUMN_C_NUMB_1
 Q_123_SUB_A_COLUMN_C_NUMB_1
 Q_123_SUB_B_COLUMN_C_NUMB_1
 Q_123_SUB_C_COLUMN_C_NUMB_1
 Q_123_SUB_D_COLUMN_C_NUMB_1
 ...
 Q_123_SUB_A_COLUMN_C_NUMB_17
 Q_123_SUB_B_COLUMN_C_NUMB_17
 Q_123_SUB_C_COLUMN_C_NUMB_17
 Q_123_SUB_D_COLUMN_C_NUMB_17

The big issue is that, before I open any particular set of these files, I do not know how many questions are answered, how many sub-questions are answered, how many columns are associated with each question or sub-question, or how many instances exist of any particular combination of questions, sub-questions or columns, and I don’t want to have to know. Using Python I was able to reduce over 1,200 lines of SAS code to 95, but I can’t seem to figure out this last little bit before I start writing the data out to a CSV file.

Any observations would be appreciated.

My plan is to find all of the unique keys by iterating through the list of dictionaries, and then sort these keys correctly so that I can create a CSV file using the keys as column headings. I know that I could find the unique keys, write them out, sort them manually and then read the sorted file back, but that seems clumsy.

1 Answer

Editorial Team · Answer 1 · 2026-05-26T04:43:21+00:00

Just supply a sufficiently clever function as the key when sorting.

>>> (lambda x: tuple(y(z) for (y, z)
...                  in zip((int, str, str, int),
...                         x.split('_')[1::2])))('Q_122_SUB_A_COLUMN_C_NUMB_1')
(122, 'A', 'C', 1)
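
To show how such a key function might fit into the whole export, here is a minimal sketch. The names parse_key, sort_key and write_rows are made up for illustration, and the regular expression assumes every heading follows the Q_#_SUB_A_COLUMN_#_NUMB_# layout from the question exactly. The tuple order in sort_key (question, instance, sub-item, column) is a guess at the ordering in the question's sample output; the lambda above orders by question, sub-item, column, instance instead, so reorder the returned tuple to taste.

```python
import csv
import io
import re

# Assumed layout, taken from the question:
# Q_<int>_SUB_<letters>_COLUMN_<letters>_NUMB_<int>
KEY_PATTERN = re.compile(
    r"^Q_(\d+)_SUB_([A-Za-z]+)_COLUMN_([A-Za-z]+)_NUMB_(\d+)$"
)


def parse_key(key):
    """Split a heading into (question, sub_item, column, instance)."""
    match = KEY_PATTERN.match(key)
    if match is None:
        raise ValueError(f"unexpected key format: {key!r}")
    question, sub_item, column, instance = match.groups()
    # Convert the numeric parts so that 2 sorts before 10.
    return int(question), sub_item, column, int(instance)


def sort_key(key):
    """Order by question, then instance, then sub-item, then column."""
    question, sub_item, column, instance = parse_key(key)
    return question, instance, sub_item, column


def write_rows(rows, stream):
    """Write a list of dicts as CSV with consistently ordered headings."""
    # Union of every key seen in any row, sorted with the custom key.
    fieldnames = sorted({key for row in rows for key in row}, key=sort_key)
    # restval fills in the cells for keys that a given row lacks.
    writer = csv.DictWriter(stream, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return fieldnames


# Tiny made-up data set: two "files" with partly different keys.
rows = [
    {"Q_123_SUB_A_COLUMN_C_NUMB_2": "x", "Q_123_SUB_B_COLUMN_C_NUMB_1": "y"},
    {"Q_122_SUB_A_COLUMN_C_NUMB_1": "z", "Q_123_SUB_A_COLUMN_C_NUMB_10": "w"},
]
buffer = io.StringIO()
headings = write_rows(rows, buffer)
```

When writing to a real file rather than a StringIO, open it with newline="" as the csv module documentation recommends. Because the numeric parts are converted with int, NUMB_2 sorts before NUMB_10, which a plain string sort would get wrong.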
