I’m trying to split a 2D array into a specific format and can’t figure out the last step. A sample of my data is structured as follows:
# Original Data
fileListCode = [['Seq3.xls', 'B08524_057'],
                ['Seq3.xls', 'B08524_053'],
                ['Seq3.xls', 'B08524_054'],
                ['Seq98.xls', 'B25034_001'],
                ['Seq98.xls', 'D25034_002'],
                ['Seq98.xls', 'B25034_003']]
I am trying to split it up so that it looks like this:
# split into [['Seq3.xls', {'B08524_057': 1, 'B08524_053': 2, 'B08524_054': 3}],
#             ['Seq98.xls', {'B25034_001': 1, 'D25034_002': 2, 'B25034_003': 3}]]
The dictionary values 1, 2, 3 are based on the original position of the entry, counting from the first time that the filename appears. To do this, I’ve first made an array to get all the unique file names (anything that ends in .xls is a filename):
tmpFileList = []
tmpCodeList = []
arrayListDict = []
# store unique file names in a temporary array:
for i in range(len(fileListCode)):
    if fileListCode[i][0] not in tmpFileList:
        tmpFileList.append(fileListCode[i][0])
However, I’m struggling with the next step. I can’t figure out a good way of pulling out the codenames (B08524_057, for example) and converting them into a dictionary with an index based on their position.
# make array to store filelist, and codes with dictionary values
for i in range(len(tmpFileList)):
    arrayListDict.append([tmpFileList[i], {}])
This code just produces [['Seq3.xls', {}], ['Seq98.xls', {}]]. I’m not sure whether I should first produce the structure and then try to add the codes and dictionary values, or whether there is a better way.
—
EDIT: I just made the sample a little clearer by changing the values in fileListCode.
With itertools.groupby, this process becomes much simpler:
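A minimal sketch of the groupby approach, assuming the rows for each file are already adjacent in fileListCode (groupby only merges consecutive items with the same key). The name `result` is just an illustrative choice:

```python
from itertools import groupby
from operator import itemgetter

fileListCode = [['Seq3.xls', 'B08524_057'],
                ['Seq3.xls', 'B08524_053'],
                ['Seq3.xls', 'B08524_054'],
                ['Seq98.xls', 'B25034_001'],
                ['Seq98.xls', 'D25034_002'],
                ['Seq98.xls', 'B25034_003']]

# Group consecutive rows by file name (column 0), then number the codes
# (column 1) from 1 in the order they appear within each group.
# Assumption: rows for the same file are adjacent, as in the sample data.
result = [
    [fileName, {code: pos for pos, (_, code) in enumerate(rows, 1)}]
    for fileName, rows in groupby(fileListCode, key=itemgetter(0))
]

print(result)
```

If the rows for one file could be scattered through the list, sorting by file name first (for example `sorted(fileListCode, key=itemgetter(0))`, which is stable) should keep the per-file order intact before grouping.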
For old Python versions:
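A sketch of the same idea without a dict comprehension or the start argument of enumerate, on the assumption that "old" here means interpreters that lack those features. The name `resultOld` is illustrative only, and the code also runs unchanged on current Python:

```python
from itertools import groupby
from operator import itemgetter

fileListCode = [['Seq3.xls', 'B08524_057'],
                ['Seq3.xls', 'B08524_053'],
                ['Seq3.xls', 'B08524_054'],
                ['Seq98.xls', 'B25034_001'],
                ['Seq98.xls', 'D25034_002'],
                ['Seq98.xls', 'B25034_003']]

resultOld = []
for fileName, rows in groupby(fileListCode, key=itemgetter(0)):
    # dict() over (code, position) pairs stands in for the dict comprehension;
    # pos + 1 stands in for enumerate(rows, 1).
    codes = dict((row[1], pos + 1) for pos, row in enumerate(rows))
    resultOld.append([fileName, codes])

print(resultOld)
```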
But I think using a dict would be better:
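One way this could look, as a sketch: a dict keyed by file name whose values are the code-to-position dicts. It assumes each code appears at most once per file (a repeated code would overwrite its earlier position), and `codesByFile` is a made-up name for illustration:

```python
fileListCode = [['Seq3.xls', 'B08524_057'],
                ['Seq3.xls', 'B08524_053'],
                ['Seq3.xls', 'B08524_054'],
                ['Seq98.xls', 'B25034_001'],
                ['Seq98.xls', 'D25034_002'],
                ['Seq98.xls', 'B25034_003']]

codesByFile = {}
for fileName, code in fileListCode:
    # Fetch the inner dict for this file, creating it on first sight.
    codes = codesByFile.setdefault(fileName, {})
    # The next position is one more than the number of codes stored so far.
    codes[code] = len(codes) + 1

print(codesByFile['Seq98.xls']['D25034_002'])
```

Compared with the list-of-pairs layout, this gives direct lookup by file name and does not need the rows for a file to be adjacent in the input.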