I need help merging two dictionaries, using the keys from one to look up matches among the values of the other. When a match is found, the first dictionary should append its own values to the matching list in the other dictionary (updating it, but not overwriting values that are already present).
The code (sorry first custom script ever):
    import re

    otuid2clusteridlist = dict()
    finallist = otuid2clusteridlist  # alias: both names refer to the same dict
    clusterid2denoiseidlist = dict()

    # first block, also = finallist we append all other blocks into.
    for line in open('cluster_97.ucm', 'r'):
        lineArray = re.split(r'\s+', line)
        otuid = lineArray[0]
        clusterid = lineArray[3]
        if otuid in otuid2clusteridlist:
            otuid2clusteridlist[otuid].append(clusterid)
        else:
            otuid2clusteridlist[otuid] = list()
            otuid2clusteridlist[otuid].append(clusterid)

    # second block, higher tier needs to expand previous blocks hash
    for line in open('denoise.ucm_test', 'r'):
        lineArray = re.split(r'\s+', line)
        clusterid = lineArray[4]
        denoiseid = lineArray[3]
        if clusterid in clusterid2denoiseidlist:
            clusterid2denoiseidlist[clusterid].append(denoiseid)
        else:
            clusterid2denoiseidlist[clusterid] = list()
            clusterid2denoiseidlist[clusterid].append(denoiseid)

    # print/return function for testing (will convert to write out later)
    for key in finallist:
        print("OTU:", key, "has", len(finallist[key]), "sequence(s) which", "=", finallist[key])
Block one returns
OTU: 3 has 3 sequence(s) which = ['5PLAS.R2.h_35336', 'GG13_52054', 'GG13_798']
OTU: 5 has 1 sequence(s) which = ['DEX1.h_14175']
OTU: 4 has 1 sequence(s) which = ['PLAS.h_34150']
OTU: 7 has 1 sequence(s) which = ['DEX12.13.h_545']
OTU: 6 has 1 sequence(s) which = ['GG13_45705']
Block two returns
OTU: GG13_45705 has 4 sequence(s) which = ['GG13_45705', 'GG13_6312', 'GG13_32148', 'GG13_35246']
So the goal is to add block two’s output into block one. I would like the merged result to look like this:
...
OTU: 6 has 4 sequence(s) which = ['GG13_45705', 'GG13_6312', 'GG13_32148', 'GG13_35246']
I attempted dict.update, but it just adds block two’s contents to block one as a new entry, since block two’s key is not present among block one’s keys.
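A small reproduction of that behaviour, using values copied from the output above. The names block_one, block_two and merged are made up for this illustration:

```python
# Toy data copied from the sample output in the question.
block_one = {'6': ['GG13_45705'], '5': ['DEX1.h_14175']}
block_two = {'GG13_45705': ['GG13_45705', 'GG13_6312', 'GG13_32148', 'GG13_35246']}

merged = dict(block_one)   # shallow copy of the dict, so block_one keeps its keys
merged.update(block_two)   # update() only ever matches on keys

# 'GG13_45705' is a value in block_one, not a key, so it is added as a new entry
print(sorted(merged))      # ['5', '6', 'GG13_45705']
print(merged['6'])         # ['GG13_45705'] (unchanged)
```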
I think my issue is more complicated: I need block two to look for its key within block one’s values and append its own values to the list where that key is found.
I have been trying for loops and .append (similar to the code already written), but I lack the overall knowledge of Python to solve this.
Ideas?
Additions,
Some subsets of the data:
cluster_97.ucm (block one’s file):
5 376 * DEX1.h_14175 DEX1.h_14175
6 294 * GG13_45705 GG13_45705
0 447 98.7 DEX22.h_37221 DEX29.h_4583
1 367 98.9 DEX14.15.h_35477 DEX27.h_779
1 443 98.4 DEX27.h_3794 DEX27.h_779
0 478 97.9 DEX22.h_7519 DEX29.h_4583
denoise.ucm_test (block two’s file):
11 294 * GG13_45705 GG13_45705
11 278 99.6 GG13_6312 GG13_45705
11 285 99.6 GG13_32148 GG13_45705
11 275 99.6 GG13_35246 GG13_45705
I picked these subsets because the 2nd line in file one is what file two would be updating.
If anyone wants to give it a shot.
Updated to reflect the matching on the values…
I think the solution to your problem lies in the fact that lists are mutable in Python, and variables holding mutable values are just references. So we can use a second dictionary that maps each value to the list that contains it.
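To illustrate the point about references, here is a tiny example. The names members, by_otu and by_sequence are invented for the illustration; the IDs come from the sample data:

```python
members = ['GG13_45705']
by_otu = {'6': members}
by_sequence = {'GG13_45705': members}   # the same list object, not a copy

# Appending through one dictionary is visible through the other.
by_sequence['GG13_45705'].append('GG13_6312')

print(by_otu['6'])                               # ['GG13_45705', 'GG13_6312']
print(by_otu['6'] is by_sequence['GG13_45705'])  # True
```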
I was not sure if you needed clusterid2denoiseidlist or not, so I added a new known_clusters to hold the mapping from values to lists. I’m not sure I covered all the edge cases in your real problem, but this generates the desired output given the supplied test inputs.
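A minimal sketch of that idea, assuming whitespace-separated columns laid out as in the samples above. The function name merge_clusters and the variable otu_to_members are made up for this illustration, and the two files are replaced by in-memory lines so the snippet runs on its own:

```python
def merge_clusters(cluster_lines, denoise_lines):
    """Return {otuid: [sequence ids]} with the denoised members folded in."""
    otu_to_members = {}   # OTU id -> list of sequence ids
    known_clusters = {}   # sequence id -> the very same list object it sits in

    for line in cluster_lines:
        fields = line.split()
        if len(fields) < 4:
            continue                      # skip blank or short lines
        otuid, seqid = fields[0], fields[3]
        members = otu_to_members.setdefault(otuid, [])
        members.append(seqid)
        known_clusters[seqid] = members   # a reference, not a copy

    for line in denoise_lines:
        fields = line.split()
        if len(fields) < 5:
            continue
        denoiseid, clusterid = fields[3], fields[4]
        members = known_clusters.get(clusterid)
        # Only extend lists we already know; never add an ID twice.
        if members is not None and denoiseid not in members:
            members.append(denoiseid)

    return otu_to_members


# Sample rows taken from the question.
cluster_lines = [
    "5 376 * DEX1.h_14175 DEX1.h_14175",
    "6 294 * GG13_45705 GG13_45705",
]
denoise_lines = [
    "11 294 * GG13_45705 GG13_45705",
    "11 278 99.6 GG13_6312 GG13_45705",
    "11 285 99.6 GG13_32148 GG13_45705",
    "11 275 99.6 GG13_35246 GG13_45705",
]

result = merge_clusters(cluster_lines, denoise_lines)
for key in result:
    print("OTU:", key, "has", len(result[key]), "sequence(s) which", "=", result[key])
```

With the real data, the two lists can be replaced by open file objects, for example open('cluster_97.ucm') and open('denoise.ucm_test'), since the function only iterates over lines. Denoise rows whose cluster ID never appeared in the first file are silently ignored here; whether that is the right behaviour depends on your data.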