Finding coding sequence
cds_position = ''
cds_start = 0
cds_end = 0
cds_sequence = ''
for line in data:
    cds_temp = ''
    if re.findall(r' CDS ',line):
        cds_temp = cds_temp + line.replace('\n','')
        position = re.search(r'(\d+)\.\.(\d+)',cds_temp)
        cds_start = cds_start + int(position.group(1))
        cds_end = cds_end + int(position.group(2))
        cds_position = str(cds_start)+':'+str(cds_end)
        cds_sequence = cds_sequence + sequence[(cds_start-1):(cds_end-1)]
I get this error:
Traceback (most recent call last):
  File "Upstream_ORF.py", line 357, in <module>
    GenBank_Reader(test_file)
  File "Upstream_ORF.py", line 317, in GenBank_Reader
    cds_start = cds_start + int(position.group(1))
AttributeError: 'NoneType' object has no attribute 'group'
OK, I really don’t understand why I am getting this error.
I wrote a script that goes through a file of a particular format line by line, and whenever it encounters a particular string followed by 10 spaces, it takes the number values that follow it:
     exon          1..1333
                   /gene="BRD2"
                   /gene_synonym="D6S113E; FSH; FSRG1; NAT; RING3; RNF3"
                   /inference="alignment:Splign:1.39.8"
                   /number=3
     STS           350..463
                   /gene="BRD2"
                   /gene_synonym="D6S113E; FSH; FSRG1; NAT; RING3; RNF3"
                   /standard_name="CGCb278"
                   /db_xref="UniSTS:240930"
So whenever it finds the word ‘exon’ followed by 10 spaces, it takes the numbers flanking the ‘..’.
It worked for 5 different files, but for one of them it just isn’t working, even though it is the exact same format. I’m not sure why it isn’t working now, because it still works with the other ones. I found all the occurrences of ‘exon’ in the file, and none of them were flanked by 10 spaces like the one I was looking for.
Why would this error come up when it works for other files with the same format?
If re.search returns None, that means that it failed to find a match. The file in question must have something different about it which causes the expression to fail.
A couple of little comments about your code:
The call if re.findall(r' CDS ',line): is unnecessary. Just write if ' CDS ' in line: instead, which does a substring search.
Instead of line.replace('\n',''), you should use line.rstrip('\n'), as that is more typical.
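To find out which line in the problem file is different, one option is to check the result of re.search before calling .group() and to collect the lines that fail to match. The following is only a minimal sketch, not a drop-in replacement for your GenBank_Reader: the function name find_cds_ranges and the sample lines are made up for illustration, and the second sample line (a partial location written with < and >) is just one example of a location the pattern would not match, not necessarily what is in your file.

```python
import re

# Same pattern as in the question: digits, two dots, digits.
LOCATION = re.compile(r'(\d+)\.\.(\d+)')

def find_cds_ranges(lines):
    """Hypothetical helper: return (ranges, skipped).

    ranges  -- list of (start, end) tuples from CDS lines that matched
    skipped -- CDS lines where the pattern found nothing
    """
    ranges = []
    skipped = []
    for line in lines:
        if ' CDS ' not in line:        # plain substring test, no regex needed
            continue
        text = line.rstrip('\n')       # drop only the trailing newline
        match = LOCATION.search(text)
        if match is None:              # re.search returns None when nothing matches
            skipped.append(text)
            continue
        ranges.append((int(match.group(1)), int(match.group(2))))
    return ranges, skipped

# Made-up sample lines, only to exercise the function.
sample = [
    '     CDS             12..345\n',
    '     CDS             complement(<1..>200)\n',
    '     exon            1..1333\n',
]
ranges, skipped = find_cds_ranges(sample)
print(ranges)    # matched CDS coordinates
print(skipped)   # CDS lines that need a closer look
```

Printing the skipped lines for the file that fails should show exactly which location string the expression cannot handle, and you can then decide whether to extend the pattern or ignore those lines.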