I’m a Python newbie but have programmed for a while in other languages. I have a long string of DNA (lower case) and AA sequences (upper case). In addition, at the start of the file there is a protein name, all in upper case. So my file looks like this:
PROTEINNAMEatcgatcg… JFENVKDFDFLK
I need to find the first non-uppercase letter in the string so I can then cut out the protein name. Thus, what I would want from the above is:
atcgatcg… JFENVKDFDFLK
I can do this with a loop, but that seems like overkill and inefficient. Is there a simple Python way to do it?
I can get all the uppercase letters using re.findall("[A-Z]", mystring), but then I would need to compare the result with the original string to see where they differ.
Thanks!
You are almost there with your regex… but there are other methods besides findall:
http://docs.python.org/library/re.html#re.sub
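For example, here is a minimal sketch with re.sub. The sample string and the function names are made up for illustration, and it assumes the protein name consists only of ASCII letters A-Z:

```python
import re

def strip_leading_uppercase(s):
    # ^ anchors the match at the start of the string, so only the
    # leading run of uppercase letters (the protein name) is removed.
    return re.sub(r"^[A-Z]+", "", s, count=1)

def first_non_uppercase_index(s):
    # Index of the first character that is not A-Z, or None if there is none.
    match = re.search(r"[^A-Z]", s)
    return match.start() if match else None

sample = "PROTEINNAMEatcgatcgJFENVKDFDFLK"  # hypothetical data
print(strip_leading_uppercase(sample))      # atcgatcgJFENVKDFDFLK
print(first_non_uppercase_index(sample))    # 11
```

The uppercase AA sequence at the end is left alone because the pattern is anchored to the start of the string.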
Not sure about performance, but you could also do
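One regex-free possibility, as a sketch, is str.lstrip with the set of uppercase letters. Again the sample string and the function name are only illustrative, and ASCII uppercase is assumed:

```python
from string import ascii_uppercase

def strip_leading_uppercase(s):
    # lstrip removes every leading character that appears in the given
    # set and stops at the first character that does not.
    return s.lstrip(ascii_uppercase)

sample = "PROTEINNAMEatcgatcgJFENVKDFDFLK"  # hypothetical data
rest = strip_leading_uppercase(sample)
print(rest)                      # atcgatcgJFENVKDFDFLK
print(len(sample) - len(rest))   # 11, the index of the first non-uppercase letter
```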
And there you have it:
Second one looks to be ~3 times faster.
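If you want to check a speed difference like this on your own data, timeit is the usual tool. The sketch below assumes the two candidates are a compiled re.sub pattern and str.lstrip, which may not be exactly what was compared above, and the sample string is invented. Timings vary by machine and input, so no numbers are given:

```python
import re
import timeit
from string import ascii_uppercase

# Hypothetical test input: name, a long DNA stretch, then an AA sequence.
sample = "PROTEINNAME" + "atcg" * 1000 + "JFENVKDFDFLK"
pattern = re.compile(r"^[A-Z]+")

def with_regex(s):
    return pattern.sub("", s, count=1)

def with_lstrip(s):
    return s.lstrip(ascii_uppercase)

# Make sure both approaches agree before timing them.
assert with_regex(sample) == with_lstrip(sample)

regex_time = timeit.timeit(lambda: with_regex(sample), number=10000)
lstrip_time = timeit.timeit(lambda: with_lstrip(sample), number=10000)
print("regex:  %.4f s" % regex_time)
print("lstrip: %.4f s" % lstrip_time)
```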