I’m new to Python and to programming in general. I’ve installed BioPython in the hope that some of its components can help with a script I’m working on. That script needs to handle many xread files, each of which contains a matrix that I need to slice in several ways. I’m hoping there already exists a sequence datatype or class (is there a difference?) that allows indexing in the odd ways required by sequences whose ambiguous characters are coded in formats other than IUPAC. For example, consider the sequence:
2-123[01]3-22
The bracketed group [01] represents a single ambiguous character, either 0 or 1, in the DNA sequence it encodes. So the slice [-6:] should return 3[01]3-22. I haven’t been able to find anything on this in the BioPython documentation, though I may have overlooked it. If there is something in BioPython that will do this, could you please point me toward the relevant documentation?
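For comparison, a plain Python `str` treats each bracket and digit as a separate character, so the same slice cuts the ambiguous group in half:

```python
# Plain str slicing counts "[", "0", "1" and "]" as four separate characters.
seq = "2-123[01]3-22"
print(len(seq))   # 13
print(seq[-6:])   # 1]3-22
```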
Thanks.
I’m not a BioPython expert, but you could define your own class to work the way you need. You’ll need to parse it first, perhaps using regular expressions. For example:
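Here is a minimal sketch, assuming ambiguity groups are always written in square brackets and never nested. The class name `AmbiguousSeq` and the regex are my own illustration, not anything from BioPython. It tokenizes the string once, keeps the tokens in a list, and makes slices return new objects of the same type:

```python
import re

# Hypothetical tokenizer: a bracketed group such as "[01]" is one token;
# any other single character is its own token.
TOKEN_RE = re.compile(r"\[[^\]]*\]|.")


class AmbiguousSeq:
    """String-like sequence in which "[...]" groups count as one character."""

    def __init__(self, text):
        # Accept either a raw string or an already-tokenized list.
        if isinstance(text, str):
            self._chars = TOKEN_RE.findall(text)
        else:
            self._chars = list(text)

    def __len__(self):
        return len(self._chars)

    def __getitem__(self, index):
        item = self._chars[index]
        if isinstance(index, slice):
            # Slicing returns a new AmbiguousSeq, as slicing a str returns a str.
            return AmbiguousSeq(item)
        return item

    def __str__(self):
        return "".join(self._chars)

    def __repr__(self):
        return f"AmbiguousSeq({str(self)!r})"


seq = AmbiguousSeq("2-123[01]3-22")
print(len(seq))    # 10
print(seq[-6:])    # 3[01]3-22
print(seq[5])      # [01]
print(seq._chars)  # ['2', '-', '1', '2', '3', '[01]', '3', '-', '2', '2']
```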
Testing it shows that it’s a list inside, but it behaves like a string!
You may need to define other methods (for example `__iter__`, `__eq__` or `__add__`) to get all the behavior you want.
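One way to get several of those methods for free is to inherit from `collections.abc.Sequence`: once `__getitem__` and `__len__` are defined, it supplies `__contains__`, `__iter__`, `__reversed__`, `index` and `count`. A sketch, again using the hypothetical `AmbiguousSeq` name and a bracket-only regex, with equality and concatenation added by hand:

```python
import re
from collections.abc import Sequence

# Hypothetical tokenizer: "[...]" is one token, anything else is one character.
TOKEN_RE = re.compile(r"\[[^\]]*\]|.")


class AmbiguousSeq(Sequence):
    """Sequence mixins provide __contains__, __iter__, __reversed__, index, count."""

    def __init__(self, text=""):
        self._chars = TOKEN_RE.findall(text) if isinstance(text, str) else list(text)

    def __len__(self):
        return len(self._chars)

    def __getitem__(self, index):
        if isinstance(index, slice):
            return AmbiguousSeq(self._chars[index])
        return self._chars[index]

    def __eq__(self, other):
        # Compare by text against plain strings or other AmbiguousSeq objects.
        if isinstance(other, (str, AmbiguousSeq)):
            return str(self) == str(other)
        return NotImplemented

    def __hash__(self):
        # Defining __eq__ removes the default hash, so restore one consistent with it.
        return hash(str(self))

    def __add__(self, other):
        if isinstance(other, str):
            other = AmbiguousSeq(other)
        if not isinstance(other, AmbiguousSeq):
            return NotImplemented
        return AmbiguousSeq(self._chars + other._chars)

    def __str__(self):
        return "".join(self._chars)

    def __repr__(self):
        return f"AmbiguousSeq({str(self)!r})"


seq = AmbiguousSeq("2-123[01]3-22")
print("[01]" in seq)            # True
print(seq.index("[01]"))        # 5
print(seq.count("2"))           # 4
print(list(reversed(seq[:3])))  # ['1', '-', '2']
print(seq[-6:] == "3[01]3-22")  # True
```

Note that `in` checks for a whole token rather than a substring, so `"2-1" in seq` is `False`, unlike with a `str`.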