I am a newbie trying to teach myself Python. I have a file with a bunch of numbers in it and I want to “import” them into a Python list as integers (or at least that’s what I think I want to do). I seem to be having a problem, but I don’t understand what it is. Here is some detailed information on my problem and the code I’ve tried:
I have a DNA sequence (a string of ~150,000 letters) and I would like to have Python go to a certain position in that string and then print the 150 letters to the left of that position, the letter at that position surrounded by square brackets, and then the 150 letters to the right of that position. I need to do this for >100 positions in the string, and I have a list of these positions in a separate file. I have figured out that Biopython has an object that will handle the very long string for me, and if I tell Python what position I want (e.g. by assigning it by hand), I can slice this string and get the correct output. Now I want to import my target positions from this other file and then have Python iterate through that list and write the output to another file. The first part is where I am having trouble.
I have tried the input file in several different formats. One like this:
500,1000,15000
And another one like this (all positions on separate lines):
500
1000
15000
Based on some other posts I read I have tried several things. Here is one:
from Bio import SeqIO
import csv
with open('Results.fa', 'a') as f1:
    Reference = SeqIO.read("GEO5FinalAssembly2SC.fa", "fasta") # Biopython
    DataFile = open('TestFile.csv', 'r')
    DataReader = csv.reader(DataFile)
    SNP = []
    for row in DataReader:
        SNP.append(row)
    for i in SNP:
        IA=i-151 #Creating the intervals
        IB=i-1
        JA=i+1
        JB=i+151
        Fragment = Reference.seq[IA:IB] + "[" + Reference.seq[i] + "]" + Reference.seq[JA:JB]
        F = str(Fragment) #Need to turn Fragment into a string that can be written
        header = ">MINT_SNP" + str(i) + "\n"
        f1.write(header)
        f1.write(F)
        f1.write("\n")
This returns the error:
Traceback (most recent call last):
  File "./ReferenceSplitter3.py", line 15, in <module>
    IA=i-151 #Creating the intervals
TypeError: unsupported operand type(s) for -: 'list' and 'int'
I also tried this:
from Bio import SeqIO
import csv
with open('Results.fa', 'a') as f1:
    Reference = SeqIO.read("GEO5FinalAssembly2SC.fa", "fasta")
    with open('TestFile.txt', 'r') as Input:
        rows = csv.reader(Input, quoting=csv.QUOTE_NONNUMERIC)
        SNP = [[item for number, item in enumerate(row)] for row in rows]
    for i in SNP:
        IA=i-151 #Creating the intervals
        IB=i-1
        JA=i+1
        JB=i+151
        Fragment = Reference.seq[IA:IB] + "[" + Reference.seq[i] + "]" + Reference.seq[JA:JB]
        F = str(Fragment) #Need to turn Fragment into a string that can be written
        header = ">SNP" + str(i) + "\n"
        f1.write(header)
        f1.write(F)
        f1.write("\n")
This gives a similar error:
Traceback (most recent call last):
  File "./ReferenceSplitter4.py", line 13, in <module>
    IA=i-151 #Creating the intervals
TypeError: unsupported operand type(s) for -: 'list' and 'int'
However, when I define a list of integers myself, like SNP = (500,1000,1500), it seems to work just fine. I wonder if I am missing some fundamental Python concept here. Sorry if this is a really basic question, but any suggestions would be much appreciated!
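For the input where they are all on the same line separated by commas (500,1000,15000), you can read the line, split it on commas, and convert each piece with int().
For the input where they are each on a different line, read the file line by line and convert each line with int().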
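The reason both attempts fail is that csv.reader hands back each row as a list, so SNP ends up as a list of lists and i-151 is "list minus int". Here is a minimal sketch of the two ways of reading the positions without csv. The function names read_positions_one_line and read_positions_per_line are made up for illustration, and the sketch assumes the file contains nothing but whole numbers:

```python
def read_positions_one_line(path):
    """Read positions stored on one comma-separated line, e.g. 500,1000,15000."""
    with open(path) as handle:
        text = handle.read()
    # split(",") gives strings; int() converts each one and ignores
    # surrounding whitespace such as the trailing newline.
    return [int(field) for field in text.split(",") if field.strip()]


def read_positions_per_line(path):
    """Read positions stored one per line."""
    with open(path) as handle:
        # Skip blank lines so a trailing empty line does not raise ValueError.
        return [int(line) for line in handle if line.strip()]
```

You would then call, for example, SNP = read_positions_one_line('TestFile.csv') or SNP = read_positions_per_line('TestFile.txt'), depending on which layout the file uses.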
Either will set SNP to a list of numbers, like [500, 1000, 15000], that you can then iterate over.
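Once SNP holds plain integers, the slicing itself might look like the sketch below. It uses an ordinary string so it runs without Biopython; with your data you could pass str(Reference.seq) instead. The helper name bracketed_window is hypothetical, and it assumes each position is a 0-based index (if your positions count from 1, subtract 1 first). Note that a slice's end index is exclusive, so Reference.seq[i-151:i-1] in the question stops before the letter immediately to the left of i; slicing up to i keeps that letter.

```python
def bracketed_window(sequence, position, flank=150):
    """Return `flank` letters left of `position`, the letter itself in
    square brackets, and `flank` letters to the right.

    Assumes `position` is a 0-based index into `sequence`.
    """
    # Clamp at 0 so a position near the start does not produce a negative
    # index, which Python would interpret as counting from the end.
    left_start = max(0, position - flank)
    left = sequence[left_start:position]
    right = sequence[position + 1:position + 1 + flank]
    return left + "[" + sequence[position] + "]" + right


# Tiny demonstration with a made-up 10-letter sequence and a flank of 3.
demo = "ACGTACGTAC"
for pos in [4, 0]:
    print(">SNP" + str(pos))
    print(bracketed_window(demo, pos, flank=3))
```

Near either end of the sequence the window is simply shorter than 150 letters, since slicing past the end of a string returns whatever is there.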