How can I fetch genomic sequence efficiently using Python? For example, from a .fa file or some other easily obtained format? I basically want an interface fetch_seq(chrom, strand, start, end) which will return the sequence [start, end] on the given chromosome on the specified strand.
Analogously, is there a programmatic Python interface for getting phastCons scores?
Thanks.
See my answer to your question over at Biostar:
http://biostar.stackexchange.com/questions/1639/getting-genomic-sequences-and-phastcons-scores-using-python-from-ensembl-ucsc
Use Biopython’s SeqIO with FASTA files and you’ll get back a record object for each item in the file. You can then slice a record’s sequence (its seq attribute) to pull out the region you need. The nice thing about using an established library is that you don’t have to worry about the line breaks in the original FASTA file.
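A rough sketch of the fetch_seq interface from the question, built on that idea. This is not the code from the original answer. The helper load_genome, the extra genome argument, and the coordinate convention (1-based, inclusive start and end) are all assumptions made for illustration; adjust them to whatever convention your coordinates use.

```python
# Complement table for DNA, covering N and lowercase (soft-masked) bases.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def load_genome(fasta_path):
    """Read every FASTA record into a dict keyed by record id (e.g. "chr1").

    Hypothetical helper: it holds all sequences in memory as plain strings.
    """
    # Imported here so fetch_seq below can be tried without Biopython installed.
    from Bio import SeqIO

    with open(fasta_path) as handle:
        return {rec.id: str(rec.seq) for rec in SeqIO.parse(handle, "fasta")}


def fetch_seq(genome, chrom, strand, start, end):
    """Return bases start..end on the given strand.

    Assumes 1-based, inclusive coordinates given on the plus strand.
    """
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    # Convert to a 0-based, half-open Python slice.
    seq = str(genome[chrom][start - 1:end])
    if strand == "-":
        # Minus strand: reverse complement of the plus-strand slice.
        seq = seq.translate(_COMPLEMENT)[::-1]
    return seq
```

Usage would look like genome = load_genome("hg19.fa") followed by fetch_seq(genome, "chr1", "-", 1000, 1050), where the file name is just a placeholder. Loading a whole genome into a dict is memory hungry; Biopython's SeqIO.index gives a dict-like object that parses records on demand and may be a better fit for large files.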