The Archive Base

Editorial Team
Asked: May 21, 2026 (2026-05-21T12:00:50+00:00)

Greetings to the stackoverflow community,

I am currently following a bioinformatics module as part of a biomedical degree (I am basically a Python newbie), and the following task is required as part of a Python programming assignment:

Extract motif sequences: amino acid sequences (so basically strings, in programmatic-speak) that have been excised by algorithms implementing a multiple sequence alignment followed by iterative database scanning, in order to generate the best conserved sequences. The ultimate idea is to infer functional significance from such “motifs”.

These motifs are stored in a public database, in files that have multiple data fields for each protein (UniProt ID, accession number, the alignment itself stored in a hyperlinked .seq file), only one of which is currently of interest here. That data field is called “extracted motif sets”.

My question is how to go about writing a script that will essentially parse the “motif strings” and output them to a file. I have now coded the script so that it looks as follows (I don’t write the results to files yet):

import os, re, sys, string

printsdb = open('/users/spyros/folder1/python/PRINTSmotifs/prints41_1.kdat', 'r')

protname = None
final_motifs = []

for line in printsdb.readlines():
    if line.startswith('gc;'):
        protname = line.lstrip()
        #string.lower(name)  # convert to lowercase
        break

def extract_final_motifs(protname):
    """Extracts the sequences of the 'final motifs sets' for a PRINTS entry.
    Sequences are on lines starting 'fd;' A simple regex is used for retrieval"""

    for line in printsdb.readlines():
        if line.startswith('fd;'):
            final_motifs = re.compile('^\s+([A-Z]+)\s+<')
            final_motifs = final_motifs.match(line)
            #print(final_motifs.groups()[0])
            motif_dict = {protname : final_motifs}
            break
    return

motif_dict = extract_final_motifs('ADENOSINER')
print(motif_dict)

The problem now is that, while my code loops over a raw database file (prints41_1.kdat) instead of connecting to the public database using the urllib module, as suggested by Simon Cockell below, the output of the script is simply “None” in the Python shell, whereas it should be creating a list such as [AAYIGIEVLI, AAYIGIEVLI, AAYIGIEVLI, etc.].

Does anybody have any idea where the logic error is? Any input appreciated!
I apologize for the extensive text; I just hope to be as clear as possible. Thanks in advance for any help!


1 Answer

  1. Editorial Team
    Added an answer on May 21, 2026 at 12:00 pm

    First off, what you are doing is almost right, but you have to change "extracted motif sets" on line 2 to a variable, say line. The for loop returns data from the file line by line, assigned to the variable named after for, in this case line. The next question is how the lysozyme.seq file is formatted. It sounds as though none of the data fields contain any spaces, which means you might get away with line.split(" ") or line.split("\t") (\t means tab). split does what its name says: it splits the string every time it sees a " " or a "\t", depending on which one you pass to it.
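As a small illustration of the split suggestion, the sketch below pulls the first field out of a prefixed line. The sample line is made up for illustration and is not copied from a real PRINTS file, so the actual field layout may differ; the helper name first_field is hypothetical.

```python
def first_field(line, prefix="fd;"):
    """Return the first whitespace-separated field after `prefix`, or None."""
    if not line.startswith(prefix):
        return None
    # split() with no argument splits on any run of spaces or tabs
    # and ignores the trailing newline.
    fields = line[len(prefix):].split()
    return fields[0] if fields else None


# Hypothetical sample line; real files may be laid out differently.
sample = "fd; AAYIGIEVLI\tEXAMPLE_ID\t10\t10\n"

print(first_field(sample))   # the motif-like first field
print(sample.split("\t"))    # splitting on tabs only keeps the prefix and newline
```

Calling split() with no argument is usually more forgiving than split(" ") or split("\t"), because it does not matter whether the file uses spaces, tabs, or a mixture.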

    Scanning a directory to find the files shouldn’t be too hard; there are probably some questions here about it.
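One possible way to scan a directory, sketched with the standard pathlib module. The function name find_data_files and the "*.kdat" pattern are assumptions chosen to match the file extension mentioned in the question; the search covers a single directory and does not descend into subdirectories.

```python
from pathlib import Path


def find_data_files(directory, pattern="*.kdat"):
    """Return a sorted list of files in `directory` whose names match `pattern`."""
    # glob() looks only at the given directory, not its subdirectories.
    return sorted(p for p in Path(directory).glob(pattern) if p.is_file())


# Example: list matching files in the current directory.
for path in find_data_files("."):
    print(path)
```

To include subdirectories as well, Path.rglob accepts the same pattern.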

    If you post the data, or part of one of the files, so we can have a look at it, then we might be able to help you parse it :).
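On the "None" output in the question's script, a few things can be said without seeing the data. The first readlines() call reads the whole file, so the second readlines() inside the function gets an empty list. The bare return hands back None. The break would stop after the first motif in any case. And re.match with a pattern starting with ^\s+ cannot match a line that starts with fd;. The sketch below is one way to restructure the script, not a verified parser for the PRINTS format. It assumes that an entry starts with a line of the form 'gc; NAME' and that each motif line has the form 'fd; SEQUENCE ...'. The sample lines are invented stand-ins for the real file.

```python
def extract_final_motifs(lines, protname):
    """Collect motif sequences from the 'fd;' lines of the entry named `protname`.

    Assumed layout (not verified against the PRINTS format): an entry starts
    with 'gc; NAME', and each motif line looks like 'fd; SEQUENCE ...'.
    """
    motifs = []
    in_entry = False
    for line in lines:  # a single pass, so nothing is read twice
        if line.startswith("gc;"):
            # A new entry begins; check whether it is the one we want.
            in_entry = line[3:].strip() == protname
        elif in_entry and line.startswith("fd;"):
            fields = line[3:].split()
            if fields:
                motifs.append(fields[0])  # keep every motif, no break
    return {protname: motifs}  # explicit return value instead of a bare return


# Hypothetical sample standing in for the contents of the .kdat file.
sample_lines = [
    "gc; ADENOSINER\n",
    "fd; AAYIGIEVLI  EXAMPLE_ID_1  10  10\n",
    "fd; AAYIGIEVLI  EXAMPLE_ID_2  10  10\n",
    "gc; OTHERENTRY\n",
    "fd; MKTLLV  EXAMPLE_ID_3  5  5\n",
]

print(extract_final_motifs(sample_lines, "ADENOSINER"))
```

With a real file, the open file object can be passed in directly, since iterating over it yields one line at a time: open the file in a with block and call extract_final_motifs(printsdb, 'ADENOSINER') inside it.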

