I want to read tokens from a text document and check for a particular keyword.

Question


Asked: June 7, 2026 (2026-06-07T01:29:22+00:00)


I want to read tokens from a text document and check for a particular keyword. How would I do that?
For example, my file looks like this:

<protein id="Q11" name="HUMAN" length="655" crc64="30E1C1D138">
    <match id="G3DSA:3.30.160.60" name="ZC2f_H2/iegse_NA-bd" dbname="GE3D" status="T" evd="HMPfm">
      <ipr id="IPR013087" name="Zinc finger, H2-type/inrase, D-bindg" tpe="Dain" />
      <ln stt="114" end="142" sc="1.0E-8" />
    </match>

(I want to skip the first line and check, on the second line, whether dbname equals GE3D. If it does, I want to store the stt and end numbers.)

So I did this, but I don't know why it only returns one number for start and one for end, since more than one element should satisfy the requirement:

from lxml import etree

filename = 'inQ14591.txt'

with open(filename, 'rb') as f:
    root = etree.parse(f)

for ln in root.xpath("/protein/match[@dbname='GE3D']/ln"):
    start = ln.get("stt")
    end = ln.get("end")

print(start)
print(end)


1 Answer

Editorial Team · Answer 1 · 2026-06-07T01:29:24+00:00

It seems you can parse it with BeautifulSoup, but I’m not sure exactly what you’re looking for:

from bs4 import BeautifulSoup  # BeautifulSoup 4; the older BeautifulSoup 3 package is no longer maintained

text = '''<protein id="Q11" name="HUMAN" length="655" crc64="30E1C1D138">
    <match id="G3DSA:3.30.160.60" name="ZC2f_H2/iegse_NA-bd" dbname="GE3D" status="T" evd="HMPfm">
      <ipr id="IPR013087" name="Zinc finger, H2-type/inrase, D-bindg" tpe="Dain" />
      <ln stt="114" end="142" sc="1.0E-8" />
    </match>'''

soup = BeautifulSoup(text, 'html.parser')

# All tags whose dbname attribute equals 'GE3D'
res = soup.find_all(dbname='GE3D')

Update, per your comment: to find the stt value, find the ln tag and then read its stt attribute, like so:

stt_value = soup.find('ln')['stt']  # '114'
end_value = soup.find('ln')['end']  # '142'
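
The snippet above reads only the first ln tag. If the goal is to keep every stt/end pair under a match whose dbname is GE3D, one possible approach is to append each pair to a list inside the loop instead of overwriting two variables. The following is only a sketch under some assumptions: it uses the standard-library xml.etree.ElementTree rather than lxml or BeautifulSoup, the function name collect_positions is made up for illustration, and the second and third match elements and the closing </protein> tag are hypothetical additions to the sample so that the XML is well-formed and has more than one hit.

```python
import xml.etree.ElementTree as ET

# Sample data. Only the first <match> comes from the question; the other two
# <match> elements and the closing </protein> tag are hypothetical additions.
XML_TEXT = """<protein id="Q11" name="HUMAN" length="655" crc64="30E1C1D138">
    <match id="G3DSA:3.30.160.60" name="ZC2f_H2/iegse_NA-bd" dbname="GE3D" status="T" evd="HMPfm">
      <ipr id="IPR013087" name="Zinc finger, H2-type/inrase, D-bindg" tpe="Dain" />
      <ln stt="114" end="142" sc="1.0E-8" />
    </match>
    <match id="hypothetical-1" name="example" dbname="GE3D" status="T" evd="HMPfm">
      <ln stt="170" end="198" sc="1.0E-8" />
    </match>
    <match id="hypothetical-2" name="example" dbname="OTHER" status="T" evd="HMPfm">
      <ln stt="200" end="230" sc="1.0E-8" />
    </match>
</protein>"""


def collect_positions(xml_text, dbname="GE3D"):
    """Return (stt, end) integer pairs for every <ln> element found under a
    <match> element whose dbname attribute equals the given value."""
    root = ET.fromstring(xml_text)  # root is the <protein> element
    positions = []
    # The path is relative to <protein>, so it starts at "match".
    for ln in root.findall(f"match[@dbname='{dbname}']/ln"):
        # Append inside the loop so earlier matches are not overwritten.
        positions.append((int(ln.get("stt")), int(ln.get("end"))))
    return positions


print(collect_positions(XML_TEXT))  # [(114, 142), (170, 198)]
```

To read from a file instead of a string, ET.parse(filename).getroot() returns the same kind of root element, provided the file is well-formed XML.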
