I want to read tokens from a text document and check for a particular keyword. How would I do that?
For example, my file looks like this:
<protein id="Q11" name="HUMAN" length="655" crc64="30E1C1D138">
<match id="G3DSA:3.30.160.60" name="ZC2f_H2/iegse_NA-bd" dbname="GE3D" status="T" evd="HMPfm">
<ipr id="IPR013087" name="Zinc finger, H2-type/inrase, D-bindg" tpe="Dain" />
<ln stt="114" end="142" sc="1.0E-8" />
</match>
(I want to skip the first line and check the second line: dbname must be equal to GE3D. If it is, I want to store the stt and end numbers.)
So I did this, but I don't know why it returns only one number each for start and end, since more than one entry should satisfy the requirement:
from lxml import etree

filename = 'inQ14591.txt'
with open(filename, 'rb') as f:
    root = etree.parse(f)

for ln in root.xpath("/protein/match[@dbname='GE3D']/ln"):
    start = ln.get("stt")
    end = ln.get("end")
print(start)
print(end)
Seems like you can parse it with BeautifulSoup, but I'm not sure exactly what you're looking for.
Update, per your comment: to find the stt value, you need to find the ln tag and then read its stt attribute.
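The snippet that originally followed is not preserved here. As a stand-in, here is a minimal sketch of the same idea using the standard library's xml.etree.ElementTree, swapped in for BeautifulSoup so that it runs without third-party packages. The names SAMPLE and find_ranges are hypothetical, and the sample assumes a closing </protein> tag that the excerpt in the question does not show.

```python
import xml.etree.ElementTree as ET

# Sample from the question, with a closing </protein> added (assumption)
# so that the fragment is well-formed XML.
SAMPLE = """<protein id="Q11" name="HUMAN" length="655" crc64="30E1C1D138">
<match id="G3DSA:3.30.160.60" name="ZC2f_H2/iegse_NA-bd" dbname="GE3D" status="T" evd="HMPfm">
<ipr id="IPR013087" name="Zinc finger, H2-type/inrase, D-bindg" tpe="Dain" />
<ln stt="114" end="142" sc="1.0E-8" />
</match>
</protein>"""


def find_ranges(xml_text, dbname="GE3D"):
    """Return a list of (stt, end) pairs for every <ln> under a matching <match>."""
    root = ET.fromstring(xml_text)
    ranges = []
    # iter() walks the whole tree, so the nesting depth of <match> does not matter.
    for match in root.iter("match"):
        if match.get("dbname") != dbname:
            continue  # skip matches from other databases
        for ln in match.iter("ln"):
            # Append inside the loop so that every pair is kept.
            ranges.append((int(ln.get("stt")), int(ln.get("end"))))
    return ranges


print(find_ranges(SAMPLE))  # [(114, 142)]
```

Appending to a list inside the loop keeps every stt/end pair. Assigning to plain variables and printing them after the loop would show only the values from the last ln element.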