I am attempting to parse HTML data from a website using BeautifulSoup for Python.

Asked: May 23, 2026

I am attempting to parse HTML data from a website using BeautifulSoup for Python. However, neither urllib2 nor mechanize is able to read the whole HTML page. The returned data is:

```html
<html>
<head>
    <title>
    EC 4.1.2.13 - Fructose-bisphosphate aldolase    </title>
    <meta name="description" content="Information on EC 4.1.2.13 - Fructose-bisphosphate aldolase">
    <meta name="keywords" content="EC,Number,Enzyme,Pathway,Reaction,Organism,Substrate,Cofactor,Inhibitor,Compound,KM Value,KI Value,IC50 Value,pi Value,Turnover Number,pH,Temperature,Optimum,Range,Source Tissue,BLAST,Subunits,Modification,Crystallization,Stability,Purification">
</head>
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Frameset//EN" "http://www.w3.org/TR/html4/frameset.dtd">
<frameset cols="190,*" border="0">
    <frame name="navigation" src="flat_navigation.php4?ecno=4.1.2.13&organism_list=Mycobacterium tuberculosis&Suchword=&UniProtAcc=P67475" frameborder="no">
    <frameset rows="110,*" border="0">
            <frame name="header" src="flat_head.php4?ecno=4.1.2.13" frameborder="no">

        <frame name="flat" src="flat_result.php4?ecno=4.1.2.13&organism_list=Mycobacterium tuberculosis&Suchword=&UniProtAcc=P67475" frameborder="no">

    </frameset>
</frameset>
<noframes>
<body>
<h1>EC 4.1.2.13 - Fructose-bisphosphate aldolase </h1>

<a href="flat_result.php4?ecno=4.1.2.13&organism_list=Mycobacterium tuberculosis&Suchword=&UniProtAcc=P67475">More detailed information on the enzyme EC 4.1.2.13 - Fructose-bisphosphate aldolase</a>

Sorry, but your browser doesn't support frames. Please use another browser!
</body>
</noframes>
</html>
```

When I open the website manually in Internet Explorer, the whole HTML can be read. Is there any way to work around this using urllib2, mechanize, or BeautifulSoup?

1 Answer

Editorial Team · Answer 1 · Added an answer on May 23, 2026 at 10:15 am

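That’s because the content is in the frames: the page you download is only a frameset, and the browser fetches each frame’s document with a separate request. You can either parse the page and look for the src attribute of the main <frame> element (here, the one named “flat”), or request the frame’s URL directly. In most browsers, you can right-click inside the frame and select “Frame Properties” (or similar) to get the frame’s URL.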
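A minimal sketch of the first approach in Python 3, using only the standard library (`html.parser`, `urllib.parse`, `urllib.request`; `urllib2` is the Python 2 name for the latter). It collects the `src` of every `<frame>`, resolves it against the URL of the outer page, and percent-encodes the literal spaces that this page's `src` values contain, since recent versions of `http.client` refuse to send a URL with a raw space in it. The helper names `FrameCollector`, `frame_urls` and `fetch`, and the UTF-8 fallback, are hypothetical choices for illustration; the base URL you pass in must be the address of the outer (frameset) page, which the question does not give.

```python
from html.parser import HTMLParser
from urllib.parse import quote, urljoin
from urllib.request import urlopen


class FrameCollector(HTMLParser):
    """Records the name and src attribute of every <frame>/<iframe> tag."""

    def __init__(self):
        super().__init__()
        self.frames = {}  # frame name -> raw src value

    def handle_starttag(self, tag, attrs):
        if tag not in ("frame", "iframe"):
            return
        attributes = dict(attrs)
        src = attributes.get("src")
        if src:
            # Frames without a name are keyed by their position on the page.
            name = attributes.get("name") or str(len(self.frames))
            self.frames[name] = src


def frame_urls(html, base_url):
    """Map each frame name to an absolute URL that urllib can request."""
    collector = FrameCollector()
    collector.feed(html)
    collector.close()
    urls = {}
    for name, src in collector.frames.items():
        # Relative src values are resolved against the outer page's URL.
        absolute = urljoin(base_url, src)
        # Percent-encode characters such as the literal spaces in this page's
        # src values, leaving URL delimiters and existing %XX escapes alone.
        urls[name] = quote(absolute, safe=":/?#&=%+;,@")
    return urls


def fetch(url):
    """Download url and decode it, falling back to UTF-8 (an assumption)."""
    with urlopen(url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

For example, with `outer_html` holding the frameset shown in the question and `base_url` its address (both placeholders), `fetch(frame_urls(outer_html, base_url)["flat"])` should return the document the browser displays in the main frame, which can then be handed to BeautifulSoup as usual. If you would rather do the lookup with BeautifulSoup itself, `soup.find("frame", attrs={"name": "flat"})["src"]` yields the same raw `src` value, which still needs resolving and encoding as above.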
