I have this HTML structure:
<div>
<table>
<tbody>
<tr>
<td>stuff</td>
</tr>
<tr>
<td>
<div>The content I want</div>
</td>
</tr>
</tbody>
</table>
</div>
How do I get “The content I want” as plain text, with all the HTML tags stripped out?
Thanks
Use BeautifulSoup to parse the markup and read the text out of the tag you need.
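Here is a minimal sketch of the parsing step. It assumes the bs4 package (BeautifulSoup 4) is installed and that the markup from the question is held in a string; the variable names `html`, `soup` and `all_text` are only illustrative. It shows what you get if you ask for the text of the whole document:

```python
from bs4 import BeautifulSoup

# The markup from the question, stored in a string (the name is an assumption).
html = """
<div>
<table>
<tbody>
<tr>
<td>stuff</td>
</tr>
<tr>
<td>
<div>The content I want</div>
</td>
</tr>
</tbody>
</table>
</div>
"""

# Parse with the parser that ships with Python, so no extra dependency is needed.
soup = BeautifulSoup(html, "html.parser")

# get_text() on the whole document collects the text of every cell,
# so the unwanted "stuff" from the first row is included as well.
all_text = soup.get_text(" ", strip=True)
print(all_text)
```

The tags are gone, but the text of both rows comes back, so the lookup has to be narrowed down to the right row.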
Since all the <tr> tags have some content and you need the data from the second row, you can't just call .text on the whole parsed document. You need to do something a bit more complex: select the second row first and read the text from that.

Or, if there really is only one <div> tag inside the table rows (<tr>), you can also just traverse the tags down to that <div> and take its text.

Or you can use the HTML parser from the lxml module.
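The three options above might look roughly like this. It is a sketch rather than the only way to do it: it assumes both bs4 and lxml are installed, and the variable names (`markup`, `second_row_text`, `div_text`, `lxml_text`) are made up for illustration.

```python
from bs4 import BeautifulSoup
from lxml import html as lxml_html

# The markup from the question (the variable name is an assumption).
markup = """<div>
<table>
<tbody>
<tr>
<td>stuff</td>
</tr>
<tr>
<td>
<div>The content I want</div>
</td>
</tr>
</tbody>
</table>
</div>"""

soup = BeautifulSoup(markup, "html.parser")

# Option 1: pick the second <tr> explicitly and read only its text.
rows = soup.find_all("tr")
second_row_text = rows[1].get_text(strip=True)

# Option 2: if there is only one <div> inside the table, walk straight to it.
# soup.table is the first <table>; .div is the first <div> nested under it.
div_text = soup.table.div.get_text(strip=True)

# Option 3: lxml's HTML parser plus an XPath query for the <div>
# inside the second <tr>.
tree = lxml_html.fromstring(markup)
lxml_text = tree.xpath(".//tr[2]//div")[0].text_content().strip()

print(second_row_text)
print(div_text)
print(lxml_text)
```

Option 1 depends on the row position and option 2 depends on there being a single <div> in the table, so which one is safer depends on how stable the real page layout is.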