I’m trying to match the TH tag that contains the text “Age” in the HTML below (file.txt):
<TABLE WIDTH="71%" BORDER=0 CELLSPACING=0 CELLPADDING=0>
<TR VALIGN="BOTTOM">
<TH WIDTH="34%" ALIGN="LEFT"><FONT SIZE=1><B>Name<BR> </B></FONT><HR NOSHADE></TH>
<TH WIDTH="3%"><FONT SIZE=1> </FONT></TH>
<TH WIDTH="5%" ALIGN="CENTER"><FONT SIZE=1><B>Age</B></FONT><HR NOSHADE></TH>
<TH WIDTH="3%"><FONT SIZE=1> </FONT></TH>
<TH WIDTH="55%" ALIGN="CENTER"><FONT SIZE=1><B>Positions</B></FONT><HR NOSHADE></TH>
</TR>
<TR BGCOLOR="#CCEEFF" VALIGN="TOP">
<TD WIDTH="34%"><FONT SIZE=2>Stephen A. Wynn</FONT></TD>
<TD WIDTH="3%"><FONT SIZE=2> </FONT></TD>
<TD WIDTH="5%" ALIGN="CENTER"><FONT SIZE=2>60</FONT></TD>
<TD WIDTH="3%"><FONT SIZE=2> </FONT></TD>
<TD WIDTH="55%"><FONT SIZE=2>Chairman of the Board and Chief Executive Officer</FONT></TD>
</TR>
<TR BGCOLOR="White" VALIGN="TOP">
<TD WIDTH="34%"><FONT SIZE=2>Kazuo Okada</FONT></TD>
<TD WIDTH="3%"><FONT SIZE=2> </FONT></TD>
<TD WIDTH="5%" ALIGN="CENTER"><FONT SIZE=2>60</FONT></TD>
<TD WIDTH="3%"><FONT SIZE=2> </FONT></TD>
<TD WIDTH="55%"><FONT SIZE=2>Vice Chairman of the Board</FONT></TD>
</TR>
</TABLE>
I have tried the following, but it doesn’t seem to work:
from bs4 import BeautifulSoup
# read the file and parse it with an explicit parser
with open("file.txt") as infile:
    soup = BeautifulSoup(infile.read(), "html.parser")
# this works
soup.findAll('th')
# this works but isn't particularly useful...
soup.findAll(text="Age")
# this is what I really want, but it returns an empty list
soup.findAll('th', text="Age")
Thanks for the help!
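As far as I can tell, you want the th element that contains the text “Age”. As far as I know, your last call returns an empty list because, when text is combined with a tag name, Beautiful Soup compares it against each tag’s .string, and .string is None for a tag with more than one child (here each header th holds both a FONT and an HR). There are several ways to get around that, all starting from finding every th. From there you can iterate over them and keep the one whose text is “Age”. So the code below should be useful.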
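
One way to sketch that approach is shown here. It is a minimal example rather than the only way to do it: the header row is inlined as a string so the snippet runs on its own, "html.parser" is chosen as the parser, and find_headers is a hypothetical helper name, not part of Beautiful Soup.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the header row from the question, inlined so the
# sketch runs without file.txt.
html = """
<TABLE WIDTH="71%" BORDER=0 CELLSPACING=0 CELLPADDING=0>
<TR VALIGN="BOTTOM">
<TH WIDTH="34%" ALIGN="LEFT"><FONT SIZE=1><B>Name<BR> </B></FONT><HR NOSHADE></TH>
<TH WIDTH="5%" ALIGN="CENTER"><FONT SIZE=1><B>Age</B></FONT><HR NOSHADE></TH>
<TH WIDTH="55%" ALIGN="CENTER"><FONT SIZE=1><B>Positions</B></FONT><HR NOSHADE></TH>
</TR>
</TABLE>
"""

soup = BeautifulSoup(html, "html.parser")


def find_headers(soup, label):
    """Return every <th> whose combined, stripped text equals label.

    Hypothetical helper for illustration. get_text() joins the text of
    all descendants, so it still works when the <th> has several children.
    """
    return [th for th in soup.find_all("th")
            if th.get_text(strip=True) == label]


age_headers = find_headers(soup, "Age")
print(age_headers)
```

Comparing against get_text(strip=True) instead of .string is the design choice that matters here: it looks at all the text nested inside the th, so the extra HR sibling no longer gets in the way. If you want a substring match instead of an exact one, swap == for in.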