I have an HTML document, and I want to parse out a table with a specific id, which is always within a div tag with a specific id. Here is what I’ve tried:
soup = BeautifulSoup(html)
target_div = soup('div', {'id' : 'left'})
target_table = target_div.findNextSibling('table')
Clearly that’s not working. It seems that my second statement returns a ResultSet instead of moving me around the document (which I suppose makes sense, but I’m not sure how to get what I need otherwise!). What is the correct methodology for doing this kind of parsing?
findNextSibling looks for tables that are contained in the same parent as the original target_div element. You want to look for a table contained in the div, so use .find() on the div instead. For simple cases (such as the contained table) you can also use the tag name as an attribute.
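Here is a minimal sketch of both approaches, assuming the bs4 package with the standard-library html.parser. The markup and the table id 'data' are made up for illustration; only the div id 'left' comes from the question.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration; only the div id 'left' is from the question.
html = """
<div id="left">
  <p>Intro</p>
  <table id="data"><tr><td>1</td></tr></table>
</div>
<table id="other"><tr><td>2</td></tr></table>
"""

soup = BeautifulSoup(html, "html.parser")

# .find() returns a single Tag (or None if nothing matches), not a ResultSet.
target_div = soup.find("div", {"id": "left"})

# Search inside the div for the table with the wanted id (assumed to be 'data').
target_table = target_div.find("table", {"id": "data"})

# Shortcut: attribute access returns the first <table> found inside the div.
first_table = target_div.table
```

If the div might be missing, check that target_div is not None before calling .find() on it.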
You were calling the soup object like a function, which is the same as using the .find_all() method. .find_all() returns all matching tags as a list-like ResultSet. You'd have to loop over the result set, but since you are looking for a single div (using its id) you are better off using .find(), which returns just one result.
If you do need to process more than one match, just treat the result of .find_all() as a list: loop over it or use indices.
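A short sketch of handling several matches, again assuming bs4 with html.parser. The markup and the table ids 'a' and 'b' are hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: a div with id 'left' that contains two tables.
html = """
<div id="left">
  <table id="a"><tr><td>1</td></tr></table>
  <table id="b"><tr><td>2</td></tr></table>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
target_div = soup.find("div", {"id": "left"})

# .find_all() returns a ResultSet, which behaves like a list of Tag objects.
tables = target_div.find_all("table")

# Loop over every match...
table_ids = []
for table in tables:
    table_ids.append(table["id"])

# ...or pick one match by index.
second_table = tables[1]
```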