With jQuery selectors you can select a div that contains the innerText “John” with $("div:contains('John')"), so you could match the second <div> in:
<div>Bill</div>
<div>John</div>
<div>Joe</div>
How can I do this in Python’s Beautiful Soup, or some other Python module?
I just watched a lecture on scraping from PyCon 2010 where he mentions you can use CSS selectors in lxml. Do I have to use that, or is there a way to do it with just the Soup?
Background: Asking for the purpose of parsing a scraped web page.
A more concise way using BeautifulSoup: soup() is equivalent to soup.findAll(). You could use a string, a regular expression, or an arbitrary function to select what you need.
The stdlib’s ElementTree is enough in your case.
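A rough sketch of both suggestions, not the answerers' original code. The <root> wrapper and the variable names are my own assumptions, added because ElementTree needs a single well-formed root element. The BeautifulSoup part assumes a recent BeautifulSoup 4 install (package beautifulsoup4), where the text filter is called string (older code uses text) and findAll is spelled find_all; it is skipped if the package is missing.

```python
import re
import xml.etree.ElementTree as ET

# The <root> wrapper is an assumption: ElementTree needs one well-formed root.
markup = "<root><div>Bill</div><div>John</div><div>Joe</div></root>"

# stdlib only: keep each <div> whose own text contains "John",
# roughly what jQuery's div:contains('John') does for flat markup like this.
root = ET.fromstring(markup)
et_matches = [div for div in root.iter("div") if "John" in (div.text or "")]
print([div.text for div in et_matches])

# BeautifulSoup 4 is third-party, so guard the import.
try:
    from bs4 import BeautifulSoup
except ImportError:
    BeautifulSoup = None

if BeautifulSoup is not None:
    soup = BeautifulSoup(markup, "html.parser")
    # Calling soup(...) is shorthand for soup.find_all(...). The string filter
    # accepts a plain string, a compiled regex, or a function.
    bs_matches = soup("div", string=re.compile("John"))
    print([div.get_text() for div in bs_matches])
```

Note that the regex version does a substring match like :contains, while passing string="John" would only match a div whose text is exactly "John".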