Hey, can someone help with the following?
I’m trying to scrape a site that has the following information. I need to pull just the number after the </strong> tag.
[<li><strong>ISBN-13:</strong> 9780375853401</li>, <li><strong>Pub. Date: </strong> 05/11/2010</li>]
[<li><strong>UPC:</strong> 490355000372</li>, <li><strong>Catalog No:</strong> 15024/25</li>, <li><strong>Label:</strong> CAMERATA</li>]
Here’s a piece of the code I’ve been using to grab the above data with mechanize and BeautifulSoup. I’m stuck here, as it won’t let me use the find() function on a list:
br_results = mechanize.urlopen(br_results)
html = br_results.read()
soup = BeautifulSoup(html)
local_links = soup.findAll("a", {"class" : "down-arrow csa"})
upc_code = soup.findAll("ul", {"class" : "bc-meta3"})
for upc in upc_code:
    upc_text = upc.contents.contents
    print upc_text
I imagine `upc_code` is the list you’re showing us, and the `local_links` one has nothing to do with your question, right? Given that you don’t mention it further in your code?
So I’m not certain what `upc_text` would be in your loop’s body, given that `upc` is a `ul` Tag: `upc.contents` is going to be a list of `li` tags (presumably), and I don’t see how `upc.contents.contents` can work. What are you seeing as a result of that code? I would have expected an exception!
Anyway, the way I’d write the loop would be something like:
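A minimal sketch of such a loop, assuming the modern `bs4` package on Python 3 (where `find_all` replaces the `findAll` used in the question). The sample markup and the helper name `second_children` are made up for illustration:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup modelled on the output shown in the question.
html = (
    '<ul class="bc-meta3">'
    "<li><strong>ISBN-13:</strong> 9780375853401</li>"
    "<li><strong>Pub. Date: </strong> 05/11/2010</li>"
    "</ul>"
)

def second_children(soup):
    """Collect the text that follows <strong> in each <li> of ul.bc-meta3."""
    values = []
    for ul in soup.find_all("ul", {"class": "bc-meta3"}):
        # find_all("li") is used instead of ul.contents so that any
        # whitespace text nodes between the <li> tags are skipped.
        for li in ul.find_all("li"):
            # contents[0] is the <strong> tag, contents[1] the string after it.
            if len(li.contents) > 1:
                values.append(li.contents[1].strip())
    return values

soup = BeautifulSoup(html, "html.parser")
print(second_children(soup))
```

Indexing `contents` is positional, so this only works while every list item keeps the "label tag, then value" layout.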
since you appear to want the second child of each list item (the first one is the `strong` tag, the second one the navigable string you want).
If it’s not the second child of each list item that you want, please clarify; for example, you could identify the `strong` and get its next sibling, if that suits you better: just make the body of the nested loop into
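A sketch of that next-sibling variant, again assuming `bs4` on Python 3 rather than the older BeautifulSoup API from the question. The sample markup and the name `text_after_strong` are illustrative only:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup modelled on the output shown in the question.
html = (
    '<ul class="bc-meta3">'
    "<li><strong>UPC:</strong> 490355000372</li>"
    "<li><strong>Catalog No:</strong> 15024/25</li>"
    "<li><strong>Label:</strong> CAMERATA</li>"
    "</ul>"
)

def text_after_strong(soup):
    """Return the node that directly follows <strong> in each list item, as text."""
    values = []
    for ul in soup.find_all("ul", {"class": "bc-meta3"}):
        for li in ul.find_all("li"):
            strong = li.find("strong")
            # Skip items that have no <strong> or nothing after it.
            if strong is None or strong.next_sibling is None:
                continue
            values.append(str(strong.next_sibling).strip())
    return values

soup = BeautifulSoup(html, "html.parser")
print(text_after_strong(soup))
```

Anchoring on the `strong` tag keeps working if extra nodes appear before it in a list item, which the positional version would not tolerate.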