I have a webpage that I read using Python and BeautifulSoup, say soup=BeautifulSoup(urllib2.urlopen(site)).
I’m trying to grab a snippet of the site and parse it, so I use pTag = soup.find("p", {"class":"secondary"}), which gives the following content.
<p class="secondary">
Some address and street
<br />
City, State, ZIP
(some) phone-number
</p>
I would like to basically have variables address1, address2, and phone such that:
address1= "Some address and street"
address2= "City, State, ZIP"
phone= "(some) phone-number"
I’m not sure how to read the rows of a soup to selectively pick rows 1, 3, 4 (assuming starting row 0), but then again I’m also open to other ways of getting the data I want.
Thanks in advance! 🙂
Assuming address contains your raw address, you can replace the line break with a comma and then split on commas. This is not ideal, but in scenarios like this, where there is no clear separation between elements (spans, ids, etc.), it all comes down to positional checking.
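A minimal sketch of that replace-and-split step. It assumes address is the inner HTML of the p tag as a plain string (the sample text below is copied from the question), and it uses a regular expression so that both <br /> and <br/> are matched, since the parser may serialize the tag either way. The names address and addressComponents follow the ones used in this answer; everything else is illustrative.

```python
import re

# Assumption: the raw inner HTML of <p class="secondary">, as a string.
address = """
Some address and street
<br />
City, State, ZIP
(some) phone-number
"""

# Turn the <br> tag (with or without the space/slash) into a comma.
with_commas = re.sub(r"<br\s*/?>", ",", address)

# Split on commas and trim the surrounding whitespace of each piece.
addressComponents = [part.strip() for part in with_commas.split(",")]

for component in addressComponents:
    print(repr(component))
```

The last component still holds the ZIP and the phone number together, separated by a newline, because there is no br tag between them.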
That gives you the following four components in the addressComponents list:
Some address and street
City
State
ZIP (some) phone-number
As there is no line break tag between the ZIP and the phone number, a newline character appears to separate them instead. So to split the final component:
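A minimal sketch of that final split, assuming addressComponents holds the four strings described here, with a newline between the ZIP and the phone number in the last one. The variable names address1, address2, and phone come from the question; zip_code is a hypothetical helper name.

```python
# Assumption: the four components produced by the comma split.
addressComponents = [
    "Some address and street",
    "City",
    "State",
    "ZIP\n(some) phone-number",
]

# Split the last component once on the newline: ZIP first, phone second.
zip_code, phone = [s.strip() for s in addressComponents[-1].split("\n", 1)]

address1 = addressComponents[0]
# Rejoin city, state, and ZIP into the single line the question asks for.
address2 = ", ".join(addressComponents[1:3] + [zip_code])

print(address1)
print(address2)
print(phone)
```

This relies purely on position, so it only holds while every page uses the same layout; a block with an extra line or a missing phone number would need its own check.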