I have a webpage that I read using Python and BeautifulSoup, say soup=BeautifulSoup(urllib2.urlopen(site)).

Asked: May 30, 2026 (2026-05-30T07:51:54+00:00)

I have a webpage that I read using Python and BeautifulSoup, say soup=BeautifulSoup(urllib2.urlopen(site)).

I’m trying to grab a snippet of the site and parse it, so I use pTag = soup.find("p", {"class":"secondary"}), which returns the following content.

<p class="secondary">
              Some address and street
              <br />
              City, State, ZIP
              (some) phone-number
             </p>

I would like to end up with variables address1, address2, and phone such that:

address1= "Some address and street"
address2= "City, State, ZIP"
phone= "(some) phone-number"

I’m not sure how to read the rows of a soup to selectively pick rows 1, 3, 4 (assuming starting row 0), but then again I’m also open to other ways of getting the data I want.

Thanks in advance! 🙂

1 Answer

Editorial Team · Answer 1 · 2026-05-30T07:51:55+00:00

Assuming address holds the tag with your raw address (the pTag returned by soup.find in the question):

<p class="secondary">
              Some address and street
              <br />
              City, State, ZIP
              (some) phone-number
             </p>

Then you can replace the line break with a comma and split the text on commas. This is not ideal, but when there is no clear separation between elements (spans, ids, etc.), it all comes down to positional checking.

address.find("br").replaceWith(",")
addressComponents = address.text.split(",")

That gives you the following four components in the addressComponents list (the last one spans two lines, and the surrounding whitespace is kept).

Some address and street
City
 State
 ZIP
              (some) phone-number
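
To make the whitespace in those components visible, the split can be reproduced on a plain string. This is only a sketch: text is a hypothetical stand-in for address.text after the <br /> has been replaced by a comma, with the indentation copied from the snippet above.

```python
# Hypothetical stand-in for address.text after the <br /> was replaced by ",".
text = (
    "\n              Some address and street\n              ,"
    "\n              City, State, ZIP\n              (some) phone-number\n             "
)

components = text.split(",")
for index, component in enumerate(components):
    # repr() makes the leading spaces and embedded newlines visible.
    print(index, repr(component))
```

The component at index 3 contains both the ZIP and the phone number, with a newline between them.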

As there is no line break tag between the ZIP and the phone number, they are separated only by a newline character. So to split the final component:

# The last component holds the ZIP and the phone number, separated by a newline.
addressSplit = addressComponents[3].split("\n")
print(addressSplit[0].strip())  # ZIP code
print(addressSplit[1].strip())  # Phone number
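
If only the three lines from the question are needed, an alternative to splitting on commas is to split the paragraph text on line breaks and drop the empty lines. The following is a minimal sketch of that idea, not part of the answer above: raw_text is a hypothetical stand-in for the tag's text (for example pTag.text), and split_address_lines is an illustrative helper name.

```python
def split_address_lines(raw_text):
    """Return the non-empty lines of raw_text with surrounding whitespace removed."""
    return [line.strip() for line in raw_text.splitlines() if line.strip()]


# Hypothetical stand-in for the text of the <p class="secondary"> tag.
# The <br /> contributes no text of its own; only the newlines around it remain.
raw_text = """
              Some address and street

              City, State, ZIP
              (some) phone-number
             """

# Assumes exactly three non-empty lines; unpacking raises ValueError otherwise.
address1, address2, phone = split_address_lines(raw_text)
print(address1)
print(address2)
print(phone)
```

This is still positional checking, so it relies on the page always putting the street, the city line, and the phone number on separate lines in that order.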
