I’m trying to parse the HTML of a webpage that requires being logged in.

Asked: May 30, 2026

I’m trying to parse the HTML of a webpage that requires being logged in. I can get the HTML of a webpage using this script:

from urllib.request import urlopen
from bs4 import BeautifulSoup

webpage = urlopen('https://www.example.com')
soup = BeautifulSoup(webpage, 'html.parser')
print(soup)
# This prints the source of example.com

But getting the source of a page that is only visible when I'm logged in is more difficult.
I tried replacing 'https://www.example.com' with 'https://user:pass@example.com', but I got an Invalid URL error.

Anyone know how I could do this?
Thanks in advance.
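A note on the `user:pass@` attempt: urllib most likely does not read that prefix as credentials; the text after the colon gets parsed as a port number, which produces the Invalid URL error. Credentials in that form only matter when the site uses HTTP Basic authentication (the browser's own pop-up login dialog), not an HTML login form. For that case, a minimal sketch is to send the credentials in an `Authorization` header instead. The helper name `basic_auth_request` is hypothetical, and this does nothing for form-based logins.

```python
import base64
from urllib.request import Request


def basic_auth_request(url, username, password):
    """Build a request carrying HTTP Basic credentials in a header.

    Hypothetical helper: only useful if the site uses HTTP Basic auth,
    not an HTML login form.
    """
    # Basic auth is "username:password", base64-encoded
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    req = Request(url)
    req.add_header("Authorization", f"Basic {token}")
    return req
```

The returned request can be passed to `urlopen(...)` in place of the plain URL string. The standard library's `urllib.request.HTTPBasicAuthHandler` does the same job, but by default it only sends the credentials after the server replies with a 401 challenge; setting the header directly sends them up front.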

1 Answer

Editorial Team · Answer 1 · 2026-05-30T06:40:52+00:00

Selenium WebDriver (http://seleniumhq.org/projects/webdriver/) might suit your needs here. It drives a real browser, so you can log in to the page and then print the resulting HTML. Here's an example:

from selenium import webdriver
from selenium.webdriver.common.by import By

# start a browser session, in this case Firefox
driver = webdriver.Firefox()
driver.get("http://example.com")  # go to the login page

# locate the login form; replace ... with each field's name attribute
username_field = driver.find_element(By.NAME, ...)  # the username field
password_field = driver.find_element(By.NAME, ...)  # the password field

# log in
username_field.send_keys("username")  # enter your username
password_field.send_keys("password")  # enter your password
password_field.submit()  # submit the form

# print the HTML of the page shown after logging in
html = driver.page_source
print(html)

driver.quit()  # close the browser
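If the login is a plain HTML form that does not depend on JavaScript, a browser is not strictly required. The sketch below swaps in a different technique from Selenium, using only the standard library: fetch the login page with a cookie-keeping opener, collect the form's `<input>` names (which also shows which names to use when locating the fields), post them back with your credentials, then request the protected page. Assumptions: `login_url`, `action_url` (the form's `action` target), `private_url`, and the credential field names are hypothetical and must be read from the real page; pages are assumed to be UTF-8; logins built with JavaScript or protected by a CAPTCHA still need Selenium.

```python
from html.parser import HTMLParser
from http.cookiejar import CookieJar
from urllib.parse import urlencode
from urllib.request import HTTPCookieProcessor, Request, build_opener


class FormFieldParser(HTMLParser):
    """Collects the name and default value of every <input> on a page."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag == "input":
            attrs = dict(attrs)
            name = attrs.get("name")
            if name:
                # boolean attributes arrive as None, so fall back to ""
                self.fields[name] = attrs.get("value") or ""


def form_fields(html):
    """Return {input name: default value}; inputs from every form on the page."""
    parser = FormFieldParser()
    parser.feed(html)
    return parser.fields


def build_login_request(action_url, fields):
    """Build a POST request with the fields url-encoded in the body."""
    data = urlencode(fields).encode("ascii")
    return Request(action_url, data=data)


def fetch_logged_in(login_url, action_url, private_url, credentials):
    """Hypothetical helper: log in via the form, then fetch a protected page.

    The CookieJar stores the session cookie set at login, so the final
    request is sent as the logged-in user.
    """
    opener = build_opener(HTTPCookieProcessor(CookieJar()))
    login_html = opener.open(login_url).read().decode("utf-8", "replace")
    fields = form_fields(login_html)  # keeps hidden inputs such as CSRF tokens
    fields.update(credentials)
    opener.open(build_login_request(action_url, fields))
    return opener.open(private_url).read().decode("utf-8", "replace")
```

For example, `fetch_logged_in(login_url, action_url, private_url, {"username": "me", "password": "secret"})` (hypothetical field names) would return the protected page's HTML as a string, which can then be handed to `BeautifulSoup` as in the question. Reusing the login page's own inputs matters because many sites reject a login POST that lacks their hidden token fields.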
