So my brother wanted me to write a web crawler in Python (self-taught) and

Question


Asked: May 16, 2026 (2026-05-16T15:16:35+00:00)


So my brother wanted me to write a web crawler in Python (self-taught), and I know C++, Java, and a bit of HTML. I’m using version 2.7 and reading the Python library documentation, but I have a few problems:
1. The concept of httplib.HTTPConnection and request is new to me, and I don’t understand whether it downloads an HTML script like a cookie or an instance. If you do both of those, do you get the source for a website page? And what are some terms I would need to know to modify the page and return the modified page?

Just for background, I need to download a page and replace any img with ones I have.

And it would be nice if you guys could tell me your opinion of Python 2.7 and 3.1.


1 Answer

Editorial Team · Answer 1 · 2026-05-16T15:16:36+00:00

~~Use Python 2.7, it has more 3rd party libs at the moment.~~ (Edit: see below).

I recommend using the stdlib module urllib2; it lets you fetch web resources comfortably.
Example:

import urllib2  # Python 2 standard library

# Open the URL and read the raw HTML of the page as a string
response = urllib2.urlopen("http://google.de")
page_source = response.read()

For parsing the downloaded HTML, have a look at BeautifulSoup.
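For the stated goal of swapping every img for your own, a minimal sketch with BeautifulSoup could look like the following. It assumes the third-party `beautifulsoup4` package is installed; the function name `replace_images`, the sample HTML, and the `example.com` URL are made up for illustration and are not part of the question.

```python
from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4


def replace_images(page_source, new_src):
    """Return page_source with the src of every <img> set to new_src.

    Hypothetical helper for illustration; it only touches the src
    attribute and leaves everything else in the tag alone.
    """
    # "html.parser" is the parser bundled with Python, so no extra
    # parser library is needed.
    soup = BeautifulSoup(page_source, "html.parser")
    for img in soup.find_all("img"):
        img["src"] = new_src
    # Serialize the modified tree back to an HTML string.
    return str(soup)


# Sample input invented for this sketch.
html = '<p>Hello <img src="a.png" alt="first"> <img src="b.png"></p>'
print(replace_images(html, "http://example.com/mine.png"))
```

In practice you would pass in the `page_source` you downloaded instead of the sample string. The serialized output may differ cosmetically from the input (for example in how the img tags are closed), since the page is re-rendered from the parsed tree.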

BTW: what exactly do you want to do:

> Just for background, I need to download a page and replace any img with ones I have

Edit: It’s 2014 now, most of the important libraries have been ported, and you should definitely use Python 3 if you can. python-requests is a very nice high-level library which is easier to use than urllib2.
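If you do move to Python 3, note that urllib2 no longer exists there; its functionality lives in `urllib.request`. A rough Python 3 equivalent of the urllib2 example above, using only the standard library, might look like this. The helper name `fetch_page`, the 10 second timeout, and the UTF-8 fallback are assumptions made for this sketch.

```python
from urllib.request import urlopen


def fetch_page(url, timeout=10):
    """Download url and return the response body as text.

    Hypothetical helper for illustration. Falls back to UTF-8 when the
    server does not declare a charset (an assumption, not a guarantee).
    """
    with urlopen(url, timeout=timeout) as response:
        # On Python 3, read() returns bytes, so decode them explicitly.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


# Usage (needs network access):
# page_source = fetch_page("http://google.de")
```

With python-requests the same step is shorter still, but it is a third-party package that has to be installed first.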
