I'm trying to get the HTML of http://groupon.cl/descuentos/santiago-centro with the following Python code:
import urllib.request

def fetch_html(url):
    # Send a browser-like User-Agent header with the request
    request = urllib.request.Request(url, headers={'User-Agent': 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'})
    response = urllib.request.urlopen(request)
    return response.read().decode('utf-8')

html = fetch_html("http://groupon.cl/descuentos/santiago-centro")
I’m getting the HTML of a page that asks for my location. If I open the same link manually in my browser (with no cookies involved, even in a freshly installed browser), I go directly to a page with discount promotions. There seems to be a redirect that is not taking place with urllib. I am sending the User-Agent header to try to get the behaviour a typical browser gets, but I have had no luck.
How could I get the same html code as with my browser?
I think you can run this command:
You will see wget print two HTTP requests and save the response page to a file.
The content of that file is the HTML you want.
The first response code is 302, so urllib.request.urlopen makes a second request. But it does not send the cookie that the first response set, so the server cannot understand the second request and you get a different page.
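If the missing cookie is indeed the cause, one way to make urllib keep cookies across the redirect is an opener with a cookie jar. This is a minimal sketch, not a confirmed fix for this particular site; the helper names make_cookie_opener and fetch_html are made up for illustration, and the User-Agent string is the one from the question.

```python
import urllib.request
from http.cookiejar import CookieJar

def make_cookie_opener():
    """Build an opener that stores cookies and sends them on later requests."""
    jar = CookieJar()
    # HTTPCookieProcessor records Set-Cookie headers (including those on a
    # 302 response) and adds matching Cookie headers to follow-up requests.
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    opener.addheaders = [('User-Agent', 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)')]
    return opener, jar

def fetch_html(url, opener):
    """Fetch url through the given opener and decode the body as UTF-8."""
    with opener.open(url) as response:
        return response.read().decode('utf-8')

# Usage (performs a real network request):
# opener, jar = make_cookie_opener()
# html = fetch_html("http://groupon.cl/descuentos/santiago-centro", opener)
```

After the call, iterating over jar shows which cookies the server set along the way, which helps confirm whether the redirect depended on one.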
The http.client module does not handle 301 or 302 HTTP responses by itself.
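To see what the server sends back before any redirect is followed, http.client can be used directly, since it returns the 302 as-is. The sketch below is only for inspection; first_response_info is a hypothetical helper name, and the 10-second timeout is an arbitrary choice.

```python
import http.client
from urllib.parse import urlsplit

def first_response_info(url):
    """Return (status, Location, Set-Cookie) for the first response only."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=10)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=10)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    try:
        conn.request("GET", path)
        response = conn.getresponse()
        response.read()  # drain the body so the connection closes cleanly
        # http.client does not follow redirects: a 302 is returned to the caller
        return (response.status,
                response.getheader("Location"),
                response.getheader("Set-Cookie"))
    finally:
        conn.close()

# Usage (performs a real network request):
# print(first_response_info("http://groupon.cl/descuentos/santiago-centro"))
```

If the status is 301 or 302, the Location value is where the second request has to go, and any Set-Cookie value is what that second request would need to send back.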