I’m trying to parse the URL ‘http://www.5min.com/handlers/SitemapHandler.ashx?type=videositemap&page=1’ in Python 2.7. The problem is that when I open the URL with urlopen, I don't get the page source; I get weird characters instead. It might be encoded.
You are parsing the web server's response, not a .ashx file. Open that URL in your browser: the content you see there is what Python receives when you open it with urlopen.
When I opened it, these are the headers I got with the response:
In fact, the response looks like XML, so you will need to parse it with ElementTree (or another XML library you prefer). Also note that the server sends the response gzip-encoded. It may or may not do that depending on the request urlopen sends. If you're seeing gibberish with urlopen, try decompressing the response with Python's gzip module (not ZipFile, which reads .zip archives rather than gzip streams).
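A minimal sketch of both steps (gunzip, then parse), assuming the body is an XML sitemap whose URLs sit in `<loc>` elements. That is the usual sitemap layout but is not confirmed for this endpoint. The helper names `decode_body`, `loc_values` and `fetch_sitemap` are hypothetical. The code sticks to calls that exist in both Python 2.7 and Python 3.

```python
# Sketch: fetch the sitemap, undo gzip if the server applied it, parse the XML.
# Assumes an XML sitemap with <loc> elements; helper names are hypothetical.
import gzip
import io
import xml.etree.ElementTree as ET

try:
    from urllib2 import urlopen  # Python 2.7
except ImportError:
    from urllib.request import urlopen  # Python 3


def decode_body(raw, content_encoding=None):
    """Return the response body as bytes, gunzipping it if needed."""
    # Trust the Content-Encoding header, but also check the gzip magic
    # number (1f 8b) in case the header is missing.
    if (content_encoding or "").lower() == "gzip" or raw[:2] == b"\x1f\x8b":
        return gzip.GzipFile(fileobj=io.BytesIO(raw)).read()
    return raw


def loc_values(xml_bytes):
    """Collect the text of every <loc> element, ignoring XML namespaces."""
    root = ET.fromstring(xml_bytes)
    # Namespaced tags look like "{uri}loc", so compare only the local name.
    return [el.text for el in root.iter() if el.tag.split("}")[-1] == "loc"]


def fetch_sitemap(url):
    """Download url, decompress it if gzipped and return its <loc> values."""
    response = urlopen(url)
    raw = response.read()
    encoding = response.info().get("Content-Encoding")
    return loc_values(decode_body(raw, encoding))
```

Usage, assuming the endpoint still answers with a sitemap: `for loc in fetch_sitemap(url)[:5]: print(loc)`. Checking the magic bytes as well as the header covers servers that gzip the body without labelling it.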