I’m working on a script that downloads some data from Twitter profiles. I found that the HTML structure served to a Python “robot” differs from what a web browser gets: when I open the page with Python’s urllib2 and BeautifulSoup, the tags have different IDs and classes. Is there a way to get the same content as in a web browser?
I need this to resolve shortened URLs, because in a web browser the resolved URLs are stored in each link’s title attribute.
Most websites adapt their response according to the User-Agent header on the request. If you don’t set one yourself (urllib2 then sends its own Python-urllib default), it is obvious that the client is not a browser but some sort of script. You’ll probably want to set a User-Agent header that resembles that of a “real” browser.
Several ways to do this are described in “Changing user agent on urllib2.urlopen” and in “Fetch a Wikipedia article with Python”.
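As a rough illustration of the idea, here is a minimal sketch that sets a browser-like User-Agent on a request. The helper names build_request and fetch_html and the BROWSER_UA string are made up for this example, and the user-agent value is only a plausible placeholder; copy the one your own browser sends if you want a close match. The import falls back to urllib.request so the same code also runs on Python 3, where urllib2 no longer exists.

```python
try:
    from urllib2 import Request, urlopen  # Python 2, as in the question
except ImportError:
    from urllib.request import Request, urlopen  # Python 3 equivalent

# Illustrative value only (an assumption, not something the site requires).
BROWSER_UA = ("Mozilla/5.0 (X11; Linux x86_64; rv:109.0) "
              "Gecko/20100101 Firefox/115.0")


def build_request(url, user_agent=BROWSER_UA):
    """Return a Request that carries a browser-like User-Agent header."""
    return Request(url, headers={"User-Agent": user_agent})


def fetch_html(url, user_agent=BROWSER_UA):
    """Download the page body (raw bytes) using the custom User-Agent."""
    response = urlopen(build_request(url, user_agent))
    try:
        return response.read()
    finally:
        response.close()
```

The bytes returned by fetch_html can be passed to BeautifulSoup as before. There is no guarantee the site will return exactly the markup a browser sees, but it should at least stop treating the request as coming from an obvious script.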
On an unrelated note, you might want to use Requests, which is a much better API than the standard urllib2.