I am trying to read a website using the urllib2 library.
Here is my script:
import re
import urllib2

def wikitranslate(word):
    translation = ''
    pageURL = ''
    opener = urllib2.build_opener()
    # Get a file-like object for the French Wikipedia opensearch results.
    f = opener.open("http://fr.wikipedia.org/w/api.php?action=opensearch&search=" + re.sub(' ', '%20', word.rstrip()))
    # Read from the object, storing the page's contents in 's'.
    s = f.read()
I am wondering how the server receives these requests and whether it can tell that it is being accessed by a Python script rather than through a browser.
If so, is there a way to hide it?
The User-Agent field in the header of an HTTP request tells the web server which browser and system are making the request, so changing that field is the most direct way to either identify or conceal a Python script.
By default, urllib2 does not leave this field blank: it sends a value of the form Python-urllib/x.y (the library version), which a server can easily recognize as a script.
So if you want to conceal your agent, you have to set the User-Agent header yourself, for example to a browser-like string, either through the headers argument of urllib2.Request or through the opener's addheaders list.
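
As a rough sketch of what setting the field looks like, the snippet below builds a request with an explicit User-Agent and also inspects the value a freshly built opener would send if nothing is changed. The names BROWSER_UA, build_request and default_user_agent are made up for this illustration, and the browser string is an arbitrary example rather than a recommended value. The import fallback is only there so the same code also runs on Python 3, where urllib2 became urllib.request.

```python
try:
    import urllib2  # Python 2, as used in the question
except ImportError:
    import urllib.request as urllib2  # Python 3 equivalent module

# Hypothetical browser-like string, chosen only for illustration.
BROWSER_UA = "Mozilla/5.0 (X11; Linux x86_64)"


def build_request(url, user_agent=BROWSER_UA):
    """Return a Request whose User-Agent header is set explicitly."""
    return urllib2.Request(url, headers={"User-Agent": user_agent})


def default_user_agent():
    """Return the User-Agent that an unmodified opener would send."""
    opener = urllib2.build_opener()
    # addheaders is a list of (name, value) tuples applied to every request.
    return dict(opener.addheaders).get("User-agent")


if __name__ == "__main__":
    req = build_request(
        "http://fr.wikipedia.org/w/api.php?action=opensearch&search=chat")
    # Header names are stored capitalized as "User-agent".
    print(req.get_header("User-agent"))
    print(default_user_agent())
    # To actually send the request: urllib2.urlopen(req).read()
```

With an opener, as in the question, the equivalent would be assigning opener.addheaders = [('User-agent', BROWSER_UA)] before calling opener.open(...).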