I have a Python script that downloads a GRIB file (weather forecast data) from the NOAA website based on a date, a time, and the number of hours to forecast ahead. Basically, the script pieces together a big URL request and sends it to the NOAA website. This works great on the computers at school, and it worked great for some previous Stack Overflowers who helped me with the code. However, the exact same script fails 9 out of 10 times when I run it on my own computer, even though when I make Python print out the URL and paste it into Firefox, it works every time. Switching from urllib to urllib2 doesn't change anything.
So, to sum up: somehow urllib can't get the data I want when I'm on my computer, but the script works fine everywhere else. Urllib can scrape HTML from other websites on my computer with no problem, but somehow this particular download gives it trouble.
I am running Ubuntu Precise with Python 2.7.3 on a laptop with a wireless connection when I try to run the script at home. I have tested it on a wired computer with Ubuntu Precise and it works every time (I also tested it on Fedora, and it works there too).
Please tell me some diagnostics I can run to figure out why urllib and my computer aren't playing nice. And thank you; this problem is standing in the way of the next generation of high-altitude balloon launches.
Here's what it tells me 90% of the time:
```
Traceback (most recent call last):
  File "/home/dantayaga/bovine_aerospace/dev/grib_get.py", line 67, in <module>
    webf=urllib.urlopen(griburl, data='POST')
  File "/usr/lib/python2.7/urllib.py", line 88, in urlopen
    return opener.open(url, data)
  File "/usr/lib/python2.7/urllib.py", line 209, in open
    return getattr(self, name)(url, data)
  File "/usr/lib/python2.7/urllib.py", line 344, in open_http
    h.endheaders(data)
  File "/usr/lib/python2.7/httplib.py", line 954, in endheaders
    self._send_output(message_body)
  File "/usr/lib/python2.7/httplib.py", line 814, in _send_output
    self.send(msg)
  File "/usr/lib/python2.7/httplib.py", line 776, in send
    self.connect()
  File "/usr/lib/python2.7/httplib.py", line 757, in connect
    self.timeout, self.source_address)
  File "/usr/lib/python2.7/socket.py", line 553, in create_connection
    for res in getaddrinfo(host, port, 0, SOCK_STREAM):
IOError: [Errno socket error] [Errno -2] Name or service not known
```
Here is the code I am using (credit to samy.vilar et al. for improved pythonicity). Note that you have to enter today's date and a forecast time of 00, 06, 12 or 18 (GMT), otherwise you may get a 404 Not Found. Keep the forecast hours the same.
```python
# Get GRIB files
import urllib
#import os
#os.environ['http_proxy']='' #Doesn't seem to help!
forecast_time='06' #What time the forecast is (00, 06, 12, 18)
forecast_hours='12' #How many hours ahead to forecast (2 or 3 digits)
forecast_date='20120720' #What date the forecast is for yyyymmdd
top_lat=90 #Top of bounding box (North)
bottom_lat=-90 #Bottom of bounding box (South)
left_lon=-90 #Left of bounding box (West)
right_lon=90 #Right of bounding box (East)
griburl='http://nomads.ncep.noaa.gov/cgi-bin/filter_gfs_hd.pl?'
griburl=griburl+'file=gfs.t'+str(forecast_time)+'z.mastergrb2f'
griburl=griburl+forecast_hours
#Select atmospheric levels
griburl=griburl+'&lev_1000_mb=on' #1000 mb level
griburl=griburl+'&lev_975_mb=on' #975 mb level
griburl=griburl+'&lev_950_mb=on' #950 mb level
griburl=griburl+'&lev_925_mb=on' #925 mb level
griburl=griburl+'&lev_900_mb=on' #900 mb level
griburl=griburl+'&lev_850_mb=on' #850 mb level
griburl=griburl+'&lev_800_mb=on' #800 mb level
griburl=griburl+'&lev_750_mb=on' #750 mb level
griburl=griburl+'&lev_700_mb=on' #700 mb level
griburl=griburl+'&lev_650_mb=on' #650 mb level
griburl=griburl+'&lev_600_mb=on' #600 mb level
griburl=griburl+'&lev_550_mb=on' #550 mb level
griburl=griburl+'&lev_500_mb=on' #500 mb level
griburl=griburl+'&lev_450_mb=on' #450 mb level
griburl=griburl+'&lev_400_mb=on' #400 mb level
griburl=griburl+'&lev_350_mb=on' #350 mb level
griburl=griburl+'&lev_300_mb=on' #300 mb level
griburl=griburl+'&lev_250_mb=on' #250 mb level
griburl=griburl+'&lev_200_mb=on' #200 mb level
griburl=griburl+'&lev_150_mb=on' #150 mb level
griburl=griburl+'&lev_100_mb=on' #100 mb level
griburl=griburl+'&lev_70_mb=on' #70 mb level
griburl=griburl+'&lev_30_mb=on' #30 mb level
griburl=griburl+'&lev_20_mb=on' #20 mb level
griburl=griburl+'&lev_10_mb=on' #10 mb level
#Select variables
griburl=griburl+'&var_HGT=on' #Height (geopotential m)
griburl=griburl+'&var_RH=on' #Relative humidity (%)
griburl=griburl+'&var_TMP=on' #Temperature (K)
griburl=griburl+'&var_UGRD=on' #East-West component of wind (m/s)
griburl=griburl+'&var_VGRD=on' #North-South component of wind (m/s)
griburl=griburl+'&var_VVEL=on' #Vertical Windspeed (Pa/s)
#Select bounding box (each parameter needs its own '&' separator)
griburl=griburl+'&leftlon='+str(left_lon)
griburl=griburl+'&rightlon='+str(right_lon)
griburl=griburl+'&toplat='+str(top_lat)
griburl=griburl+'&bottomlat='+str(bottom_lat)
#Select date and time
griburl=griburl+'&dir=%2Fgfs.'+forecast_date+forecast_time+'%2Fmaster'
print(griburl)
print('Downloading GRIB file for date '+forecast_date+' time ' +forecast_time + ', forecasting '+forecast_hours+' hours ahead...')
# Plain GET, the same request Firefox makes when the URL is pasted in
# (data='POST' would send the literal string "POST" as a request body)
webf=urllib.urlopen(griburl)
print("Download complete. Saving...")
local_filename=forecast_date+'_'+forecast_time+'_'+forecast_hours+'.grib'
# 'with' closes the output file even if the write fails
with open(local_filename, 'wb') as localf:
    localf.write(webf.read())
webf.close()
print('Requested grib data written to file '+local_filename)
```
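Concatenating the query string by hand makes it easy to drop an `&` separator (the bounding-box parameters are the easiest ones to miss). Here is a minimal sketch that builds the same query with `urlencode` from a list of (name, value) pairs, so every separator and the `%2F` escaping of `dir` are handled for you. The function name `build_grib_url` and the run-hour check are hypothetical additions, not part of the original script; the parameter names are copied from the script above. This only tidies up the URL and does not touch the DNS error itself.

```python
# Works on Python 2.7 and 3.x: urlencode moved to urllib.parse in Python 3.
try:
    from urllib.parse import urlencode
except ImportError:  # Python 2.7
    from urllib import urlencode

BASE_URL = 'http://nomads.ncep.noaa.gov/cgi-bin/filter_gfs_hd.pl'
LEVELS_MB = [1000, 975, 950, 925, 900, 850, 800, 750, 700, 650, 600, 550,
             500, 450, 400, 350, 300, 250, 200, 150, 100, 70, 30, 20, 10]
VARIABLES = ['HGT', 'RH', 'TMP', 'UGRD', 'VGRD', 'VVEL']
VALID_RUNS = ('00', '06', '12', '18')


def build_grib_url(forecast_date, forecast_time, forecast_hours,
                   left_lon, right_lon, top_lat, bottom_lat):
    """Return the filter URL with the same fields as the hand-built one."""
    if forecast_time not in VALID_RUNS:
        raise ValueError('forecast_time must be one of %s' % (VALID_RUNS,))
    # A list of pairs (not a dict) keeps the parameters in a fixed order.
    params = [('file', 'gfs.t%sz.mastergrb2f%s' % (forecast_time, forecast_hours))]
    params += [('lev_%d_mb' % lev, 'on') for lev in LEVELS_MB]
    params += [('var_%s' % var, 'on') for var in VARIABLES]
    params += [('leftlon', left_lon), ('rightlon', right_lon),
               ('toplat', top_lat), ('bottomlat', bottom_lat)]
    # urlencode escapes '/' as %2F, matching the hand-written dir value.
    params.append(('dir', '/gfs.%s%s/master' % (forecast_date, forecast_time)))
    return BASE_URL + '?' + urlencode(params)


if __name__ == '__main__':
    print(build_grib_url('20120720', '06', '12', -90, 90, 90, -90))
```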
This exception indicates that your laptop is not able to resolve the host name into an IP address. The DNS lookup is handled by the socket library, and this will be independent of whether you use `urllib` or `urllib2` (or anything else, for that matter). You need to look at your network setup, in particular your DNS server. It could be that Firefox is configured to use a proxy, in which case it is delegating the DNS lookup to the proxy.
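To confirm that the failure is in name resolution and not in urllib, you can call `socket.getaddrinfo()` (the call at the very bottom of the traceback) in a loop, without making any HTTP request at all. Below is a minimal sketch; `probe_dns` is a hypothetical helper name, and the number of attempts and the pause between them are arbitrary choices.

```python
import socket
import time


def probe_dns(host, attempts=10, port=80, pause=0.5):
    """Repeat the getaddrinfo() lookup that fails in the traceback.

    Returns (successes, errors), where errors is a list of the error
    messages seen. No HTTP request is made, so urllib is not involved.
    """
    successes = 0
    errors = []
    for i in range(attempts):
        try:
            socket.getaddrinfo(host, port, 0, socket.SOCK_STREAM)
            successes += 1
        except socket.gaierror as exc:
            errors.append(str(exc))
        if i < attempts - 1:
            time.sleep(pause)  # spread the lookups out a little
    return successes, errors
```

Run `print(probe_dns('nomads.ncep.noaa.gov'))` on the wireless laptop and on the wired machine. If the laptop shows intermittent failures here as well, the DNS server supplied by the wireless network is the likely suspect.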
It's odd that you don't have problems with other sites; I can't explain why HTML scraping using `urllib` works for other sites (perhaps a proxy is enabled for those scripts?), but the exception that you're experiencing is definitely related to DNS.

If you do find that Firefox is using a proxy, try setting your script up to use the same proxy. A simple way is to set the `http_proxy` environment variable when you invoke your Python script, since `urllib` picks up proxy settings from the environment.
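To see which proxy settings Python itself picks up, and to send the request through an explicit proxy from inside the script, here is a minimal sketch. `make_opener` is a hypothetical helper, and the proxy address in the usage line below is a placeholder, not a real server; copy the actual host and port from Firefox's connection settings.

```python
try:  # Python 3
    from urllib.request import getproxies, ProxyHandler, build_opener
except ImportError:  # Python 2.7, as in the question
    from urllib import getproxies
    from urllib2 import ProxyHandler, build_opener


def make_opener(proxy_url=None):
    """Return an opener that sends http:// requests through proxy_url.

    With proxy_url=None, ProxyHandler falls back to getproxies(),
    i.e. the http_proxy environment variable and similar settings.
    """
    if proxy_url is None:
        return build_opener(ProxyHandler())
    return build_opener(ProxyHandler({'http': proxy_url}))


if __name__ == '__main__':
    # The proxies Python will use by default; an empty dict means "none".
    print(getproxies())
```

Usage, with a placeholder proxy: `webf = make_opener('http://proxy.example.com:3128').open(griburl)`. Comparing the printed `getproxies()` result with Firefox's settings shows quickly whether the two are actually taking the same route.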
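Alternatively, for diagnostic purposes, you could temporarily hard-code the IP address of the remote server into your URLs in place of the host name `nomads.ncep.noaa.gov`. If the request then succeeds every time, the problem is name resolution.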
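For the hard-coded IP test, a small helper can swap the host name for an address while keeping the rest of the long query string intact. A minimal sketch, assuming an IPv4 address (an IPv6 literal would need square brackets) and a URL without user credentials; `url_with_ip` is a hypothetical name, and `192.0.2.10` is a documentation placeholder, not NOMADS's real address. You could get the real address once with `socket.gethostbyname('nomads.ncep.noaa.gov')` on a machine where lookups work.

```python
try:  # Python 3
    from urllib.parse import urlsplit, urlunsplit
except ImportError:  # Python 2.7
    from urlparse import urlsplit, urlunsplit


def url_with_ip(url, ip):
    """Replace the host name in url with a literal IP address.

    Returns (new_url, original_host). Keep original_host so it can be
    sent as the Host header; path, query and port are left unchanged.
    """
    parts = urlsplit(url)
    host = parts.hostname
    netloc = ip if parts.port is None else '%s:%d' % (ip, parts.port)
    new_url = urlunsplit((parts.scheme, netloc, parts.path,
                          parts.query, parts.fragment))
    return new_url, host
```

With urllib2 you can then pass the original name along, e.g. `urllib2.urlopen(urllib2.Request(new_url, headers={'Host': host}))`, because a server that hosts several sites on one address may rely on that header to pick the right one.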