I wrote some code to download files from URLs, which I put in a dict. When I run the script, everything runs fine until the end, when it goes to download the files: it creates one file with one of the names, then I see that file get larger and larger, and then it gets smaller again. This file (mp4) is always unplayable/corrupt, and there’s only ever one; it never moves on to another. Any idea what’s going on? My guess is that Python somehow keeps downloading the different files to the one local file and overwriting it, but I don’t understand why.
Here’s the code:
import sys
import os
import re
import urllib
import urllib.request
urlfilebytes = urllib.request.urlopen('http://www.pbs.org/wgbh/nova/sciencenow/download/index.html')
urlfile = urlfilebytes.read().decode('utf-8')
urls = re.findall(r'(http://www-tc.pbs.org/wgbh/nova/sciencenow/media/downloads/\S+)"', urlfile)
print(urls)
names = re.findall(r'NSN_\S+.mp4', str(urls))
print(names)
names_to_urls = {}
for name in names:
    for url in urls:
        names_to_urls[name] = url
print(names_to_urls)
for key in names_to_urls.keys():
    for value in names_to_urls.values():
        urllib.request.urlretrieve(value, key)
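What you want instead of your for loops is just a single loop over zip(names, urls) that calls urllib.request.urlretrieve(url, name) once for each (name, url) pair.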
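You don’t want nested loops: they give you every combination of URL and name instead of just the matching pairs. In your code, the first pair of loops leaves every name mapped to the last URL, and the second pair then downloads every value into the same file name, one after another, before moving on to the next name. Each download overwrites the previous one, which is why you see a single file grow and then shrink when the next download starts.
zip(names, urls) takes the first item of each list, then the second item of each list, and so on.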
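To make the difference concrete, here is a minimal sketch of the zip-based pairing next to what the nested loops produce. The helper names pair_names_with_urls and download_all are made up for illustration, and the sample names and URLs are placeholders rather than values scraped from the real page. The sketch also assumes names and urls come out in the same order and with the same length, which is worth checking because zip silently stops at the shorter list.

```python
import urllib.request


def pair_names_with_urls(names, urls):
    """Pair the i-th name with the i-th URL (hypothetical helper)."""
    # zip() stops at the shorter input, so a length mismatch would
    # silently drop entries; fail loudly instead.
    if len(names) != len(urls):
        raise ValueError("names and urls must have the same length")
    return dict(zip(names, urls))


def download_all(names_to_urls, retrieve=urllib.request.urlretrieve):
    """Download each URL exactly once, to its own file name (hypothetical helper).

    `retrieve` is injectable so the loop can be tried without network access.
    """
    for name, url in names_to_urls.items():
        retrieve(url, name)


# Placeholder sample data, not taken from the real page.
names = ["NSN_a.mp4", "NSN_b.mp4"]
urls = ["http://example.com/NSN_a.mp4", "http://example.com/NSN_b.mp4"]

# The nested loops from the question: every name ends up with the last URL.
nested = {}
for name in names:
    for url in urls:
        nested[name] = url
print(nested)

# The zip-based pairing: each name gets its own URL.
print(pair_names_with_urls(names, urls))
```

In the original script, the two nested loops would then be replaced by something like download_all(pair_names_with_urls(names, urls)). Building the dict is optional; looping over zip(names, urls) directly works just as well.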