I’m trying to create a script which makes requests to random urls from a txt file e.g.:
import urllib2
with open('urls.txt') as urls:
for url in urls:
try:
r = urllib2.urlopen(url)
except urllib2.URLError as e:
r = e
if r.code in (200, 401):
print '[{}]: '.format(url), "Up!"
But I want that when some url indicates 404 not found, the line containing the URL is erased from the file. There is one unique URL per line, so basically the goal is to erase every URL (and its corresponding line) that returns 404 not found.
How can I accomplish this?
The easiest way is to read all the lines, loop over the saved lines and try to open them, and then when you are done, if any URLs failed you rewrite the file.
The way to rewrite the file is to write a new file, and then when the new file is successfully written and closed, then you use
os.rename()to change the name of the new file to the name of the old file, overwriting the old file. This is the safe way to do it; you never overwrite the good file until you know you have the new file correctly written.I think the simplest way to do this is just to create a list where you collect the good URLs, plus have a count of failed URLs. If the count is not zero, you need to rewrite the text file. Or, you can collect the bad URLs in another list. I did that in this example code. (I haven’t tested this code but I think it should work.)
EDIT: In comments, @abarnert suggests that there might be a problem with using
os.rename()on Windows (at least I think that is what he/she means). Ifos.rename()doesn’t work, you should be able to useshutil.move()instead.EDIT: Rewrite code to handle errors.
EDIT: Rewrite to add verbose messages as URLs are tracked. This should help with debugging. Also, I actually tested this version and it works for me.