I am trying to download a PDF file from a website. I know the name of the file, e.g. foo.pdf, but its location changes every few weeks:
e.g.
http://www.server.com/media/123456/foo.pdf
changes into
http://www.server.com/media/245415/foo.pdf
The number always has six digits, so I tried using a bash script to go through all 1 million possibilities, but that obviously takes a lot of time:
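i=0
rc=1
# Try every six-digit directory (000000 to 999999) until one download succeeds.
until [ "$rc" -eq 0 ] || [ "$i" -eq 1000000 ]
do
    b=$(printf '%06d' "$i")
    wget -q "http://www.server.com/media/${b}/foo.pdf" -O bar.pdf
    rc=$?    # wget exits with 0 only if the download worked
    i=$((i + 1))
done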
For wrong addresses I just get 404 errors.
I tested it around the currently correct address and it works.
Does anyone know a faster way to solve this problem?
If that file is linked from anywhere else, then you can get the link from there and just fetch the file. If it's not, you are probably out of luck.
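If some page does link to the file, the idea could be sketched roughly like this. This is only a sketch under assumptions: the function names extract_pdf_url and fetch_current_pdf are made up for illustration, and it assumes the linking page contains the full absolute URL (a relative link would need a different pattern).

```shell
#!/usr/bin/env bash

# Read HTML on stdin and print the first absolute URL of the form
# http://www.server.com/media/<six digits>/foo.pdf that appears in it.
# (Hypothetical helper; assumes the page uses absolute links.)
extract_pdf_url() {
    grep -oE 'http://www\.server\.com/media/[0-9]{6}/foo\.pdf' | head -n 1
}

# Download the linking page, pull out the current URL, then fetch the PDF.
# $1 is the address of a page that links to the file (an assumption:
# such a page has to exist for this to work at all).
fetch_current_pdf() {
    local url
    url=$(wget -q -O - "$1" | extract_pdf_url)
    [ -n "$url" ] || return 1    # no matching link found on the page
    wget -q "$url" -O bar.pdf
}
```

You would call it with the address of the linking page, e.g. fetch_current_pdf http://www.server.com/downloads.html (that address is a placeholder, not a real page). That costs two requests instead of up to a million.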
Note that most server admins would consider hitting the webserver 1,000,000 times to be abuse, and might ban your IP for even trying.