I am writing a web scraper that grabs content from the decade articles on Wikipedia (e.g. the articles on the 10s, the 1970s, the 1670s BC, and so on).
I am using logic that resembles this to grab the pages:
    for (i = -1690; i <= 2010; i += 10)
        if (i < 0)
            page = (-i) + "s_BC"
        else
            page = i + "s"
        GrabContentFromURL("http://en.wikipedia.org/wiki/" + page)
This is working, except for one little detail that I hadn’t considered.
The problem is that there are two 0s decades. There is a 0s AD and a 0s BC. With the way my loop currently works, the program only grabs the content from the 0s AD page.
This is a pretty simple problem, but I’m having a hard time coming up with a very nice way to fix it. I know I can extract the body of the loop to a separate function and use two separate loops, but I feel like there’s a more elegant way to do this that I’m missing.
How can I fix this problem without introducing too much complexity?
You mind hitting a few 404 pages along the way?

If the answer to that question was “yes, I mind”, then you can still toss in some ifs:
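One way that could look, as a minimal sketch in Python rather than the question's pseudocode: keep the single loop and treat i == 0 as a special case that produces both page names. The function name decade_pages is made up for illustration, and the actual fetching is left out.

```python
def decade_pages(first=-1690, last=2010):
    """Yield decade page names from `first` to `last`, inclusive.

    Hypothetical helper: negative values stand for BC decades,
    following the naming scheme used in the question.
    """
    for i in range(first, last + 10, 10):
        if i < 0:
            yield f"{-i}s_BC"
        elif i == 0:
            # Zero is the only ambiguous value: emit both decades,
            # BC first so the pages stay in chronological order.
            yield "0s_BC"
            yield "0s"
        else:
            yield f"{i}s"


if __name__ == "__main__":
    for page in decade_pages(-20, 20):
        print("http://en.wikipedia.org/wiki/" + page)
```

Each name can then be appended to the base URL and handed to whatever fetch routine you already have, such as the GrabContentFromURL call in the question. Nothing outside the loop body changes, so no second loop is needed.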