Edit: Just for clarification, I am using Python and would like to do this within Python.
I am in the middle of collecting data for a research project at our university. Basically, I need to scrape a lot of information from a website that monitors the European Parliament. Here is an example of what the URL of one page looks like:
The parts after the reference part of the address refer to:
A7 = Parliament in session (previous parliaments are A6 etc.),
2010 = year,
0190 = number of the file.
What I want to do is to create a variable that has all the urls for different parliaments, so I can loop over this variable and scrape the information from the websites.
P.S: I have tried this:
number = range(1,190,1)
for i in number:
    search_url = "http://www.europarl.europa.eu/sides/getDoc.do?type=REPORT&mode=XML&reference=A7-2010-" + str(number[i]) +"&language=EN"
    results = search_url
    print results
but this gives me the following error:
Traceback (most recent call last):
File “”, line 7, in
IndexError: list index out of range
If I understand correctly, you just want to be able to loop over the parliaments?
i.e. you want A7, A6, A5…?
If that’s what you want, a simple loop could handle it:
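A minimal sketch of such a loop, assuming the parliaments of interest are A7 down to A5 (that range is an assumption; adjust it to the sessions you need):

```python
# Assumed selection of parliaments: 7, 6, 5 (A7 is the one used in the question).
parliaments = range(7, 4, -1)

labels = []
for parliament in parliaments:
    # Build the "A7", "A6", ... prefix used in the reference part of the URL.
    labels.append("A" + str(parliament))

print(labels)
```

The single-argument print(...) form is used so the snippet runs on both Python 2 and Python 3.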
For the other values, similar loops would work just as well:
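For instance, the file numbers can be looped over directly. Iterating over the values themselves, rather than indexing with number[i], also avoids the IndexError from the question: range(1, 190) holds 189 items at indexes 0 to 188, so number[189] fails on the last pass. In this sketch the upper bound of 190 and the zfill(4) padding (to turn 1 into 0001, matching the 0190 style) are assumptions about what you want:

```python
base = ("http://www.europarl.europa.eu/sides/getDoc.do"
        "?type=REPORT&mode=XML&reference=A7-2010-")

urls = []
# range stops before its end value, so 191 is needed to include file 190.
for number in range(1, 191):
    # zfill(4) left-pads with zeros: 1 -> "0001", 190 -> "0190".
    urls.append(base + str(number).zfill(4) + "&language=EN")

print(urls[0])
print(urls[-1])
```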
You could easily nest your loops in the proper order to generate the combination(s) you need. HTH!
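As a rough sketch of the nesting, here are three loops that collect every (parliament, year, number) combination. The particular parliaments, years and number range below are made up for illustration, and not every combination will correspond to a report that actually exists on the site:

```python
# Hypothetical selections, kept small so the result is easy to inspect.
parliaments = [6, 7]
years = [2009, 2010]
numbers = range(1, 4)  # file numbers 1, 2, 3

combos = []
for parliament in parliaments:      # outermost: changes slowest
    for year in years:
        for number in numbers:      # innermost: changes fastest
            combos.append((parliament, year, number))

print(len(combos))
```

Each tuple can then be turned into a URL inside the innermost loop, or afterwards in a separate pass over combos.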
Edit:
String formatting is super useful, and here’s how you can do it with your example:
String formatting takes a string with a variety of specifiers – you’ll recognize them because they have % in them – followed by % and a tuple containing the arguments to the format string.
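A small sketch of that with only the file number as a placeholder (the value 190 is just the example number from the question). The %.4d specifier formats an integer with at least four digits, padding with leading zeros:

```python
search_url = ("http://www.europarl.europa.eu/sides/getDoc.do"
              "?type=REPORT&mode=XML&reference=A7-2010-%.4d&language=EN")

number = 190
# The right-hand side of % is a tuple; a one-element tuple needs the trailing comma.
url = search_url % (number,)
print(url)

# Smaller numbers are zero-padded to four digits as well.
padded = "%.4d" % (7,)
```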
If you want to add the year and parliament, change the string to this:
search_url = "http://www.europarl.europa.eu/sides/getDoc.do?type=REPORT&mode=XML&reference=A%d-%d-%.4d&language=EN"
where the important changes are here:
reference=A%d-%d-%.4d&language=EN
That means you’ll need to pass three integers like so:
print search_url % (parliment, year, number)
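Put together as something you can run, with the values from the example in the question (7, 2010 and 190) filled in. The variable names here are only illustrative:

```python
search_url = ("http://www.europarl.europa.eu/sides/getDoc.do"
              "?type=REPORT&mode=XML&reference=A%d-%d-%.4d&language=EN")

# Example values taken from the question: parliament A7, year 2010, file 0190.
parliament, year, number = 7, 2010, 190

# The tuple is consumed left to right: %d <- 7, %d <- 2010, %.4d <- 190.
url = search_url % (parliament, year, number)
print(url)
```

The order of the tuple has to match the order of the specifiers in the string, so swapping year and number would silently produce a wrong reference rather than an error.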