I would like to use Mechanize (with Python) to submit a form, but unfortunately the page has been badly coded and the <select> element is not actually inside <form> tags.
So I can’t use the traditional method via the form:
forms = [f for f in br.forms()]
mycontrol = forms[1].controls[0]
What can I do instead?
Here is the page I would like to scrape, and the relevant bit of code. I'm interested in the la select element:
<fieldset class="searchField">
<label>By region / local authority</label>
<p id="regp">
<label>Region</label>
<select id="region" name="region"><option></option></select>
</p>
<p id="lap">
<label>Local authority</label>
<select id="la" name="la"><option></option></select>
</p>
<input id="byarea" type="submit" value="Go" />
<img id="regmap" src="/schools/performance/img/map_england.png" alt="Map of regions in England" border="0" usemap="#England" />
</fieldset>
This is actually more complex than you think, but still easy to implement. The page you linked pulls in the local authorities as JSON via JavaScript, which is why the <select name="la"> element never gets filled in under Mechanize, which does not run JavaScript. The easiest way around this is to request that JSON data directly with Python and use the results to go straight to each data page.
With that approach you don't even need to load the page you linked or use Mechanize at all! However, you will still need a way to parse out the school names and then the performance figures.
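Here is a minimal sketch of that idea, using only the standard library. The two URLs and the JSON layout (a list of objects with "code" and "name" keys) are assumptions made for illustration, not the site's real API. Check the browser's network tab for the request the page actually makes and adjust accordingly.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical endpoints: replace these with the URLs the page really requests.
LA_JSON_URL = "http://example.com/schools/performance/la.json"
RESULTS_URL = "http://example.com/schools/performance/results"


def parse_authorities(payload):
    """Turn the JSON text into a list of (code, name) pairs.

    Assumes the payload is a JSON list of objects with "code" and "name"
    keys; the real feed may be shaped differently.
    """
    return [(str(item["code"]), item["name"]) for item in json.loads(payload)]


def results_url(la_code):
    """Build the URL of the data page for one local authority.

    Assumes the form's "la" field is sent as a query-string parameter.
    """
    return RESULTS_URL + "?" + urlencode({"la": la_code})


def fetch_authorities(url=LA_JSON_URL):
    """Download the JSON feed and parse it (performs a network request)."""
    with urlopen(url) as response:
        return parse_authorities(response.read().decode("utf-8"))
```

You would call fetch_authorities() once, then loop over the pairs and download results_url(code) for each one. Parsing the school names and figures out of those pages is a separate step, for example with an HTML parser.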