I have this HTML data that I need to parse to extract values from it. It has so many tags that I find it difficult to navigate. From the HTML below, I need to create a Python list of dictionaries that looks like this:
[{"School":"Childs play"},{"Place":"newyork"},{"Level":"four"},{"Country":"USA"},{"Level Of Course":"Easy"}]
<div class="quick">
<strong>School</strong><br /> Childs play <br /><br />
<strong>Place</strong><br />
<a href="Search.aspx?Menu=new&Me=">newyork</a><br /><br />
<strong>Level</strong><br />four<br /><br />
<strong>Country</strong><br />USA<br /><br />
<strong>Level Of Course</strong><br />Easy<br /><br />
</div>
I tried using BeautifulSoup but didn't succeed. Please help.
Unfortunately, the HTML is not ideally constructed for parsing, but it is possible to extract the data into a meaningful Python dictionary.
Filtering the elements with `if not hasattr(x, "name") or not x.name == "br"` keeps an item if it is an instance of `NavigableString`, and otherwise keeps it only if the element is not a `<br>` tag. `data` will then be of the format `[<KEY>, <VALUE>, <KEY>, <VALUE>]`, from which it should be fairly trivial to extract the data.
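A minimal sketch of the key/value extraction described above, assuming BeautifulSoup 4 (`bs4`) with the built-in `html.parser`. It is not necessarily the exact code the answer had in mind: instead of the `hasattr` check it uses `isinstance(x, NavigableString)` to keep only non-blank text nodes, and the variable names `quick` and `result` are illustrative.

```python
from bs4 import BeautifulSoup, NavigableString

html = """<div class="quick">
<strong>School</strong><br /> Childs play <br /><br />
<strong>Place</strong><br />
<a href="Search.aspx?Menu=new&Me=">newyork</a><br /><br />
<strong>Level</strong><br />four<br /><br />
<strong>Country</strong><br />USA<br /><br />
<strong>Level Of Course</strong><br />Easy<br /><br />
</div>"""

soup = BeautifulSoup(html, "html.parser")
quick = soup.find("div", class_="quick")

# Walk every descendant of the div and keep only text nodes that are
# not pure whitespace. Tags such as <br>, <strong> and <a> are skipped,
# but the text inside <strong> and <a> is still visited.
data = [
    x.strip()
    for x in quick.descendants
    if isinstance(x, NavigableString) and x.strip()
]
# data is now [<KEY>, <VALUE>, <KEY>, <VALUE>, ...]

# Pair each key (even index) with the value that follows it (odd index).
result = [{key: value} for key, value in zip(data[::2], data[1::2])]

print(result)
# [{'School': 'Childs play'}, {'Place': 'newyork'}, {'Level': 'four'},
#  {'Country': 'USA'}, {'Level Of Course': 'Easy'}]
```

This relies on every label being followed by exactly one value. If a value could be empty, the pairing would shift, so a single dictionary built with `dict(zip(data[::2], data[1::2]))` or a check on the length of `data` may be worth adding.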