I have an HTML file and I would like to parse it using Python 3.2.
Sample:
<td class="ln">15</td><td class="sf3b2"><code> </code></td>
<td class="ln">15</td><td class="sf3b2"><code> </code></td>
The job is to detect the numbers which are not tagged (in this case, 15 only) and store them in another text file. I can't decide which HTML parser to use (lxml, Beautiful Soup), as I am new to this. Could you please guide me on how to approach this problem? Thanks in advance!
You could try something like this.
You can call this getPrintUnicode() function on the soup of the whole page; it returns the complete text content. Then try converting each string to an integer, and use exception handling to skip the strings that are not numbers.
eg.
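Here is a rough sketch of the "try to convert, catch the exception" idea. It is not the getPrintUnicode() function mentioned above; to stay free of extra installs it swaps in the standard library's html.parser instead of Beautiful Soup. The names NumberCollector, extract_numbers and the numbers.txt output file are made up for illustration, and it assumes that "a number" means any text node that parses as an integer.

```python
from html.parser import HTMLParser


class NumberCollector(HTMLParser):
    """Hypothetical helper: keeps every text node that parses as an integer."""

    def __init__(self):
        super().__init__()
        self.numbers = []

    def handle_data(self, data):
        # Called for each run of text between tags.
        text = data.strip()
        try:
            self.numbers.append(int(text))
        except ValueError:
            # Empty, whitespace-only, or non-numeric text: skip it.
            pass


def extract_numbers(html):
    """Return the integers found in the text of an HTML string."""
    parser = NumberCollector()
    parser.feed(html)
    parser.close()
    return parser.numbers


# One row taken from the sample in the question.
sample = '<td class="ln">15</td><td class="sf3b2"><code> </code></td>'

if __name__ == "__main__":
    # Assumed output file name; one number per line.
    with open("numbers.txt", "w") as out:
        for number in extract_numbers(sample):
            out.write("%d\n" % number)
```

If you go with Beautiful Soup instead, the same try/except around int() would apply to the strings it hands back; only the part that walks the document changes.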