I have an HTML file and I would like to parse through it using Python 3.2


Asked: June 5, 2026


I have an HTML file and I would like to parse through it using Python 3.2.
Sample:

<td class="ln">15</td><td class="sf3b2"><code>&nbsp;</code></td>
<td class="ln">15</td><td class="sf3b2"><code>&nbsp;</code></td>

The task is to detect the numbers that are not tagged (in this case, only 15) and store them in another text file. I can't decide which HTML parser to use (lxml or Beautiful Soup), as I am new to this. Could you please guide me on how to approach this problem? Thanks in advance!


1 Answer

Editorial Team · Answer 1 · 2026-06-05T22:15:39+00:00

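You could try something like this. Note that it is written for Python 2 and the original BeautifulSoup 3 package (it relies on the unicode type and the print statement), so it will not run unmodified on Python 3.2.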

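from BeautifulSoup import BeautifulSoup

def getPrintUnicode(soup):
    # Recursively concatenate every text node under `soup`, decoding a few
    # common HTML entities (BeautifulSoup 3 does not convert entities unless asked to).
    body = ''
    if isinstance(soup, unicode):
        # Text node: NavigableString is a subclass of unicode.
        soup = soup.replace('&#39;', "'")
        soup = soup.replace('&quot;', '"')
        soup = soup.replace('&nbsp;', ' ')
        soup = soup.replace('&gt;', '>')
        soup = soup.replace('&lt;', '<')
        soup = soup.replace('&amp;', '&')  # decode &amp; last to avoid double-decoding
        body = body + soup
    else:
        # Tag: recurse into its children and join their text.
        if not soup.contents:
            return ''
        for con in soup.contents:
            body = body + getPrintUnicode(con)
    return body

print getPrintUnicode(BeautifulSoup('<td class="ln">15</td><td class="sf3b2"><code>&nbsp;</code></td>'))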
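For Python 3, which is what the question targets, a rough equivalent of getPrintUnicode() can be built on the standard-library html.parser module instead of lxml or Beautiful Soup, so nothing extra needs to be installed. This is a minimal sketch, not the answerer's code: it assumes Python 3.4 or later (for the convert_charrefs argument), and TextCollector and get_text() are hypothetical names. Because the parser decodes entities such as &nbsp; and &lt; itself, the manual replace() chain is not needed.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Hypothetical helper: collect every text node in document order."""

    def __init__(self):
        # convert_charrefs=True (Python 3.4+) makes the parser decode
        # entities like &nbsp; and &lt; before calling handle_data().
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def get_text(html):
    """Return all text content of `html` joined into one string."""
    parser = TextCollector()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.chunks)


sample = '<td class="ln">15</td><td class="sf3b2"><code>&nbsp;</code></td>'
print(repr(get_text(sample)))  # '15\xa0'
```

Here &nbsp; decodes to the non-breaking space '\xa0' rather than a plain space; int() still accepts '15\xa0' because it strips surrounding Unicode whitespace.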

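You can call getPrintUnicode() on the soup of the whole page; it returns the page's entire text content as a single string. To pull numbers out of it, convert strings to integers with int() and catch the ValueError raised for non-numeric text. For a whole page, do this per text node rather than on the concatenated string, because joining all of the text merges the numbers with everything else.
For example: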

print int(getPrintUnicode(BeautifulSoup('<td class="ln">15</td><td class="sf3b2"><code>&nbsp;</code></td>')))
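Following the "use exceptions" advice, a Python 3 sketch can try int() on each text node separately, keep only the ones that succeed, and write them one per line to a text file. It again uses the standard-library html.parser rather than Beautiful Soup and assumes Python 3.4 or later; NumberCollector, extract_numbers() and save_numbers() are hypothetical names, and reading "not tagged" numbers as text nodes consisting of nothing but an integer is an assumption about what the question means.

```python
from html.parser import HTMLParser


class NumberCollector(HTMLParser):
    """Hypothetical helper: collect text nodes that parse as integers."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.numbers = []

    def handle_data(self, data):
        # Try each text node on its own. int() ignores surrounding
        # whitespace (including a decoded &nbsp;); a ValueError just
        # means this node is not a number, so it is skipped.
        try:
            self.numbers.append(int(data))
        except ValueError:
            pass


def extract_numbers(html):
    """Return every integer-only text node in `html`, in document order."""
    parser = NumberCollector()
    parser.feed(html)
    parser.close()
    return parser.numbers


def save_numbers(numbers, path):
    """Write the numbers to a plain text file, one per line."""
    with open(path, "w") as out:
        for n in numbers:
            out.write("%d\n" % n)


# The two sample rows from the question.
sample = (
    '<td class="ln">15</td><td class="sf3b2"><code>&nbsp;</code></td>\n'
    '<td class="ln">15</td><td class="sf3b2"><code>&nbsp;</code></td>\n'
)
print(extract_numbers(sample))  # [15, 15]
```

To process a real file, something like save_numbers(extract_numbers(open("page.html").read()), "numbers.txt") should do, where both file names are placeholders. Converting node by node means a number is never glued to neighbouring text, which is what goes wrong when int() is applied to the whole page's concatenated content.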
