Basically, I want to extract the strings “AAA”, “BBB”, “CCC”, “DDD” from a text file…
...... (other text goes here).....
<TD align="left" class=texttd><font class='textfont'>AAA</font></TD>
..... (useless text here).....
<TD align="left" class=texttd><font class='textfont'>BBB</font></TD>
....(more text).....
<TD align="left" class=texttd><font class='textfont'>CCC</font></TD>
<TD align="left" class=texttd><font class='textfont'>DDD</font></TD>
......(more text).....
I want something like this. If I do:
data = foo("file.txt")
I get:
data = ['AAA', 'BBB', 'CCC', 'DDD']
What is the best possible way? My file is not big…
Basically, I want to extract “remaining upload data transfer” from this file which in HTML looks like THIS
You could write a regex, but then you would effectively be parsing the HTML yourself. The problem with writing regular expressions for HTML is that HTML is a mess. It’s rarely well-formed, and that causes problems when you rely on it for data.
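For comparison, here is a rough sketch of the regex route. It assumes the markup looks exactly like the sample in the question; the names `FONT_PATTERN` and `extract_with_regex` are made up for illustration.

```python
import re

# Assumption: every value sits inside <font class='textfont'>...</font>,
# written exactly as in the question (single quotes, one class, no extra attributes).
FONT_PATTERN = re.compile(
    r"<font class='textfont'>(.*?)</font>",
    re.IGNORECASE | re.DOTALL,
)


def extract_with_regex(html):
    """Return the text captured between the matching font tags, in document order."""
    return FONT_PATTERN.findall(html)
```

This works on the sample, but it is brittle: if the page switches to class="textfont" with double quotes, or adds another attribute to the tag, the pattern silently matches nothing.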
I would personally use BeautifulSoup. It does more than you’re asking for, but at a fraction of the effort.
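Something along these lines should do it. This is a minimal sketch, assuming the `beautifulsoup4` package is installed, the file is UTF-8, and every value you want sits in a font tag with class textfont as in your sample. `foo` is the name from the question; `extract_text` is a helper name I made up.

```python
from bs4 import BeautifulSoup


def extract_text(html):
    """Collect the text of every <font class="textfont"> element."""
    # "html.parser" is the parser bundled with Python; it tolerates the
    # unquoted attributes and upper-case tags seen in the sample.
    soup = BeautifulSoup(html, "html.parser")
    return [
        font.get_text(strip=True)
        for font in soup.find_all("font", class_="textfont")
    ]


def foo(path):
    """Read the file at `path` and return the extracted strings."""
    # Assumption: the file is UTF-8; change the encoding if yours differs.
    with open(path, encoding="utf-8") as f:
        return extract_text(f.read())
```

With that in place, data = foo("file.txt") should give you the list of strings from the matching cells, in the order they appear in the file.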