I’m writing a bit of Python code that gets data from a table on a website. The table is well formed and everything works fine most of the time.
However, when the parser encounters a blank field, it totally ignores it. I need it to count the blank field too, but I can’t figure out how to do this.
Because the blank fields are skipped, the arrays I build from the data end up shorter than expected and give me out-of-bounds errors.
Anyway, here’s my code:
class MyParser(HTMLParser):
    def __init__(self, *args, **kwargs):
        # There are only 2 tables in the source code. Outer one is useless to me
        self.outerloop = True
        # Set to true when we are in the table, and we want to collect data
        self.capture_data = False
        # Array to store the captured data
        self.dataArray = []
        HTMLParser.__init__(self, *args, **kwargs)

    def handle_starttag(self, tag, attrs):
        if tag == 'table' and self.outerloop:
            self.outerloop = False
        elif tag == 'td' and not self.outerloop:
            self.capture_data = True
        elif tag == 'th':
            self.capture_data = False

    def handle_endtag(self, tag):
        if tag == 'table':
            self.capture_data = False

    def handle_data(self, data):
        if self.capture_data:
            self.dataArray.append(data)

# Function to call the parser
def getData(self):
    self.p = MyParser()
    url = 'http://www.mysite.com/get.php'
    content = urllib.urlopen(url).read()
    self.p.feed(content)
    val = 0
    resultString = ""
    while val < len(self.p.dataArray):
        resultString += self.p.dataArray[val] + ","
        val += 1
    return HttpResponse(resultString[:-1])

The problem lies in the handle_data function. Somehow in there I need to tell it to store <td></td> as "", i.e. a blank string. This is important since I output the string to my webpage as a comma-separated list of values, as can be seen at the bottom.
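To make the behaviour concrete, here is a small reproduction. It is only a sketch and assumes Python 3's html.parser module (the code above uses the Python 2 names); EventLogger is a made-up helper that simply records which callbacks fire. For an empty cell, handle_data is never called at all, so there is nothing inside it to hook into:

```python
from html.parser import HTMLParser  # Python 3 module name (assumption)


class EventLogger(HTMLParser):
    """Hypothetical helper: records every parser callback in order."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("data", data))


logger = EventLogger()
logger.feed("<td>a</td><td></td>")
for event in logger.events:
    print(event)
```

The second cell produces only a start and an end event, with no data event in between, which is why the blank field never reaches the array.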
I’d be very grateful for anyone who can help me with this.
Thanks.
OK, I know it’s frowned upon to answer your own question, but in case anyone comes across this problem in the future, I’ll just put up my source.
I fixed it by keeping two integer counters, both starting at 0. When I encountered the opening tag in question, I incremented the first counter. When the data was handled, I incremented the second. When I encountered the closing tag for that same tag, I checked whether the two counters were equal, which they should be if the data was consumed.
If the counters were not equal, it meant that the program never handled any data for that tag, i.e. the tag was blank. I then simply appended N/A to the array (and brought the counters back in line) and got it working. See here:
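A minimal sketch of the two-counter idea described above, not the original source. It assumes Python 3's html.parser, leaves out the outer-table handling from the question, and assumes each cell holds at most one run of text. The names CellParser, cells_opened, cells_handled and placeholder are made up for illustration:

```python
from html.parser import HTMLParser  # Python 3 module name (assumption)


class CellParser(HTMLParser):
    """Collects <td> text and appends a placeholder for blank cells."""

    def __init__(self, placeholder="N/A"):
        super().__init__()
        self.placeholder = placeholder
        self.cells_opened = 0    # incremented on every <td>
        self.cells_handled = 0   # incremented when a cell yields data
        self.in_cell = False
        self.data = []

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self.in_cell = True
            self.cells_opened += 1

    def handle_data(self, data):
        if self.in_cell:
            self.data.append(data)
            self.cells_handled += 1

    def handle_endtag(self, tag):
        if tag == "td":
            if self.cells_handled != self.cells_opened:
                # handle_data never fired for this cell, so it was blank.
                self.data.append(self.placeholder)
                # Re-sync the counters so the next cell is compared fairly.
                self.cells_handled = self.cells_opened
            self.in_cell = False


parser = CellParser()
parser.feed("<table><tr><td>a</td><td></td><td>c</td></tr></table>")
print(",".join(parser.data))
```

A single boolean flag reset on each opening tag would work just as well; the counters are kept here only to mirror the description. Passing placeholder="" would store a blank string instead of N/A, which is what the question originally asked for.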