I’m writing a bit of Python code that gets data from a website.

Asked: May 26, 2026 (2026-05-26T09:51:16+00:00)

I’m writing a bit of Python code that gets data from a website. The table is well formed and everything works fine most of the time.

However, when the parser encounters a blank field, it totally ignores it. I need it to count the blank space but I can’t figure out how to do this.

The problem lies with some arrays I am using that are giving me out of bounds errors.

Anyway, here’s my code:

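# Python 2 imports, matching the urllib.urlopen call further down
from HTMLParser import HTMLParser
import urllib
# Assumption: HttpResponse is Django's
from django.http import HttpResponse

class MyParser(HTMLParser):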
    def __init__(self, *args, **kwargs):
        #There are only 2 tables in the source code. Outer one is useless to me
        self.outerloop = True
        #Set to true when we are in the table, and we want to collect data
        self.capture_data = False
        #Array to store the captured data
        self.dataArray = []
        HTMLParser.__init__(self, *args, **kwargs)

    def handle_starttag(self, tag, attrs):
        if tag == 'table' and self.outerloop:
            self.outerloop=False
        elif tag=='td' and not self.outerloop:
            self.capture_data=True
        elif tag=='th':
            self.capture_data=False

    def handle_endtag(self, tag):
        if tag == 'table':
            self.capture_data=False

    def handle_data(self, data):
        if self.capture_data:
            self.dataArray.append(data)

#Function to call the parser
def getData(self):
    self.p = MyParser()

    url = 'http://www.mysite.com/get.php'
    content = urllib.urlopen(url).read()
    self.p.feed(content)

    val=0
    resultString=""

    while val < len(self.p.dataArray):
        resultString+=self.p.dataArray[val]+","
        val+=1

    return HttpResponse(resultString[:-1])

The problem lies in the handle_data function. Somehow in there I need to tell it to store <td></td> as "", i.e. a blank string. This is important since I output the string to my webpage as a comma-separated list of values, as can be seen at the bottom of the code.

I’d be very grateful to anyone who can help me with this.

Thanks.

1 Answer

Editorial Team · Answer 1 · 2026-05-26T09:51:17+00:00

OK, I know it’s frowned upon to answer your own question, but in case anyone comes across this problem in the future, I’ll just put up my source.

I fixed it by keeping two integer counters, both starting at 0. When I encounter the opening tag in question, I increment the first counter. When data is handled, I bring the second counter level with the first. When I encounter the closing tag for that cell, I check whether the two counters are equal, which they should be if the data was consumed.

If the counters are not equal, the parser never handled any data for that cell, i.e. it was a blank tag. In that case I simply append a filler value ("NA" in the code) to the array, and that got it working.
See here:

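# Python 2 imports, matching the urllib.urlopen call further down
from HTMLParser import HTMLParser
import urllib
# Assumption: HttpResponse is Django's
from django.http import HttpResponse

class MyHTMLParser(HTMLParser):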
    def __init__(self, *args, **kwargs):
        self.outerloop = True
        self.capture_data = False
        self.dataArray = []
        self.celldata="NA"
        self.firstnum=0
        self.secondnum=0
        HTMLParser.__init__(self, *args, **kwargs)

    def handle_starttag(self, tag, attrs):
        if tag == 'table' and self.outerloop:
            self.outerloop=False
        elif tag=='td' and not self.outerloop:
            self.capture_data=True # bool to indicate we want to capture data
            self.firstnum+=1    # increment first num to say we have encountered the tag in question
        elif tag=='th':
            self.capture_data=False

    def handle_endtag(self, tag):
        if tag == 'table':
            self.capture_data=False
        elif tag == 'td' and not self.firstnum == self.secondnum:   #check if they are not equal
            self.dataArray.append(self.celldata)    # append filler data
            self.secondnum=self.firstnum    # make them equal for next tag

    def handle_data(self, data):
        if self.capture_data:
            self.dataArray.append(data)
            self.secondnum=self.firstnum    # mark this cell as having produced data

def getTides(self):
    self.p = MyHTMLParser()

    url = 'http://www.mysite.com/page.php'
    # urllib.urlopen is the Python 2 spelling; Python 3 moved it to urllib.request.urlopen
    content = urllib.urlopen(url).read()
    self.p.feed(content)

    # Join the captured cells into a comma-separated string
    return HttpResponse(",".join(self.p.dataArray))
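
For anyone landing here later, a different way to get the same result is to buffer the text of each cell and append it once when the closing tag arrives, so an empty <td></td> naturally yields an empty string. This is only a sketch and not the code from the answer above: it assumes Python 3's html.parser, assumes the wanted table is nested inside the outer one, and the names CellBufferParser, cells_to_csv and SAMPLE are made up for illustration. It runs against an inline string so no network access is needed.

```python
from html.parser import HTMLParser


class CellBufferParser(HTMLParser):
    """Collect the text of every <td> in the nested (inner) table."""

    def __init__(self, filler=""):
        super().__init__()
        self.filler = filler      # value stored for empty cells
        self.table_depth = 0      # number of <table> tags currently open
        self.cell_parts = None    # text chunks while inside a <td>, else None
        self.cells = []

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.table_depth += 1
        elif tag == "td" and self.table_depth >= 2:
            # Assumption: the table of interest is nested in the outer one
            self.cell_parts = []

    def handle_endtag(self, tag):
        if tag == "table":
            self.table_depth -= 1
        elif tag == "td" and self.cell_parts is not None:
            # Exactly one entry per cell, whether or not any text was seen
            text = "".join(self.cell_parts).strip()
            self.cells.append(text if text else self.filler)
            self.cell_parts = None

    def handle_data(self, data):
        if self.cell_parts is not None:
            self.cell_parts.append(data)


# Hypothetical sample page: an outer layout table wrapping the data table
SAMPLE = (
    "<table><tr><td>"
    "<table>"
    "<tr><th>Time</th><th>Height</th><th>Note</th></tr>"
    "<tr><td>06:00</td><td></td><td>low</td></tr>"
    "<tr><td>12:30</td><td>4.2</td><td></td></tr>"
    "</table>"
    "</td></tr></table>"
)


def cells_to_csv(html, filler=""):
    parser = CellBufferParser(filler)
    parser.feed(html)
    parser.close()
    return ",".join(parser.cells)


print(cells_to_csv(SAMPLE))
print(cells_to_csv(SAMPLE, filler="NA"))
```

Because the append happens in handle_endtag, a cell whose text arrives in several handle_data calls (for example when it contains nested tags) still produces a single entry, which keeps the column positions stable.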
