EDIT: I have provided the EXACT source code I’m using to try to figure

Asked: June 15, 2026

EDIT: I have provided the EXACT source code I’m using to try to figure out this issue.

I’m trying to extract the data on “total assets” from Yahoo Finance using Python 2.7 and lxml. An example of a page I’m trying to extract this information from is http://finance.yahoo.com/q/bs?s=FAST+Balance+Sheet&annual .

I’ve already successfully extracted the data on “total assets” from Smartmoney. An example of a Smartmoney page I’m able to parse is http://www.smartmoney.com/quote/FAST/?story=financials&timewindow=1&opt=YB&isFinprint=1&framework.view=smi_emptyView .

Here is a special test script I set up to work on this issue:

    import urllib
    import lxml.html

    # Smartmoney: this query returns the "Total Assets" row as expected.
    url_local1 = "http://www.smartmoney.com/quote/FAST/?story=financials&timewindow=1&opt=YB&isFinprint=1&framework.view=smi_emptyView"
    result1 = urllib.urlopen(url_local1)
    element_html1 = result1.read()
    doc1 = lxml.html.document_fromstring(element_html1)
    list_row1 = doc1.xpath(u'.//th[div[text()="Total Assets"]]/following-sibling::td/text()')
    print list_row1

    # Yahoo Finance: the analogous query returns an empty list.
    url_local2 = "http://finance.yahoo.com/q/bs?s=FAST"
    result2 = urllib.urlopen(url_local2)
    element_html2 = result2.read()
    doc2 = lxml.html.document_fromstring(element_html2)
    list_row2 = doc2.xpath(u'.//td[strong[text()="Total Assets"]]/following-sibling::td/strong/text()')
    print list_row2

I’m able to get the row of data on total assets from the Smartmoney page, but I get just an empty list when I try to parse the Yahoo Finance page.

The source code of the table row on the Smartmoney page is:

    <tr class="odd bold">
        <th><div style='font-weight:bold'>Total Assets</div></th>
        <td>  1,684,948</td>
        <td>  1,468,283</td>
        <td>  1,327,358</td>
        <td>  1,304,149</td>
        <td>  1,163,061</td>
    </tr>

The source code of the table row on the Yahoo page is:

    <tr>
        <td colspan="2"><strong>Total Assets</strong></td>
        <td align="right"><strong>1,684,948&nbsp;&nbsp;</strong></td>
        <td align="right"><strong>1,468,283&nbsp;&nbsp;</strong></td>
        <td align="right"><strong>1,327,358&nbsp;&nbsp;</strong></td>
    </tr>

1 Answer

Editorial Team · Answer 1 · 2026-06-15T07:19:52+00:00

Your query contains syntax errors: it should end with td/strong/text(), and it has a trailing ]. I’d say the correct query would be:

    xpath('//td[strong[text()="Total Assets"]]/following-sibling::td/strong/text()')

Result:

    >>> tree.xpath('//td[strong[text()="Total Assets"]]/following-sibling::td/strong/text()')
    [u'1,684,948\xa0\xa0', u'1,468,283\xa0\xa0', u'1,327,358\xa0\xa0']
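Each returned string still carries the thousands separators and the trailing non-breaking spaces (`\xa0`) that come from `&nbsp;` in the page. If you need numbers, a small cleanup step along these lines should work. This is only a sketch: `parse_amount` is a hypothetical helper name, and it assumes every cell is a plain comma-grouped integer like the three shown above.

```python
# Hypothetical helper (not part of lxml): strip the non-breaking spaces
# and thousands separators, then convert the remainder to an int.
def parse_amount(raw):
    cleaned = raw.replace(u'\xa0', u'').replace(u',', u'').strip()
    return int(cleaned)

# The strings returned by the XPath query above.
raw_values = [u'1,684,948\xa0\xa0', u'1,468,283\xa0\xa0', u'1,327,358\xa0\xa0']

total_assets = [parse_amount(value) for value in raw_values]
print(total_assets)
```

Cells holding something other than a plain integer (a dash for a missing value, for example) would make `int()` raise `ValueError`, so real pages may need extra handling.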

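On the live page, the text inside the “Total Assets” <strong> element is surrounded by whitespace and line breaks, so the exact comparison text()="Total Assets" does not match and the query returns an empty list. Wrap text() in the XPath normalize-space() function like so: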

    xpath('//td[strong[normalize-space(text())="Total Assets"]]/following-sibling::td/strong/text()')
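To see the difference without fetching the page, you can run both queries against an inline snippet. The HTML below is a made-up stand-in for the Yahoo row (an assumption, not a copy of the live markup), with a line break and indentation placed inside the `<strong>` label to imitate what the real page does.

```python
import lxml.html

# Assumed sample markup: same row as in the question, but with whitespace
# and line breaks around the label text inside <strong>.
SAMPLE_HTML = u"""
<table>
  <tr>
    <td colspan="2"><strong>
      Total Assets
    </strong></td>
    <td align="right"><strong>1,684,948&nbsp;&nbsp;</strong></td>
    <td align="right"><strong>1,468,283&nbsp;&nbsp;</strong></td>
    <td align="right"><strong>1,327,358&nbsp;&nbsp;</strong></td>
  </tr>
</table>
"""

tree = lxml.html.document_fromstring(SAMPLE_HTML)

# Exact comparison: fails because the text node includes the whitespace.
exact = tree.xpath(
    '//td[strong[text()="Total Assets"]]/following-sibling::td/strong/text()')

# normalize-space() trims and collapses whitespace before comparing.
normalized = tree.xpath(
    '//td[strong[normalize-space(text())="Total Assets"]]'
    '/following-sibling::td/strong/text()')

print(exact)
print(normalized)
```

With this sample, the exact comparison comes back empty while the normalize-space() version returns the three value cells, which matches the behaviour described in the question.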
