Consider this URL:
http://www.nyse.com/about/listed/chn.html
I am trying to retrieve the string ‘Pacific Ex Japan Funds’, but it is not in the soup:
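import urllib2
from bs4 import BeautifulSoup  # assumption: bs4; with version 3 use "from BeautifulSoup import BeautifulSoup"

fundCode = 'chn'
url = 'http://www.nyse.com/about/listed/' + fundCode + '.html'
html = urllib2.urlopen(url)
soup = BeautifulSoup(html)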
This is odd, because other parts of the table are in the soup.
Any ideas?
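If you download the HTML without a browser (which is what urllib2 does),
you’ll see that the page data is filled in by JavaScript functions, so the string is not part of the raw markup that BeautifulSoup parses.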
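A small illustration of that effect, as a sketch only: the markup below is made up for the example (it is not copied from the NYSE page), and it uses Python 3's standard html.parser rather than BeautifulSoup so that it runs without extra packages. It shows that text written by a script exists only inside the script source until a browser executes it.

```python
from html.parser import HTMLParser

# Hypothetical stand-in for the raw page: the table cell is empty in the
# markup and is only filled in once a browser runs the script.
RAW_HTML = """
<html><body>
<table><tr><td id="category"></td></tr></table>
<script>
document.getElementById("category").innerHTML = "Pacific Ex Japan Funds";
</script>
</body></html>
"""


class TextCollector(HTMLParser):
    """Collect text nodes, keeping script contents apart from visible text."""

    def __init__(self):
        super().__init__()
        self.in_script = False
        self.visible = []
        self.script = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        # Route each text node to the script or the visible bucket.
        (self.script if self.in_script else self.visible).append(data)


collector = TextCollector()
collector.feed(RAW_HTML)
collector.close()
visible_text = "".join(collector.visible)
script_text = "".join(collector.script)

print("Pacific Ex Japan Funds" in visible_text)  # False
print("Pacific Ex Japan Funds" in script_text)   # True
```

A parser that does not run JavaScript sees the empty cell, which would explain why some parts of a table show up in the soup and others do not.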
To extract information from this page, you’ll need a tool that can execute JavaScript.
One option is Selenium, which drives a real browser; another is PyQt’s WebKit bindings.
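A minimal sketch of the Selenium route, under some assumptions: it is written for Python 3, it assumes the selenium package, Firefox, and a matching driver are installed, and the helper names build_url and fetch_rendered_html are made up for this example. The URL is the one from the question and may no longer serve the same page.

```python
def build_url(fund_code):
    """Build the listing URL the same way the question does."""
    return "http://www.nyse.com/about/listed/" + fund_code + ".html"


def fetch_rendered_html(url):
    """Load url in a real browser and return the HTML after scripts have run."""
    # Imported lazily so this file can be loaded without Selenium installed.
    from selenium import webdriver

    # Assumption: Firefox and its driver are available on this machine.
    driver = webdriver.Firefox()
    try:
        driver.get(url)
        # page_source reflects the DOM at the time of the call, including
        # whatever the page's scripts have inserted so far.
        return driver.page_source
    finally:
        # Always close the browser, even if loading the page fails.
        driver.quit()
```

The string returned by fetch_rendered_html(build_url('chn')) can be passed to BeautifulSoup in place of the urllib2 response. If the page fills in its data asynchronously, the content may not be there the moment get() returns, in which case an explicit wait would be needed before reading page_source.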