I have an HTML page in which JavaScript generates some of the content, and I need to parse that content from a Python script. I have saved a copy of the page on my computer. Is there a way to work with the "already generated" HTML, i.e. what I see in the browser after opening the saved file? As I understand it, I have to work with the DOM (maybe with the xml2dom lib).
Did you save "the file" (the web page, I imagine) before or after JavaScript altered it?
If “after”, then it doesn’t matter any more that some of the HTML was done via Javascript — you can just use popular parsers like lxml or BeautifulSoup to handle the HTML you have.
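To illustrate the "after" case, here is a minimal sketch that pulls text out of already-rendered markup. It uses only the standard library's `html.parser` as a dependency-free stand-in for lxml or BeautifulSoup, which make the same job shorter. The sample markup, the `generated` class name, and the helper names are hypothetical, not taken from the question.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class ClassTextExtractor(HTMLParser):
    """Collect the text found inside every element carrying a given class."""

    def __init__(self, wanted_class):
        super().__init__()
        self.wanted_class = wanted_class
        self.depth = 0   # greater than 0 while inside a matching element
        self.texts = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.depth or self.wanted_class in classes:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth and data.strip():
            self.texts.append(data.strip())


def extract_texts(html, wanted_class):
    """Return the non-empty text chunks inside elements with wanted_class."""
    parser = ClassTextExtractor(wanted_class)
    parser.feed(html)
    parser.close()
    return parser.texts


# Hypothetical stand-in for a page saved after the script has run.
SAMPLE_HTML = """
<html><body>
  <div id="static">Static text</div>
  <div class="generated"><p>Item one</p><p>Item two</p></div>
</body></html>
"""

print(extract_texts(SAMPLE_HTML, "generated"))
```

For a real saved file, read it with `open(path, encoding="utf-8").read()` and pass the result to `extract_texts`. The depth counter assumes reasonably well-formed markup; for messy real-world pages, lxml or BeautifulSoup are more forgiving.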
If “before”, then first you need to let Javascript do its work by automating a real browser; for that task, I would recommend SeleniumRC — which brings you back to the “after” case;-).
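For the "before" case, a rough sketch of driving a real browser might look like the following. It assumes the third-party `selenium` package with its WebDriver API (the successor to Selenium RC, so not literally the tool named above), plus a locally installed Firefox and driver. The function names are hypothetical.

```python
from pathlib import Path


def file_url(html_path):
    """Turn a local path into a file:// URL that a browser can open."""
    return Path(html_path).resolve().as_uri()


def render_page(html_path):
    """Open the saved page in a real browser and return the HTML after scripts ran.

    Assumes the third-party selenium package and a Firefox driver are installed.
    """
    from selenium import webdriver  # imported here so the module loads without it

    driver = webdriver.Firefox()
    try:
        driver.get(file_url(html_path))
        # page_source is the browser's serialization of the current DOM.
        return driver.page_source
    finally:
        driver.quit()
```

Calling `render_page("page.html")` returns a string that can be handed to any HTML parser. If the script fills in content asynchronously, the page may need an explicit wait before `page_source` is read.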