There is a website I am trying to pull information from in Perl, however the section of the page I need is being generated using javascript so all you see in the source is:
<div id="results"></div>
I need to somehow pull out the contents of that div and save it to a file using Perl/proxies/whatever. e.g. the information I want to save would be
document.getElementById('results').innerHTML;
I am not sure if this is possible or if anyone had any ideas or a way to do this.
I was using a lynx source dump for other pages but since I cant straight forward screen scrape this page I came here to ask about it!
If anyone is interested, the page is http://downloadcenter.trendmicro.com/index.php?clk=left_nav&clkval=pattern_file®s=NABU and the info I am trying to get is the row about the ConsumerOPR
You’ll need to reverse-engineer what the Javascript is doing. Does it fire off an AJAX request to populate the
<div>? If so, it should be pretty easy to sniff the request using Firebug and then duplicate it with LWP::UserAgent or WWW::Mechanize to get the information.If the Javascript is just doing pure DOM manipulation, then that means the data must exist somewhere else in the page or the Javascript already. So figure out where it’s coming from and grab it.
Finally, if none of those options are adequate, you may need to just use a real browser to do it. There are a few options for automating browser behavior, like WWW::Mechanize::Firefox or Win32::IE::Mechanize.