How do I use R to download the source code of a webpage whose tags are created by JavaScript?
When I use Firefox's 'Inspect Element' function, I can see tags that do not appear in the page's HTML source file. In other words, the information I need is generated by JavaScript code. Is there a way to read this information into R?
Related question: How to view webpage source code using R?
You can use `getURL` from RCurl to get the HTTP response. You can then split the resulting string on the opening tag, and split that piece on the closing tag.
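The original code for this step did not survive, so here is a minimal sketch of the split-on-open-tag, then split-on-close-tag idea. The helper name `extract_between`, the sample HTML and the tag strings are hypothetical; in practice `html` would be the string returned by `getURL`. Keep in mind that `getURL` returns the raw HTML sent by the server, so tags that JavaScript adds later will not be in it.

```r
# Hypothetical helper: return the text between the first occurrence of
# `open_tag` and the next occurrence of `close_tag` in an HTML string.
extract_between <- function(html, open_tag, close_tag) {
  # Split on the opening tag; fixed = TRUE matches the tag literally
  # instead of treating it as a regular expression.
  pieces <- strsplit(html, open_tag, fixed = TRUE)[[1]]
  if (length(pieces) < 2) return(NA_character_)  # opening tag not found
  # Take the piece after the first opening tag and cut it at the first
  # closing tag. Nested tags of the same kind will confuse this.
  strsplit(pieces[2], close_tag, fixed = TRUE)[[1]][1]
}

# In practice (requires RCurl and network access; URL is a placeholder):
# library(RCurl)
# html <- getURL("http://www.example.com/page.html")
html <- '<html><body><div class="target">Hello</div></body></html>'
extract_between(html, '<div class="target">', "</div>")  # "Hello"
```

This only looks at the first matching opening tag, which is exactly why it can pick the wrong element when the same tag appears more than once.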
Which gives:
It turns out that the page contains more than one of the div tags you wanted, and the code above picks the wrong one. I don't know how to do this purely in R, but I followed the post by VitoshKa that you referenced and got it to work.
First, in Firefox go to Tools -> Add-ons. Search for and install MozRepl. Then, in Firefox, click Tools -> MozRepl -> Start.
Now, in R:
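The R code for this step is also missing, so the following is only a sketch of the general idea: open a socket to MozRepl, have Firefox open the page in a new window stored in a REPL variable `w` (matching the `w.window.close()` call used later), wait for the page's JavaScript to run, then ask for the rendered DOM. It assumes MozRepl is listening on its default port 4242 on the same machine. The helper names, the `wait` parameter and the use of `w.document.documentElement.innerHTML` are my assumptions, and I have not checked exactly how MozRepl formats its replies (prompts, quoting), so expect to clean up `out`.

```r
# Hypothetical helper: build the JavaScript that opens `url` in a new
# Firefox window and keeps a reference to it in the REPL variable `w`.
# Assumes `url` contains no single quotes.
open_window_js <- function(url) {
  sprintf("w = window.open('%s')", url)
}

# Hypothetical helper: given an open socket `mz` to MozRepl, load `url`
# and return the rendered HTML (including script-generated tags) as lines.
mozrepl_rendered_html <- function(mz, url, wait = 5) {
  writeLines(open_window_js(url), mz)
  Sys.sleep(wait)               # crude: give the page time to load and run scripts
  readLines(mz, warn = FALSE)   # discard MozRepl's reply to the first command
  writeLines("w.document.documentElement.innerHTML", mz)
  Sys.sleep(1)                  # give MozRepl time to send the serialized DOM
  readLines(mz, warn = FALSE)   # whatever has arrived so far, one element per line
}

# Usage (requires Firefox with MozRepl started; URL is a placeholder):
# mz  <- socketConnection(host = "localhost", port = 4242)
# out <- mozrepl_rendered_html(mz, "http://www.example.com/page.html")
```

The fixed `Sys.sleep` delays are the weakest part of this sketch; a slow page may need a larger `wait`.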
Now `out` is a vector, and `loc` holds the positions of the strings that contain your tag. The tag appears twice; the first occurrence is the one you're interested in. You can extract the information from it the same way I showed above with `strsplit`, or with a regular expression and `gsub`.

You can close the window that opens with
writeLines("w.window.close()", mz)
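For the locate-then-extract step described above (using `grep` to find the positions, then a regular expression), here is a minimal sketch. The helper name `first_tag_content`, the default closing tag and the sample `out` vector are hypothetical. It uses `sub()`, the single-replacement sibling of `gsub()`, with `perl = TRUE` so that `\Q...\E` makes the tag match literally.

```r
# Hypothetical helper: find the elements of `out` containing `tag`, take the
# first one, and return the text between `tag` and the next `close_tag`.
first_tag_content <- function(out, tag, close_tag = "</div>") {
  loc <- grep(tag, out, fixed = TRUE)  # positions of lines containing the tag
  if (length(loc) == 0) return(NA_character_)
  line <- out[loc[1]]                  # the first occurrence is the one wanted
  # Drop everything up to and including the tag (\Q...\E quotes it literally)
  rest <- sub(paste0("^.*?\\Q", tag, "\\E"), "", line, perl = TRUE)
  # Drop the closing tag and everything after it
  sub(paste0("\\Q", close_tag, "\\E.*$"), "", rest, perl = TRUE)
}

out <- c("repl> ", '<div id="x">first</div>', '<div id="x">second</div>')
first_tag_content(out, '<div id="x">')  # "first"
```

Because it works line by line, this assumes the tag and its content sit on a single line of `out`.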