I’m not sure if this is possible, but I would like to retrieve some data from a web page that uses JavaScript to render data. This would be from a Linux shell.
What I am able to do now:
- HTTP POST using curl/lynx/wget to log in and get headers from the command line
- use those headers to get into ‘secure’ locations of the web page from the command line
However, the only elements that are rendered on the page are the static HTML. Most of the info I need is rendered dynamically with JS (albeit eventually as HTML as well) and doesn’t show up in a command-line browser. I understand the issue is the lack of a JS interpreter.
As such… some workarounds I thought might be possible are:
- calling a full browser from the command line and somehow passing the info back to stdout. This would mean that I have to be able to POST.
- passing the headers (with session info, etc…) I got from curl to one of these full browsers and again dumping the output HTML back to stdout. It could very well be a print-screen function on the window if all else fails.
- a pure Java solution would be OK too.
Does anyone have experience doing something similar and succeeding?
Thanks!
You can use WebDriver to do this; you just need to have a web browser installed. There are other solutions as well, such as Selenium and HtmlUnit (which works without a browser but might behave differently from one).
You can find an example Selenium project here.
WebDriver
Selenium
HtmlUnit
I would recommend WebDriver because it does not require a standalone server like Selenium does, while HtmlUnit might be suitable if you don’t want to install a browser or worry about Xvfb in a headless environment.
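As a different, dependency-free illustration of the "call a full browser and dump the rendered HTML to stdout" idea from the question, here is a minimal Java sketch that launches a Chromium-family browser in headless mode with its `--dump-dom` flag. This is not WebDriver or HtmlUnit; it only uses the JDK (9 or later). The class name `HeadlessDump`, the helper names, and the browser binary passed on the command line are assumptions for illustration, and the sketch does not handle the login POST or reuse a session from curl.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.List;

public class HeadlessDump {

    // Builds the argument list for a headless Chromium-family browser.
    // --dump-dom prints the DOM to stdout after the page's scripts have run.
    static List<String> buildCommand(String browserBinary, String url) {
        return List.of(browserBinary, "--headless", "--disable-gpu", "--dump-dom", url);
    }

    // Runs the browser and returns what it wrote to stdout (the rendered HTML).
    static String renderedHtml(String browserBinary, String url)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(browserBinary, url))
                // Discard browser log noise so it cannot fill the stderr pipe.
                .redirectError(ProcessBuilder.Redirect.DISCARD)
                .start();
        String html = new String(process.getInputStream().readAllBytes(),
                StandardCharsets.UTF_8);
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("browser exited with status " + exitCode);
        }
        return html;
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            // Binary name is an assumption; it differs between distributions.
            System.err.println("usage: java HeadlessDump <browser-binary> <url>");
            return;
        }
        System.out.print(renderedHtml(args[0], args[1]));
    }
}
```

It could be run as, for example, `java HeadlessDump.java chromium https://example.com`, assuming a binary called `chromium` is on the PATH. For pages that sit behind a login form, driving the browser through WebDriver is likely the better fit, since it can fill in and submit the form in the same session.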