I am trying to use PhantomJS to load a page (one that uses JavaScript to load items on the webpage) and return all the HTML on the page (at least what is within the <body> tags) to the PHP function that executes phantomjs httpget.js.
Problem: I can get PhantomJS to return document.title, but asking it to console.log(document.body) simply gives me [object Object]. How can I extract the page's HTML?
It also takes much longer to load the webpage with PhantomJS than it does in the browser.
httpget.js
console.log('hello!');
var page = require('webpage').create();
page.open("http://www.asos.com/Men/T-Shirts-Vests/Cat/pgecategory.aspx?cid=7616#parentID=-1&pge=0&pgeSize=900&sort=1",
function(status){
console.log('Page title is ' + page.evaluate(function () {
return document.body;
}));
phantom.exit();
});
Output (running from shell)
hello!
Page title is [object Object]
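document.body is a DOM element, which is why it prints as [object Object]. document.body.innerHTML contains the HTML of the body as a string, so return that from page.evaluate instead.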
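A minimal sketch of httpget.js with that change, reusing the URL from the question. The helper name getBodyHtml, the status check, and the 2-second delay are assumptions added for illustration. The delay is an arbitrary guess at how long the page's own scripts need to load their items, so it may need tuning.

```javascript
// Runs inside the page context and returns the body's markup as a string.
// page.evaluate can only pass back JSON-serializable values, so a string
// works where a DOM node such as document.body does not.
function getBodyHtml() {
    return document.body.innerHTML;
}

// Only start the PhantomJS part when the script runs under phantomjs.
if (typeof phantom !== 'undefined') {
    var page = require('webpage').create();
    var url = 'http://www.asos.com/Men/T-Shirts-Vests/Cat/pgecategory.aspx?cid=7616#parentID=-1&pge=0&pgeSize=900&sort=1';

    page.open(url, function (status) {
        if (status !== 'success') {
            console.log('Failed to load page');
            phantom.exit(1);
            return;
        }
        // Assumption: wait 2 seconds so the page's scripts can add their items.
        setTimeout(function () {
            console.log(page.evaluate(getBodyHtml));
            phantom.exit();
        }, 2000);
    });
}
```

Run it as before with phantomjs httpget.js; the PHP side then receives the body HTML on standard output. If the whole document is wanted rather than just the body, the page.content property should hold the full page source.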