I’ve had this problem for days.
I need to load the entire HTML of a page from PHP.
The page has a jQuery function that is called once the whole page has loaded. This function loads additional HTML into the page, so I need the complete HTML, including the part loaded by jQuery. I can tell whether I got the whole page by looking for a tag that only jQuery adds (for example, an input tag with name XXX, an input tag with the multiple attribute, etc.).
So I tried:
$html = file_get_contents("http://wwww.siteToScrape.com");
if (strpos($html, 'multiple') !== false) {
    echo 'found';
} else {
    echo 'not found';
}
but the result is ‘not found’.
Then I downloaded Simple HTML DOM and tried:
include 'simple_html_dom.php';
$html = file_get_html("http://wwww.siteToScrape.com");
if (strpos($html, 'multiple') !== false) {
    echo 'found';
} else {
    echo 'not found';
}
but the result was still ‘not found’.
So I looked for a PHP script that emulates a browser (and could therefore run the jQuery too). I downloaded PHP Scriptable Web Browser and tried:
require_once('browser.php');
$browser = new SimpleBrowser();
$p = $browser->get('http://wwww.siteToScrape.com');
if (strpos($p, 'multiple') !== false) {
    echo 'found';
} else {
    echo 'not found';
}
but the result was, once again, ‘not found’.
I don’t know how to do this. Can someone help me? Thanks!
The problem is that you are trying to mix server and client.
PHP runs on the server.
JavaScript (and therefore also jQuery) runs in the client browser.
There’s no easy way to run JavaScript from PHP. As far as I know, it’s not even possible. Other languages, such as Java, might be able to do what you are trying to do.
You should look for another way to do this.
This is also why web crawlers are never affected by what you do with JavaScript, which is a nice thing to keep in mind when developing: your dynamically loaded content will not be indexed by these crawlers at all.
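To illustrate the server/client split described above, here is a minimal sketch. It assumes the page looks roughly like the hypothetical markup in $rawHtml, which stands in for what file_get_contents() would return. The file names jquery.js and /fragment.html and the helper countMatches() are made up for this example. It only shows that the raw markup contains the script that would add the input, not the input itself.

```php
<?php
// Hypothetical markup standing in for the response body that
// file_get_contents() would return. The <input multiple> element is
// not part of it: it would only be added later, by the script,
// inside a real browser.
$rawHtml = <<<'HTML'
<html>
<head><script src="jquery.js"></script></head>
<body>
<div id="container"></div>
<script>
$(function () { $('#container').load('/fragment.html'); });
</script>
</body>
</html>
HTML;

// Hypothetical helper: count the nodes matching an XPath expression
// in an HTML string. Nothing here executes JavaScript.
function countMatches(string $html, string $xpathExpr): int
{
    $doc = new DOMDocument();
    // Collect parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    return $xpath->query($xpathExpr)->length;
}

// The script elements are present in the raw markup...
echo 'script elements: ' . countMatches($rawHtml, '//script') . "\n";

// ...but the input the script would load is not.
echo countMatches($rawHtml, '//input[@multiple]') > 0 ? 'found' : 'not found';
echo "\n";
```

Checking for the element with XPath instead of strpos() also avoids false positives when the word "multiple" merely appears somewhere in the text or in a script.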