I had a web scraping technique all worked out in PHP until I found out that the platform I was developing on (iOS via PhoneGap) doesn’t support PHP. I found a solution in JavaScript, though:
$(document).ready(function(){
  var container = $('#target');

  // Load the clicked link's target instead of navigating to it
  $('.ajaxtrigger').click(function(){
    doAjax($(this).attr('href'));
    return false;
  });

  function doAjax(url){
    if(url.match('^http')){
      // External URL: fetch the page through YQL as JSONP
      $.getJSON("http://query.yahooapis.com/v1/public/yql?"+
        "q=select%20*%20from%20html%20where%20url%3D%22"+
        encodeURIComponent(url)+
        "%22&format=xml&callback=?",
        function(data){
          if(data.results[0]){
            var html = filterData(data.results[0]);
            container.html(html);
          } else {
            var errormsg = '<p>Error: could not load the page.</p>';
            container.html(errormsg);
          }
        }
      );
    } else {
      // Relative URL: a plain load is enough
      $('#target').load(url);
    }
  }

  // Strip body tags, line breaks, comments, and scripts from the returned markup
  function filterData(data){
    data = data.replace(/<\/?body[^>]*>/g,'');
    data = data.replace(/[\r\n]+/g,'');
    data = data.replace(/<!--[\S\s]*?-->/g,'');
    data = data.replace(/<noscript[^>]*>[\S\s]*?<\/noscript>/g,'');
    data = data.replace(/<script[^>]*>[\S\s]*?<\/script>/g,'');
    data = data.replace(/<script.*\/>/,'');
    return data;
  }
});
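For reference, the YQL request URL that doAjax() assembles can be sketched as a standalone helper. buildYqlUrl is a hypothetical name used only for illustration; the endpoint and query string are taken from the snippet above, and this is a sketch rather than anything YQL or jQuery provides:

```javascript
// Hypothetical helper: builds the YQL request URL for a given page URL.
// The trailing "callback=?" is the placeholder that jQuery's $.getJSON
// replaces with a generated JSONP callback name.
function buildYqlUrl(pageUrl) {
  return 'http://query.yahooapis.com/v1/public/yql?' +
    'q=select%20*%20from%20html%20where%20url%3D%22' +
    encodeURIComponent(pageUrl) +          // escape ":" and "/" in the target URL
    '%22&format=xml&callback=?';
}

// Example: the URL that would be requested for a fixed site.
var staticRequestUrl = buildYqlUrl('http://website.com');
```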
The URL is loaded when a link is clicked: the handler gets the link’s href and passes it in as url (at least I think so). I want the URL to be static, though, such as http://website.com, and loaded without a click. I tried replacing every doAjax(url) with doAjax('http://website.com'), but that doesn’t seem to work, and I would like to know what I’m doing wrong.
Another problem with the new JS script is that I want to parse the results and show only the table element. In PHP, I did this using:
$data = $html->find('table');
echo $data[1];
What would the JavaScript equivalent of that PHP code be?
On a side note, I was considering splitting this up into two posts, but I thought that would be too many posts for tonight 🙂
Edit: First problem was solved by @nnnnnn in the comments.
If you’re saying the above code works when the links are clicked (via the click handler calling your doAjax() function), but you also want to automatically call it for 'http://website.com' without having to click, just add doAjax('http://website.com'); to the end of your document.ready function.
As far as extracting just a particular table from the response, within your ajax callback function you can create a jQuery object from the returned data, and then use jQuery’s .find() method to extract the part you care about, and .append() to add that part to your container element.
Note that the selector for .find() may need more information to select specifically the table you are talking about. Not sure if "table:first" would do it, or if that table has an id attribute you could use instead: .find("#thetableidhere"), or…?
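A minimal sketch of that approach, assuming the returned markup contains at least two tables and that the second one (index 1, as in the PHP $data[1]) is the one wanted. extractTable is a hypothetical helper name, not something from jQuery or the original code, and jQuery is passed in as the first argument only to keep the helper self-contained:

```javascript
// Hypothetical helper: returns a jQuery object wrapping the table at the
// given zero-based index inside an HTML string.
// `$` is the jQuery function, passed in so nothing depends on a global.
function extractTable($, html, index) {
  // Put the markup inside a detached <div> first, so that .find() also
  // matches tables sitting at the top level of the returned HTML.
  return $('<div>').html(html).find('table').eq(index);
}
```

The container.html(...) call in the existing callback could then become container.empty().append(extractTable($, filteredHtml, 1)), where filteredHtml stands for the string returned by filterData(). The wrapping <div> matters because .find() only searches descendants of the matched elements, so a table that is itself a top-level element of the response would otherwise be skipped.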