I need to extract the text (plain text only) from an arbitrary web page. (I get around the cross-domain restriction with a simple PHP proxy on my server.)
As usual, I fetch the page with
$.get(url, function(data) {
    process(data);
});
and in my process() function I have the content of the page.
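Routing the request through the proxy works best when the target address is percent-encoded. Otherwise any `?`, `&` or `=` in the page URL gets mixed into the proxy's own query string. Here is a minimal sketch. It assumes a hypothetical `/proxy.php` that reads the address from a `url` query parameter, because the question does not say how the proxy is actually called:

```javascript
// Hypothetical: '/proxy.php' and its 'url' parameter are assumptions about
// how the PHP proxy is called; adjust both to match the real script.
function proxiedUrl(targetUrl, proxyPath = '/proxy.php') {
  // encodeURIComponent escapes ':', '/', '?', '&' and '=' so the whole
  // target address travels as a single parameter value.
  return proxyPath + '?url=' + encodeURIComponent(targetUrl);
}

console.log(proxiedUrl('http://example.com/page?a=1&b=2'));
// → /proxy.php?url=http%3A%2F%2Fexample.com%2Fpage%3Fa%3D1%26b%3D2
```

The request would then be `$.get(proxiedUrl(url), process)`. Alternatively, you can pass `{ url: url }` as the `data` argument of `$.get`, and jQuery will do the encoding itself.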
I want to use a particular div in that page (here ‘#my-div’) or, if it is not present, fall back to the whole body.
I would like to do something like this:
function process(content) {
    if ($(content).find('#my-div'))
        $('#output').text($(content).find('#my-div').text());
    else
        $('#output').text($(content).find('body').text());
}
But I always get an empty result when “finding” ‘body’. Any suggestions?
Some issues…
Fixed…
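One problem is visible in the code itself. `$(content).find('#my-div')` returns a jQuery object, and an object is truthy even when it matched nothing. As a result, the `if` in `process()` always takes the first branch and the `else` never runs. The test has to check `.length` instead. The snippet below shows the difference using a plain array-like object as a stand-in for an empty jQuery result. The stand-in and the `hasMatches` name are illustrative only and are not part of jQuery's API.

```javascript
// An empty jQuery result is an array-like object with length 0.
// This plain object stands in for it so the snippet runs without jQuery.
const noMatch = { length: 0 };

// Hypothetical helper: true only when the collection matched something.
function hasMatches(collection) {
  return collection != null && collection.length > 0;
}

console.log(Boolean(noMatch));    // true: objects are always truthy
console.log(hasMatches(noMatch)); // false: the length is what matters

// In the page (jQuery assumed to be loaded), the corrected test would read:
//   var $div = $(content).find('#my-div');
//   if ($div.length) { $('#output').text($div.text()); } else { /* fallback */ }
```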
Now, in theory, it should work, but there are problems with passing an entire HTML document to the $() function: some browsers strip out certain elements, such as <head> and <body>. You’ll ultimately need to test for each of these situations, something like this…
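One possible workaround is sketched below. It assumes the proxy returns the raw page HTML as a string. The idea is to cut the markup between `<body>` and `</body>` out of the string before jQuery sees it, so the browser never has to drop the `<body>` element. The body markup is then wrapped in a `<div>`, which lets a single `.find()` reach `#my-div` wherever it sits. `extractBodyHtml` and `processPage` are hypothetical names, and a regex is only a rough stand-in for a real HTML parser.

```javascript
// Hypothetical helper: return the markup between <body ...> and </body>, or the
// whole string if there is no <body> tag (e.g. the proxy returned a fragment).
// This is a regex workaround, not an HTML parser: a "<body" inside a comment or
// script, for example, can fool it.
function extractBodyHtml(html) {
  // \b keeps tags like <bodyx> from matching; the greedy group runs to the
  // last </body> in the string.
  const match = /<body\b[^>]*>([\s\S]*)<\/body>/i.exec(html);
  return match ? match[1] : html;
}

// Browser-only, hypothetical stand-in for process(): assumes jQuery is loaded
// as `$` and the page has an #output element.
function processPage(content) {
  // Wrapping in a <div> makes top-level body elements descendants, so one
  // .find() reaches '#my-div' whether it is nested or at the top level.
  const $body = $('<div>').html(extractBodyHtml(content));
  // Drop script and style elements so their source does not end up in the text.
  $body.find('script, style').remove();
  let $target = $body.find('#my-div');
  if ($target.length === 0) {
    $target = $body; // fallback: the whole body
  }
  $('#output').text($target.text());
}
```

Because the markup is parsed into the live document, images in it are still requested, and inline handlers such as `onerror` can fire. That matters for arbitrary third-party pages. A different technique avoids this. `new DOMParser().parseFromString(content, 'text/html')` builds a separate document in which scripts are not run and the `<body>` element is kept, so `$(doc).find('#my-div')` and `$(doc.body)` both work without the regex.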