I’ve made this to try to extract text.
<script type = "text/javascript">
function extractText(node){
var all = "";
for (node=node.firstChild;node;node=node.nextSibling){
alert(node.nodeValue + " = " + node.nodeType);
if (node.nodeType == 3){
all += node.nodeValue
}
}
alert(all);
}
</script>
That is located in the head of an HTML document.
The body looks like this:
<body onload = "extractText(document.body)">
Stuff
<b>text</b>
<script>
var x = 1;
</script>
</body>
The problem is that the alert(all); only shows “Stuff”, and it adds a bunch of null things that I don’t really understand when doing the alert(node.nodeValue + " = " + node.nodeType);. It says null = 3 a few times. Could anyone tell me why this isn’t working properly? Thanks in advance.
If you want the text from the document, you may want to look into a recursive call. However, if you don't care about children, remove the first if (node.hasChildNodes()){} condition in the following:
Also, you probably want to grab textContent over nodeValue, but that's your call. You can also get more granular and test if the nodeName is SCRIPT and ignore it (if you so choose), but I'll let you make that determination.
Follow-up: here's a fiddle you can play with, with the <script> test commented out and optional whitespace removal: http://jsfiddle.net/KZuk5/2/
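For reference, here is a minimal sketch of the recursive approach described above. It is not necessarily the code from the fiddle: the skipScripts parameter is a hypothetical flag added for illustration. Element nodes (nodeType 1) always report a nodeValue of null, so the sketch descends into them instead of reading their value, and only text nodes (nodeType 3) contribute to the result. Whitespace between tags also counts as text nodes, so the output may contain extra spaces and line breaks.

```javascript
// Sketch: recursively collect the text of a node and all of its descendants.
// skipScripts is a hypothetical flag, not part of any DOM API.
function extractText(node, skipScripts) {
  var all = "";
  for (var child = node.firstChild; child; child = child.nextSibling) {
    if (child.nodeType === 3) {
      // Text node: nodeValue holds the text itself.
      all += child.nodeValue;
    } else if (child.nodeType === 1) {
      // Element node: nodeValue is null, so look at its children instead.
      if (skipScripts && child.nodeName.toUpperCase() === "SCRIPT") {
        continue; // optionally ignore the contents of <script> elements
      }
      all += extractText(child, skipScripts);
    }
  }
  return all;
}

// Browser usage, guarded so the definition can also be loaded outside a page.
if (typeof document !== "undefined" && document.body) {
  console.log(extractText(document.body, true));
}
```

With the body from the question, calling extractText(document.body, true) from the onload handler should pick up both "Stuff" and "text" while leaving out the contents of the inline script. If you only need everything in one go and don't mind script contents, document.body.textContent gives a similar result without a hand-written loop.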