I want to do some basic scripting, and I’m trying to do it in JavaScript. Basically, I want to download a Wikiquote page and scrape it.
What’s the best way to do this? How do I get the page? I tried to do it via jQuery.get():
$.get('http://en.wikiquote.org/wiki/Last_words', function(data) { console.log(data); })
But the log is simply some error object, and the console displays:
XMLHttpRequest cannot load http://en.wikiquote.org/wiki/Last_words. Origin null is not allowed by Access-Control-Allow-Origin.
GET http://en.wikiquote.org/wiki/Last_words undefined (undefined)
So I guess I’m not taking the correct approach. What should I be doing?
Also, once I DO download the file, what tools are available for me to traverse it? XPath? RegEx? Is there a way to generate a DOM model from it and attach jQuery?
An interesting possibility would be to somehow just open a tiny pop-up which downloads the page, and then run my script to scrape the page and return data. I am aware this sounds a lot like script injection. Is it even possible to do this in a friendly manner?
Assuming you are limiting yourself to JavaScript running in the browser, and to documents that are not on the same host as the page running the script: you can’t.
The same-origin policy makes this impossible unless the remote server explicitly opts in, which is what the Access-Control-Allow-Origin part of your error message refers to. Without the policy, a webpage could request data from any site (including LAN sites) that the user can access, using their IP address, their cookies, and anything else that might be used for authentication. (All your banking are belong to us).
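To make the rule concrete, here is a rough sketch of the comparison the browser is effectively making. The helper name isSameOrigin and the example page URLs (including the file: path) are made up for illustration; only the Wikiquote URL comes from the question. It relies on the standard URL class, which modern browsers and Node.js both provide, and it is a simplification of the real check, not the browser's actual implementation.

```javascript
// Hypothetical helper: two URLs share an origin only when
// scheme, host, and port all match.
function isSameOrigin(a, b) {
  const originA = new URL(a).origin;
  const originB = new URL(b).origin;
  // Opaque origins (for example, pages opened from disk) may serialize
  // to the string "null"; treat them as matching nothing.
  if (originA === 'null' || originB === 'null') {
    return false;
  }
  return originA === originB;
}

// The URL from the question.
const target = 'http://en.wikiquote.org/wiki/Last_words';

// Illustrative pages that might run the script (assumed, not from the question).
const pages = [
  'http://en.wikiquote.org/wiki/Main_Page',  // same scheme, host, and port
  'https://en.wikiquote.org/wiki/Main_Page', // scheme differs
  'http://example.com/scraper.html',         // host differs
  'file:///home/user/scraper.html',          // local file, like the "Origin null" case
];

for (const page of pages) {
  console.log(page, '->', isSameOrigin(page, target));
}
```

Only the first page would be allowed to read the response with a plain $.get(). The "Origin null" in your error suggests the script was running from a local file, which is the last case.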