I want to build an application that fetches pages from different websites and extracts data from their DOM. I intend to use XMLHttpRequest in Google Web Toolkit for this. However, XMLHttpRequest does not seem to work because of the same-origin policy.
Is there any other application framework that supports DOM parsing and cross-site, Ajax-like requests?
From other domains, the browser will only let you load JSON data (through a script tag, i.e. JSONP). The HTML or XML DOM of a page on another domain is not accessible, because of the same-origin policy. In this case I see two options:
You can send the request to the server your own page was loaded from and have that server proxy it to the actual web server. The browser then only ever talks to its own origin, so the restriction does not apply.
You can use a service that converts HTML or XML to JSON. The only one I know of is http://open.dapper.net/, but it requires you to define the records in each page manually first, so it only works with a predefined set of pages, not with an arbitrary URL that, for instance, a user enters. There may be XML-to-JSON converters that accept any given URL, though. Writing one wouldn't be difficult.
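To make the first option more concrete, here is a minimal sketch of such a proxy in plain Java (11 or later), using only the JDK's built-in HTTP server and client. The class name FetchProxy, the /fetch path, the url query parameter and port 8080 are all made up for illustration. In a real GWT project you would more likely put the same logic in a servlet or an RPC service on the server that hosts your page.

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.URLDecoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Hypothetical same-origin proxy: the page calls /fetch?url=<encoded target>
// on its own server, and the server downloads the target on its behalf.
class FetchProxy {

    // Returns the decoded "url" query parameter as a URI,
    // or null if it is missing, malformed, or not http/https.
    static URI parseTarget(String rawQuery) {
        if (rawQuery == null) return null;
        for (String pair : rawQuery.split("&")) {
            int eq = pair.indexOf('=');
            if (eq > 0 && pair.substring(0, eq).equals("url")) {
                try {
                    String value = URLDecoder.decode(pair.substring(eq + 1), StandardCharsets.UTF_8);
                    URI uri = URI.create(value);
                    String scheme = uri.getScheme();
                    boolean web = "http".equalsIgnoreCase(scheme) || "https".equalsIgnoreCase(scheme);
                    return (web && uri.getHost() != null) ? uri : null;
                } catch (IllegalArgumentException e) {
                    return null; // bad percent-encoding or invalid URI syntax
                }
            }
        }
        return null;
    }

    // Fetches the target and relays its status code and body to the browser.
    static void handle(HttpExchange exchange, HttpClient client) throws IOException {
        URI target = parseTarget(exchange.getRequestURI().getRawQuery());
        int status;
        byte[] body;
        if (target == null) {
            status = 400;
            body = "missing or invalid url parameter".getBytes(StandardCharsets.UTF_8);
        } else {
            try {
                HttpResponse<byte[]> upstream = client.send(
                        HttpRequest.newBuilder(target).GET().build(),
                        HttpResponse.BodyHandlers.ofByteArray());
                status = upstream.statusCode();
                body = upstream.body();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                status = 502;
                body = "upstream fetch interrupted".getBytes(StandardCharsets.UTF_8);
            } catch (IOException e) {
                status = 502;
                body = "upstream fetch failed".getBytes(StandardCharsets.UTF_8);
            }
        }
        if (body.length == 0) {
            exchange.sendResponseHeaders(status, -1); // -1 means "no response body"
            exchange.close();
            return;
        }
        exchange.sendResponseHeaders(status, body.length);
        try (OutputStream out = exchange.getResponseBody()) {
            out.write(body);
        }
    }

    public static void main(String[] args) throws IOException {
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
        server.createContext("/fetch", exchange -> handle(exchange, client));
        server.start();
    }
}
```

The page would then request something like /fetch?url=http%3A%2F%2Fexample.com%2Fpage from its own server with an ordinary XMLHttpRequest and parse the returned markup itself. Note that a proxy which fetches any URL it is given can be abused to reach hosts the server can see but outsiders cannot, so in practice you would probably want to restrict the allowed targets.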