I would like to create a program that will enter a string into the text box on a site like Google (without using their public API) and then submit the form and grab the results. Is this possible? Grabbing the results will require the use of HTML scraping I would assume, but how would I enter data into the text field and submit the form? Would I be forced to use a public API? Is something like this just not feasible? Would I have to figure out query strings/parameters?
Thanks
Theory
What I would do is create a little program that can automatically submit any form data to any place and come back with the results. This is easy to do in Java with HTTPUnit. The task goes like this:
The solution you pick will depend on a variety of factors, including:
For example, you could try the following applications to submit the data for you:
Then grep (awk, or sed) the resulting web page(s).
Another trick when screen scraping is to download a sample HTML file and parse it manually in vi (or VIM). Save the keystrokes to a file and then whenever you run the query, apply those keystrokes to the resulting web page(s) to extract the data. This solution is not maintainable, nor 100% reliable (but screen scraping from a website seldom is). It works and is fast.
Example
A semi-generic Java class to submit website forms (specifically dealing with logging into a website) is below, in the hopes that it might be useful. Do not use it for evil.
An example properties files would look like:
Run it similar to the following (substitute the path to HTTPUnit and the FormElements class for $CLASSPATH):
Legality
Another answer mentioned that it might violate terms of use. Check into that first, before you spend any time looking into a technical solution. Extremely good advice.