I’d like to fetch a webpage: just the raw data returned by an HTTP request, without parsing or rendering anything.
I’m trying to do this with the high-level Socket class from the Java runtime library.
I wonder whether this is possible, since I’m not comfortable with the underlying layer used for this point-to-point communication, and I don’t know whether the problem comes from my own system.
Here’s what my code is doing:
1) setting the socket.
this.socket = new Socket( "www.example.com", 80 );
2) setting the appropriate streams used for this communication.
this.out = new PrintWriter( socket.getOutputStream(), true);
this.in = new BufferedReader( new InputStreamReader( socket.getInputStream() ) );
3) requesting the page (this is the step I’m not sure I’m doing correctly).
String query = "";
query += "GET / HTTP/1.1\r\n";
query += "Host: www.example.com\r\n";
...
query += "\r\n";
this.out.print(query);
4) reading the result (nothing in my case).
System.out.print( this.in.readLine() );
5) closing socket and streams.
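The five steps above can be sketched as one self-contained program. One likely cause of the empty result, assuming the streams are set up as shown: PrintWriter's auto-flush only triggers on println, printf and format, so a request sent with print() can stay in the buffer until flush() is called. The class name RawHttpGet, the helper buildRequest and the added "Connection: close" header are assumptions for illustration, not part of the original code.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

class RawHttpGet {
    // Hypothetical helper: builds a minimal HTTP/1.1 GET request.
    // "Connection: close" (an assumption, not in the original) asks the server
    // to close the socket after the response, so reading until EOF terminates.
    static String buildRequest(String host, String path) {
        return "GET " + path + " HTTP/1.1\r\n"
             + "Host: " + host + "\r\n"
             + "Connection: close\r\n"
             + "\r\n";
    }

    public static void main(String[] args) throws IOException {
        String host = "www.example.com";
        try (Socket socket = new Socket(host, 80);
             PrintWriter out = new PrintWriter(socket.getOutputStream(), false);
             BufferedReader in = new BufferedReader(
                 new InputStreamReader(socket.getInputStream(), StandardCharsets.ISO_8859_1))) {
            out.print(buildRequest(host, "/"));
            out.flush(); // print() does not auto-flush, so push the request out explicitly

            // Dump status line, headers and body exactly as received.
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}
```

The try-with-resources block takes care of step 5 by closing the streams and the socket on exit.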
If you’re on a *nix system, look into curl, which lets you retrieve data from the internet on the command line. It’s more lightweight than writing a Java socket client.
If you want to use Java, and are just retrieving information from a webpage, check out the java.net.URL class. Some sample Java code:
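A minimal sketch of this approach, assuming a Scanner with the `\\A` delimiter is used to read the whole stream as a single token. The class name UrlDump, the helper readAll and the default URL are made up for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;

class UrlDump {
    // Reads the whole stream as one token: "\\A" matches only at the start of
    // the input, so the Scanner never finds a second delimiter to split on.
    static String readAll(InputStream in) {
        try (Scanner scanner = new Scanner(in, StandardCharsets.UTF_8.name())) {
            scanner.useDelimiter("\\A");
            return scanner.hasNext() ? scanner.next() : "";
        }
    }

    public static void main(String[] args) throws IOException {
        // The default URL is a placeholder; pass another one as the first argument.
        URL url = URI.create(args.length > 0 ? args[0] : "http://www.example.com/").toURL();
        try (InputStream in = url.openStream()) {
            System.out.println(readAll(in));
        }
    }
}
```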
That’ll open the specified URL, grab the data (HTML in this case) and spit it out to the console. You might have to tweak the delimiter a bit, but this will work with most network endpoints sending data.