I am making a crawler, and need to get the data from the stream

Question

0

Asked: May 29, 20262026-05-29T09:27:20+00:00 2026-05-29T09:27:20+00:00

I am making a crawler, and need to get the data from the stream

0

I am making a crawler, and need to get the data from the stream regardless if it is a 200 or not. CURL is doing it, as well as any standard browser.

The following will not actually get the content of the request, even though there is some, an exception is thrown with the http error status code. I want the output regardless, is there a way? I prefer to use this library as it will actually do persistent connections, which is perfect for the type of crawling I am doing.

package test;

import java.net.*;
import java.io.*;

public class Test {

    public static void main(String[] args) {

         try {

            URL url = new URL("http://github.com/XXXXXXXXXXXXXX");
            URLConnection connection = url.openConnection();

            DataInputStream inStream = new DataInputStream(connection.getInputStream());
            String inputLine;

            while ((inputLine = inStream.readLine()) != null) {
                System.out.println(inputLine);
            }
            inStream.close();
        } catch (MalformedURLException me) {
            System.err.println("MalformedURLException: " + me);
        } catch (IOException ioe) {
            System.err.println("IOException: " + ioe);
        }
    }
}

Worked, thanks: Here is what I came up with – just as a rough proof of concept:

import java.net.*;
import java.io.*;

public class Test {

    public static void main(String[] args) {
//InputStream error = ((HttpURLConnection) connection).getErrorStream();

        URL url = null;
        URLConnection connection = null;
        String inputLine = "";

        try {

            url = new URL("http://verelo.com/asdfrwdfgdg");
            connection = url.openConnection();

            DataInputStream inStream = new DataInputStream(connection.getInputStream());

            while ((inputLine = inStream.readLine()) != null) {
                System.out.println(inputLine);
            }
            inStream.close();
        } catch (MalformedURLException me) {
            System.err.println("MalformedURLException: " + me);
        } catch (IOException ioe) {
            System.err.println("IOException: " + ioe);

            InputStream error = ((HttpURLConnection) connection).getErrorStream();

            try {
                int data = error.read();
                while (data != -1) {
                    //do something with data...
                    //System.out.println(data);
                    inputLine = inputLine + (char)data;
                    data = error.read();
                    //inputLine = inputLine + (char)data;
                }
                error.close();
            } catch (Exception ex) {
                try {
                    if (error != null) {
                        error.close();
                    }
                } catch (Exception e) {

                }
            }
        }

        System.out.println(inputLine);
    }
}

Report

Leave an answer
Cancel reply

You must login to add an answer.

Need An Account,

1 Answer

Editorial Team · Answer 1 · 2026-05-29T09:27:21+00:00

Simple:

URLConnection connection = url.openConnection();
InputStream is = connection.getInputStream();
if (connection instanceof HttpURLConnection) {
   HttpURLConnection httpConn = (HttpURLConnection) connection;
   int statusCode = httpConn.getResponseCode();
   if (statusCode != 200 /* or statusCode >= 200 && statusCode < 300 */) {
     is = httpConn.getErrorStream();
   }
}

You can refer to Javadoc for explanation. The best way I would handle this is as follows:

URLConnection connection = url.openConnection();
InputStream is = null;
try {
    is = connection.getInputStream();
} catch (IOException ioe) {
    if (connection instanceof HttpURLConnection) {
        HttpURLConnection httpConn = (HttpURLConnection) connection;
        int statusCode = httpConn.getResponseCode();
        if (statusCode != 200) {
            is = httpConn.getErrorStream();
        }
    }
}

Sign Up

Sign In

Forgot Password

The Archive Base Latest Questions

I am making a crawler, and need to get the data from the stream

Leave an answerCancel reply

1 Answer

Leave an answer
Cancel reply