I want to use java to retrieve text from a website. I can easily

Question

Asked: 2026-05-28T19:34:35+00:00

I want to use Java to retrieve text from a website. I can easily get the source with the following (thank you to the random internet person who posted this somewhere else):

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.URL;
    import java.net.URLConnection;
    import java.nio.charset.StandardCharsets;

    public class WebCrawler {
        public static void main(String[] args) {
            try {
                URL url = new URL("http://stackoverflow.com");
                URLConnection connection = url.openConnection();
                // try-with-resources closes the reader even if reading fails.
                // UTF-8 is assumed here; the page may declare another charset.
                try (BufferedReader in = new BufferedReader(
                        new InputStreamReader(connection.getInputStream(), StandardCharsets.UTF_8))) {
                    String inputLine;
                    // Print the raw HTML source line by line
                    while ((inputLine = in.readLine()) != null) {
                        System.out.println(inputLine);
                    }
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }

However, this leaves me with the problem that some sites return 403s. Is there a way of getting around this?

Very simply, I was hoping to use Java to create a simple bot that would scan a forum thread and automatically respond based on user queries. Am I able to do this in Java, or do I need to look at it from the perspective of another language or data retrieval method?

Thank you for your time.

1 Answer

Editorial Team · Answer 1 · 2026-05-28T19:34:36+00:00

Yes, this can be done in Java. In theory, anything a web browser can do, Java can do – since, in the very worst case, you could write a web browser in Java.

A 403 is a “forbidden” response. You may need to set a particular User-Agent header, or the site might require HTTP basic authentication. Or perhaps it’s rate-limiting you and you need to see about obeying their robots.txt rules…
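As a rough illustration of the header suggestions above, here is a minimal sketch using only the standard library. The class name `PoliteFetcher`, the `ExampleForumBot/0.1` User-Agent string, the `https://example.com/` URL and the 10-second timeouts are all placeholders made up for this example, and whether a given site stops returning 403 once a User-Agent is set is an assumption you would have to test against that site.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class PoliteFetcher {
    // Hypothetical bot identity; replace it with something that identifies you.
    public static final String USER_AGENT = "ExampleForumBot/0.1";

    /** Builds the value of an HTTP Basic "Authorization" header. */
    public static String basicAuthHeader(String user, String password) {
        String credentials = user + ":" + password;
        return "Basic " + Base64.getEncoder()
                .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
    }

    /** Prepares a GET request with a User-Agent header; nothing is sent yet. */
    public static HttpURLConnection prepare(String address, String userAgent) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(address).toURL().openConnection();
        conn.setRequestProperty("User-Agent", userAgent);
        conn.setConnectTimeout(10_000); // milliseconds, arbitrary choice
        conn.setReadTimeout(10_000);
        return conn;
    }

    /** Sends the request and returns the body, or throws if the status is not 200. */
    public static String fetch(HttpURLConnection conn) throws IOException {
        int status = conn.getResponseCode();
        if (status != HttpURLConnection.HTTP_OK) {
            // A 403 ends up here instead of surfacing as an opaque stream error.
            throw new IOException("Unexpected HTTP status " + status);
        }
        StringBuilder body = new StringBuilder();
        // UTF-8 is assumed; a real client would read the charset from Content-Type.
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                body.append(line).append('\n');
            }
        }
        return body.toString();
    }

    public static void main(String[] args) throws IOException {
        HttpURLConnection conn = prepare("https://example.com/", USER_AGENT);
        // Only if the site uses HTTP basic authentication (placeholder credentials):
        // conn.setRequestProperty("Authorization", basicAuthHeader("user", "secret"));
        System.out.println(fetch(conn));
    }
}
```

Checking `getResponseCode()` before reading the stream makes it easier to tell a 403 apart from other failures. For rate limiting, pausing between requests (for example with `Thread.sleep`) and reading the site's robots.txt first would be the natural next steps.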

Java is certainly not (in my opinion) the easiest language in which to write this type of code, but you’re on a decent track here.

As for your “not source” in the title – the source of a web page is its text. If you download the page, you’re going to get HTML; it’s up to you to parse out what you need and discard the dross.
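To make the "parse out what you need" step concrete, here is a small sketch that pulls the text nodes out of an HTML string using the HTML parser that ships with the JDK (in the `javax.swing.text.html` packages). The class name `HtmlTextExtractor` and the sample markup are made up for illustration. This parser is old and lenient, so treat it as a starting point rather than a robust solution; a dedicated third-party library such as jsoup is a common alternative.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class HtmlTextExtractor {
    /** Returns the text content of the given HTML, with the tags discarded. */
    public static String extractText(String html) {
        StringBuilder text = new StringBuilder();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                // Called once per run of text found between tags.
                if (text.length() > 0) {
                    text.append(' ');
                }
                text.append(data);
            }
        };
        try (StringReader reader = new StringReader(html)) {
            // true = ignore any charset declared in the document itself
            new ParserDelegator().parse(reader, callback, true);
        } catch (IOException e) {
            // Not expected when reading from an in-memory string.
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        String html = "<html><body><p>Hello <b>world</b></p></body></html>";
        System.out.println(extractText(html));
    }
}
```

For a forum bot you would typically go one step further and override `handleStartTag` as well, so that you only collect text inside the elements that hold the posts.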
