I am trying to retrieve the html of a Google search query result using

Asked: June 6, 2026 (2026-06-06T06:53:00+00:00)

I am trying to retrieve the HTML of a Google search result page using Java. That is, if I search Google.com for a particular phrase, I would like to retrieve the HTML of the resulting page (the page containing the links to possible matches along with their descriptions, URLs, etc.).

I tried doing this using the following code that I found in a related post:

import java.io.*;
import java.net.*;
import java.util.*;

public class Main {

    public static void main (String args[]) {

        URL url;
        InputStream is = null;
        DataInputStream dis;
        String line;

        try {
            url = new URL("https://www.google.com/#hl=en&output=search&sclient=psy-ab&q=UCF&oq=UCF&aq=f&aqi=g4&aql=&gs_l=hp.3..0l4.1066.1471.0.1862.3.3.0.0.0.0.382.1028.2-1j2.3.0...0.0.OxbV2LOXcaY&pbx=1&bav=on.2,or.r_gc.r_pw.r_cp.r_qf.,cf.osb&fp=579625c09319dd01&biw=944&bih=951");
            is = url.openStream();  // throws an IOException
            dis = new DataInputStream(new BufferedInputStream(is));

            while ((line = dis.readLine()) != null) {
                System.out.println(line);
            }
        } catch (MalformedURLException mue) {
             mue.printStackTrace();
        } catch (IOException ioe) {
             ioe.printStackTrace();
        } finally {
            try {
                if (is != null) {  // openStream() may have failed, leaving is null
                    is.close();
                }
            } catch (IOException ioe ) {
                // nothing to see here
            }
        }
    }
}

From: How do you Programmatically Download a Webpage in Java

The URL used in this code was obtained by running a search from the Google homepage. For some reason I do not understand, if I instead type the search phrase into the URL bar of my web browser and use the URL of the resulting page in the code, I get a 403 error.

This code, however, did not return the HTML of the search result page. Instead, it returned the source code of the Google homepage.

After doing further research I noticed that if you view the source code of a Google search result page (by right-clicking on the background of the page and selecting “View page source”) and compare it with the source code of the Google homepage, the two are identical.

If, instead of viewing the source code of the search result page, I save the page (by pressing Ctrl+S), I get the HTML that I am looking for.

Is there a way to retrieve the html of the search result page using Java?

Thank you!

1 Answer

Editorial Team · Answer 1 · 2026-06-06T06:53:01+00:00

Rather than parsing the HTML of a standard Google search page, you would probably be better off using the official Custom Search API, which returns results from Google in a more usable format. The API is definitely the way to go; otherwise your code could break whenever Google changes the HTML of the google.com front end. The API is designed for developers, so code built on it should be far less fragile.

To answer your question, though: we can’t really help you from the information you’ve provided. Your code appears to be an exact copy of the code from the question you linked to, and it retrieves the HTML of Stack Overflow. Did you change the code at all? What URL are you actually using to retrieve Google search results?

I ran your code using url = new URL("http://www.google.com/search?q=test"); and got an HTTP 403 (Forbidden) error. A quick search suggests that this happens when the request does not carry a browser-style User-Agent header, though that doesn’t explain your case if you’re actually getting HTML back. You will have to provide more information if you wish to receive specific help, though switching to the Custom Search API will likely solve your problem.

Edit: the original question now includes new information, so I can answer it directly.

I figured out your problem after packet-capturing the web request that java was sending and applying some basic debugging… Let’s take a look!

Here’s the web request that Java was sending with your provided example URL:

GET / HTTP/1.1
User-Agent: Java/1.6.0_30
Host: www.google.com
Accept: text/html, image/gif, image/jpeg, *; q=.2, */*; q=.2
Connection: keep-alive

Notice that the request seemed to ignore most of the URL… leaving just the “GET /”. That is strange. I had to look this one up.

The documentation of the Java URL class explains it: “A URL may have appended to it a "fragment", also known as a "ref" or a "reference". The fragment is indicated by the sharp sign character "#" followed by more characters ... This fragment is not technically part of the URL.” This is standard behavior on the web in general: clients do not send the fragment to the server.

Let’s take a look at your example URL…

https://www.google.com/#hl=en&output=search&sclient=psy-ab&q=UCF&oq=UCF&aq=f&aqi=g4&aql=&gs_l=hp.3..0l4.1066.1471.0.1862.3.3.0.0.0.0.382.1028.2-1j2.3.0...0.0.OxbV2LOXcaY&pbx=1&bav=on.2,or.r_gc.r_pw.r_cp.r_qf.,cf.osb&fp=579625c09319dd01&biw=944&bih=951

Notice that “#” is the first character after the leading “/” of the path? Java drops the “#” and everything after it, because fragments are used only by the client (the web browser). This leaves you with the URL https://www.google.com/. Hey, at least it was working as intended!
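To see the fragment handling without capturing packets, here is a minimal sketch that asks java.net.URL which part of an address would go on the request line and which part is the fragment. The class name FragmentDemo, its helper methods, and the shortened example URLs are made up for illustration; getFile() and getRef() are the standard URL accessors.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

final class FragmentDemo {
    // Parses an absolute address into a java.net.URL (hypothetical helper).
    static URL parse(String address) {
        try {
            return URI.create(address).toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Path plus query: the part an HTTP client sends after "GET ".
    static String requestTarget(String address) {
        return parse(address).getFile();
    }

    // The part after "#", or null when there is none; it stays on the client.
    static String fragment(String address) {
        return parse(address).getRef();
    }

    public static void main(String[] args) {
        String withFragment = "https://www.google.com/#hl=en&q=UCF";
        String withQuery = "https://www.google.com/search?q=UCF";

        System.out.println(requestTarget(withFragment)); // prints "/"
        System.out.println(fragment(withFragment));      // prints "hl=en&q=UCF"
        System.out.println(requestTarget(withQuery));    // prints "/search?q=UCF"
        System.out.println(fragment(withQuery));         // prints "null"
    }
}
```

The first address reduces to a request for “/”, which matches the captured “GET /” above; only the second form carries the search term to the server.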

I can’t tell you exactly what Google is doing, but the “#” in the URL strongly suggests that Google is returning the results of the query through client-side scripting (AJAX / JavaScript). I’d be willing to bet that any query you send directly to the server (i.e., with no “#” symbol) without the proper headers will return a 403 Forbidden error. It looks like they’re encouraging you to use the API 🙂

Edit 2: Following Tengji Zhang’s answer to the question, here is working code that returns the result of the Google query for “test”:

    // Needs: import java.io.*; import java.net.*; import java.nio.charset.StandardCharsets;
    try {
        URL url = new URL("https://www.google.com/search?q=test");
        URLConnection c = url.openConnection();
        // A browser-style User-Agent; with Java's default one the request got a 403 in my test.
        c.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/535.19 (KHTML, like Gecko) Chrome/18.0.1025.168");
        // getInputStream() connects implicitly. try-with-resources closes the reader
        // and the underlying stream, and avoids the deprecated DataInputStream.readLine().
        // The response is assumed to be UTF-8 here.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(c.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
        }
    } catch (MalformedURLException mue) {
        mue.printStackTrace();
    } catch (IOException ioe) {
        ioe.printStackTrace();
    }
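The query above is the single word “test”. For an arbitrary phrase the text has to be URL-encoded before it goes into the q parameter. The following is only a sketch of that step: the class SearchUrl and the method forPhrase are hypothetical names, the base address is the one used above, and URLEncoder.encode with a Charset argument assumes Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class SearchUrl {
    // Base address taken from the working example; only q is filled in.
    private static final String BASE = "https://www.google.com/search?q=";

    // Encodes the phrase as a form value: spaces become "+",
    // reserved characters such as "&" and "+" become %XX escapes.
    static String forPhrase(String phrase) {
        return BASE + URLEncoder.encode(phrase, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(forPhrase("UCF football schedule"));
        // prints "https://www.google.com/search?q=UCF+football+schedule"
        System.out.println(forPhrase("C++ & Java"));
        // prints "https://www.google.com/search?q=C%2B%2B+%26+Java"
    }
}
```

The resulting string can be passed to new URL(...) in the code above in place of the hard-coded address.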
