Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • SEARCH
  • Home
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 8125637
In Process

The Archive Base Latest Questions

Editorial Team
  • 0
Editorial Team
Asked: June 6, 20262026-06-06T06:53:00+00:00 2026-06-06T06:53:00+00:00

I am trying to retrieve the html of a Google search query result using

  • 0

I am trying to retrieve the html of a Google search query result using Java. That is, if I do a search in Google.com for a particular phrase, I would like to retrieve the html of the resulting web page (the page containing the links to possible matches along with their descriptions, URLs, ect…).

I tried doing this using the following code that I found in a related post:

import java.io.*;
import java.net.*;
import java.util.*;

public class Main {

    public static void main (String args[]) {

        URL url;
        InputStream is = null;
        DataInputStream dis;
        String line;

        try {
            url = new URL("https://www.google.com/#hl=en&output=search&sclient=psy-ab&q=UCF&oq=UCF&aq=f&aqi=g4&aql=&gs_l=hp.3..0l4.1066.1471.0.1862.3.3.0.0.0.0.382.1028.2-1j2.3.0...0.0.OxbV2LOXcaY&pbx=1&bav=on.2,or.r_gc.r_pw.r_cp.r_qf.,cf.osb&fp=579625c09319dd01&biw=944&bih=951");
            is = url.openStream();  // throws an IOException
            dis = new DataInputStream(new BufferedInputStream(is));

            while ((line = dis.readLine()) != null) {
                System.out.println(line);
            }
        } catch (MalformedURLException mue) {
             mue.printStackTrace();
        } catch (IOException ioe) {
             ioe.printStackTrace();
        } finally {
            try {
                is.close();
            } catch (IOException ioe ) {
                // nothing to see here
            }
        }
    }
} 

From: How do you Programmatically Download a Webpage in Java

The URL used in this code was obtained by doing a Google search query from the Google homepage. For some reason I do not understand, if I write the phrase that I want to search for in the URL bar of my web browser and then use the URL of the resulting search result page in the code I get a 403 error.

This code, however, did not return the html of the search query result page. Instead, it returned the source code of the Google homepage.

After doing further research I noticed that if you view the source code of a Google search query result (by right clicking on the background of the search result page and selecting “View page source”) and compare it with the source code of the Google homepage, they are both identical.

If instead of viewing the source code of the search result page I save the html of the search result page (by pressing ctrl+s), I can get the html that I am looking for.

Is there a way to retrieve the html of the search result page using Java?

Thank you!

  • 1 1 Answer
  • 0 Views
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Editorial Team
    2026-06-06T06:53:01+00:00Added an answer on June 6, 2026 at 6:53 am

    Rather than parsing the resulting HTML page from a standard google search, perhaps you would be better off looking at the official Custom Search api to return results from Google in a more usable format. The API is definitely the way to go; otherwise your code could simply break if Google were to change some features of the google.com front-end’s html. The API is designed to be used by developers and your code would be far less fragile.

    To answer your question, though: We can’t really help you just from the information you’ve provided. Your code seems to retrieve the html of stackoverflow; an exact copy-and-paste of the code from the question you linked to. Did you attempt to change the code at all? What URL are you actually trying to use to retrieve google search results?

    I tried to run your code using url = new URL("http://www.google.com/search?q=test"); and I personally get an HTTP error 403 forbidden. A quick search of the problem says that this happens if I don’t provide the User-Agent header in the web request, though that doesn’t exactly help you if you’re actually getting HTML returned. You will have to provide more information if you wish to receive specific help – though switching to the Custom Search API will likely solve your problem.


    edit: new information provided in original question; can directly answer question now!

    I figured out your problem after packet-capturing the web request that java was sending and applying some basic debugging… Let’s take a look!

    Here’s the web request that Java was sending with your provided example URL:

    GET / HTTP/1.1
    User-Agent: Java/1.6.0_30
    Host: www.google.com
    Accept: text/html, image/gif, image/jpeg, *; q=.2, */*; q=.2
    Connection: keep-alive
    

    Notice that the request seemed to ignore most of the URL… leaving just the “GET /”. That is strange. I had to look this one up.

    As per the documentation of the Java URL class (and this is standard for all web pages), A URL may have appended to it a "fragment", also known as a "ref" or a "reference". The fragment is indicated by the sharp sign character "#" followed by more characters ... This fragment is not technically part of the URL.

    Let’s take a look at your example URL…

    https://www.google.com/#hl=en&output=search&sclient=psy-ab&q=UCF&oq=UCF&aq=f&aqi=g4&aql=&gs_l=hp.3..0l4.1066.1471.0.1862.3.3.0.0.0.0.382.1028.2-1j2.3.0...0.0.OxbV2LOXcaY&pbx=1&bav=on.2,or.r_gc.r_pw.r_cp.r_qf.,cf.osb&fp=579625c09319dd01&biw=944&bih=951

    notice that “#” is the first character in the file path? Java is simply ignoring everything after the “#” because sharp-signs are only used by the client / web browser – this leaves you with the url https://www.google.com/. Hey, at least it was working as intended!

    I can’t tell you exactly what Google is doing, but the sharp-symbol url definitely means that Google is returning results of the query through some client-side (ajax / javascript) scripting. I’d be willing to bet that any query you send directly to the server (i.e- no “#” symbol) without the proper headers will return a 403 forbidden error – looks like they’re encouraging you to use the API 🙂

    edit2: As per Tengji Zhang answer to the question, here is working code that returns the result of the google query for “test”

        URL url;
        InputStream is = null;
        DataInputStream dis;
        String line;
        URLConnection c;
    
        try {
            url = new URL("https://www.google.com/search?q=test");
            c = url.openConnection();
            c.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/535.19 (KHTML, like Gecko) Chrome/18.0.1025.168");
            c.connect();
            is = c.getInputStream();
            dis = new DataInputStream(new BufferedInputStream(is));
            while ((line = dis.readLine()) != null) {
                System.out.println(line);
            }
        } catch (MalformedURLException mue) {
             mue.printStackTrace();
        } catch (IOException ioe) {
             ioe.printStackTrace();
        } finally {
            try {
                is.close();
            } catch (IOException ioe ) {
                // nothing to see here
            }
        }
    
    • 0
    • Reply
    • Share
      Share
      • Share on Facebook
      • Share on Twitter
      • Share on LinkedIn
      • Share on WhatsApp
      • Report

Sidebar

Related Questions

I'm trying to retrieve place details using the Google Places Library. When you click
I've been trying to retrieve a page of HTML using pycurl, so I can
I'm trying to retrieve images from any site that has images I'm using simplehtmldom
I am trying to retrieve the text data from an ePub file using Java.
I've been trying to retrieve Google analytics reports using their provided .NET api and
I'm trying to retrieve this page using Apache HttpClient: http://quick-dish.tablespoon.com/ Unfortunately, when I try
I am trying to get my Django app (NOT using Google app engine) retrieve
I am trying to retrieve some HTML files from Amazon s3 using AWS SDK
I am trying to retrieve the full HTML of a page, so far I
I am trying to retrieve a data attribute of type Boolean from an html

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
  • SEARCH

Footer

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.