I am using a query like this in jSoup: Document doc = Jsoup.connect(urlString).timeout(1000).post(); It

Question

Asked: May 26, 2026 (2026-05-26T08:17:48+00:00)

I am using a query like this in jSoup:

Document doc = Jsoup.connect(urlString).timeout(1000).post();

It works for some sites, however:

It doesn’t work for Google search queries (e.g. urlString = "http://www.google.com/search?q=text"). I don’t know why, or what makes Google special.
The resulting documents contain messages like “JavaScript should be turned on in your browser”, which I would rather avoid.
There are probably more quirks, but I haven’t tested it fully yet.

My question: could these problems be avoided if we could mimic a web browser more closely? What is the best way to do it?

What are the other differences that can be encountered between getting pages via web browser and via Java (URLConnection or jSoup)?

1 Answer

Editorial Team · Answer 1 · 2026-05-26T08:17:49+00:00

I realized that the problem with some sites not responding was actually that I was using post() instead of get(). With get() it works fine now!

It also probably helps to set a browser-like user agent on the connection, for example:

.userAgent("Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6")
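For the plain URLConnection route mentioned in the question, the same two changes (GET instead of POST, plus a browser-like User-Agent header) might look roughly like the sketch below. This is only an illustration using the standard java.net classes: the class name BrowserLikeRequest, the method open, and the choice of this particular User-Agent value are assumptions made for the example, not something jsoup or the original post prescribes.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical helper class, named for this example only.
final class BrowserLikeRequest {

    // Assumed example value; any real browser's User-Agent string could be used instead.
    static final String USER_AGENT =
        "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6";

    // Prepares (but does not yet send) a GET request that identifies itself as a browser.
    static HttpURLConnection open(String urlString, int timeoutMillis) {
        try {
            HttpURLConnection conn =
                (HttpURLConnection) URI.create(urlString).toURL().openConnection();
            conn.setRequestMethod("GET");                      // GET rather than POST
            conn.setRequestProperty("User-Agent", USER_AGENT); // mimic a browser
            conn.setConnectTimeout(timeoutMillis);
            conn.setReadTimeout(timeoutMillis);
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Nothing is sent until the response is read, for example via conn.getInputStream(). Note that this only changes how the request looks to the server; it does not execute JavaScript, so pages that build their content in the browser will still come back incomplete.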

In the meantime, I’ve also tested HtmlUnit for the same task, and it worked, but it seems like overkill if the goal is simply to fetch an HTML file for some kind of processing. It basically runs a whole invisible web browser in the background to do this.
