Jsoup has 2 html parse() methods: parse(String html) – As no base URI is

Question

Editorial Team

Asked: May 25, 2026

Jsoup has two HTML parse() methods:

parse(String html) – “As no base URI is specified, absolute URL
detection relies on the HTML including a <base href> tag.”
parse(String html, String baseUri) – “The URL where the HTML
was retrieved from. Used to resolve relative URLs to absolute URLs,
that occur before the HTML declares a <base href> tag.”

I am having difficulty understanding the difference between the two:

In the 2nd parse() version, what does “resolve relative URLs to absolute URLs, that occur
before the HTML declares a <base href> tag” mean? What if a
<base href> tag never occurs in the page?
What is the purpose of absolute URL detection? Why does Jsoup need
to find the absolute URL?
Lastly, but most importantly: Is baseUri the full URL of the HTML page
(as phrased in the original documentation) or is it the base URL of
the HTML page?

1 Answer

Editorial Team · Answer 1 · 2026-05-25T01:18:46+00:00

It is used by, among others, Element#absUrl(), so that you can retrieve the (intended) absolute URL of an <a href>, <img src>, <link href>, <script src>, etc. For example:

for (Element link : document.select("a")) {
    System.out.println(link.absUrl("href"));
}

This is very useful if you want to download and/or parse the linked resources as well.
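A minimal, self-contained version of that loop might look like the sketch below. It assumes jsoup is on the classpath; the HTML string, the example.com URLs, the class name AbsUrlDemo and the helper absoluteLinks are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class AbsUrlDemo {

    // Hypothetical helper: parse the HTML against baseUri and
    // collect the absolute target of every <a> element.
    static List<String> absoluteLinks(String html, String baseUri) {
        Document document = Jsoup.parse(html, baseUri);
        List<String> urls = new ArrayList<>();
        for (Element link : document.select("a")) {
            urls.add(link.absUrl("href"));
        }
        return urls;
    }

    public static void main(String[] args) {
        // Assumed sample markup with one root-relative and one path-relative link.
        String html = "<p><a href=\"/about\">About</a> "
                + "<a href=\"news/today.html\">News</a></p>";
        for (String url : absoluteLinks(html, "http://example.com/docs/index.html")) {
            System.out.println(url);
        }
    }
}
```

With the page URL above, the two links come back as http://example.com/about and http://example.com/docs/news/today.html, which are the addresses you would actually fetch.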

In the 2nd parse() version, what does “resolve relative URLs to absolute URLs, that occur before the HTML declares a <base href> tag” mean? What if a <base href> tag never occurs in the page?

Some (poor) websites may declare a <link> or <script> with a relative URL before the <base> tag. And if there is no <base> tag at all, the given baseUri alone is used to resolve relative URLs throughout the document.
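The three situations can be compared side by side. This is only a sketch, assuming jsoup is on the classpath; the markup, the example.com and cdn.example.org URLs, and the names BaseHrefDemo and firstLink are invented for the example. It deliberately covers only links that come after the <base> tag.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class BaseHrefDemo {

    // Hypothetical helper: absolute URL of the first <a> in the parsed HTML.
    static String firstLink(String html, String baseUri) {
        Document document = Jsoup.parse(html, baseUri);
        return document.select("a").first().absUrl("href");
    }

    public static void main(String[] args) {
        String pageUrl = "http://example.com/docs/index.html";

        // No <base> tag: the baseUri argument is the only reference point.
        String plain = "<html><head></head><body><a href=\"a.html\">A</a></body></html>";
        System.out.println(firstLink(plain, pageUrl));

        // A <base href> in the head takes over for the links that follow it.
        String withBase = "<html><head><base href=\"http://cdn.example.org/static/\"></head>"
                + "<body><a href=\"a.html\">A</a></body></html>";
        System.out.println(firstLink(withBase, pageUrl));

        // Neither a baseUri nor a <base> tag: absUrl() has nothing to resolve
        // against and returns an empty string.
        Document noBase = Jsoup.parse(plain);
        System.out.println("[" + noBase.select("a").first().absUrl("href") + "]");
    }
}
```

The first call resolves against the page URL (http://example.com/docs/a.html), the second against the declared base (http://cdn.example.org/static/a.html), and the third prints empty brackets.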

What is the purpose of absolute URL detection? Why does Jsoup need to find the absolute URL?

So that Element#absUrl() returns the right URL. This is purely for the end user’s convenience; Jsoup does not need it to parse the HTML successfully.

Lastly, but most importantly: Is baseUri the full URL of HTML page (as phrased in original documentation) or is it the base URL of the HTML page?

The former. If it were the latter, the documentation would be lying. The baseUri must not be confused with <base href>.
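Passing the full page URL works because standard relative-reference resolution drops the last path segment of the base before appending the relative part. The sketch below illustrates that general rule with java.net.URI from the JDK rather than with jsoup's own internals, so treat it as an illustration of the principle; the URLs and the names ResolveDemo and resolve are assumptions.

```java
import java.net.URI;

public class ResolveDemo {

    // Resolve a relative reference against a base using the standard URI rules.
    static String resolve(String base, String relative) {
        return URI.create(base).resolve(relative).toString();
    }

    public static void main(String[] args) {
        // Full page URL: the last segment (index.html) is dropped before resolving.
        System.out.println(resolve("http://example.com/docs/index.html", "guide.html"));

        // Directory-style URL with a trailing slash gives the same result.
        System.out.println(resolve("http://example.com/docs/", "guide.html"));

        // Without the trailing slash, "docs" itself counts as the last segment.
        System.out.println(resolve("http://example.com/docs", "guide.html"));
    }
}
```

The first two calls both yield http://example.com/docs/guide.html, while the third yields http://example.com/guide.html. Handing over the URL the page was actually retrieved from is therefore the safest choice.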
