I was under the impression that the most costly method in Jsoup’s API is

Editorial Team

Asked: June 7, 2026

I was under the impression that the most costly method in Jsoup’s API is parse().

But I just discovered that Document.html() could be even slower.

Given that the Document is the output of parse() (i.e. this is after parsing), I find this surprising.

Why is Document.html() so slow?

1 Answer

Editorial Team · Answer 1 · 2026-06-07T01:25:41+00:00

Answering myself. The Element.html() method is implemented as:

public String html() {
  StringBuilder accum = new StringBuilder();
  html(accum); 
  return accum.toString().trim();
}

Accumulating into a StringBuilder instead of concatenating Strings is already the efficient choice, and the final StringBuilder.toString() and String.trim() calls each make a single pass over the output, so they are unlikely to explain the slowness of Document.html(), even for a relatively large document.

But in between, html() calls an overloaded version, Element.html(StringBuilder), which loops over the element's child nodes and has each one serialize itself:

private void html(StringBuilder accum) {
  for (Node node : childNodes)
    node.outerHtml(accum);
}

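Each outerHtml() call in turn serializes that node's whole subtree, so a single html() call walks every node in the document and rebuilds the full string from scratch. Thus if the document contains lots of nodes, it will be slow.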
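To make the cost concrete, here is a minimal sketch of the same serialization pattern on a toy tree. The El class, its visits counter and the build() helper are hypothetical stand-ins written for this illustration; they are not jsoup classes, and only the shape of html() and outerHtml() follows the code quoted above.

```java
import java.util.ArrayList;
import java.util.List;

class SerializeSketch {

    /** Hypothetical stand-in for a DOM element; NOT jsoup's Element. */
    static final class El {
        static long visits = 0; // counts how many nodes serialization touches

        final String tag;
        final List<El> children = new ArrayList<>();

        El(String tag) { this.tag = tag; }

        El add(El child) { children.add(child); return this; }

        // Same shape as the quoted html(): new builder, serialize children, trim.
        String html() {
            StringBuilder accum = new StringBuilder();
            for (El child : children) child.outerHtml(accum);
            return accum.toString().trim();
        }

        // Each node writes its own tags and recurses into its children.
        void outerHtml(StringBuilder accum) {
            visits++;
            accum.append('<').append(tag).append('>');
            for (El child : children) child.outerHtml(accum);
            accum.append("</").append(tag).append('>');
        }
    }

    // Builds an element with `width` children per node, `depth` levels deep.
    static El build(String tag, int width, int depth) {
        El el = new El(tag);
        if (depth > 0) {
            for (int i = 0; i < width; i++) el.add(build("div", width, depth - 1));
        }
        return el;
    }

    public static void main(String[] args) {
        El root = build("body", 10, 4); // 10 + 100 + 1000 + 10000 descendants
        El.visits = 0;
        root.html();
        System.out.println("nodes visited by one html() call: " + El.visits);
        root.html(); // nothing is cached, so the whole tree is walked again
        System.out.println("nodes visited after a second call: " + El.visits);
    }
}
```

In this sketch the visit count grows by the full number of descendants on every call, which is the behaviour to expect from any implementation that rebuilds the string each time instead of caching it.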

It would be interesting to see whether there could be a faster implementation of this.

For example, Jsoup could store a cached copy of the raw HTML that was passed to Jsoup.parse(). This would have to be optional, of course, to maintain backward compatibility and keep the memory footprint small.
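Until something like that exists in the library, the caching idea can be approximated in calling code. The sketch below is an assumption-laden illustration: RawCachingDocument, markModified() and the parser/serializer parameters are hypothetical names invented here, not part of jsoup. It keeps the raw input next to the parsed result and only falls back to full serialization after the caller reports a change.

```java
import java.util.function.Function;

/** Hypothetical wrapper (not a jsoup class): raw input kept beside the parsed result. */
final class RawCachingDocument<D> {
    private final String rawHtml;   // the text originally handed to the parser
    private final D document;       // whatever the parser produced
    private boolean modified = false;

    private RawCachingDocument(String rawHtml, D document) {
        this.rawHtml = rawHtml;
        this.document = document;
    }

    // Parses once and remembers the input string.
    static <D> RawCachingDocument<D> parse(String rawHtml, Function<String, D> parser) {
        return new RawCachingDocument<>(rawHtml, parser.apply(rawHtml));
    }

    D document() { return document; }

    // The caller must invoke this after changing the document; the wrapper cannot detect edits.
    void markModified() { modified = true; }

    // Returns the cached raw text while unmodified, otherwise serializes the tree.
    String html(Function<D, String> serializer) {
        return modified ? serializer.apply(document) : rawHtml;
    }

    public static void main(String[] args) {
        // Toy parser and serializer so the sketch runs without any library.
        RawCachingDocument<String> doc = RawCachingDocument.parse("<p>hi</p>", s -> s);
        System.out.println(doc.html(d -> "serialized:" + d));
        doc.markModified();
        System.out.println(doc.html(d -> "serialized:" + d));
    }
}
```

With jsoup on the classpath, the parser argument could be s -> Jsoup.parse(s) and the serializer d -> d.html(). Two caveats apply: the raw input is generally not identical to what Document.html() returns, because jsoup normalizes and pretty-prints the markup, and the cached string doubles the memory held per document.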
