I posted this on the Jsoup group on Google Groups, but there doesn’t seem to be much activity there lately, so I’ll try here as well…
The following code
final String html = "<html><head></head><body><div></div></body></html>";
Document doc = Jsoup.parse(html);
Element body = doc.body();
Element div = body.select("div").first();
body.empty(); // <--- gives exception at line 56 below
// body.children().remove(); // does not give exception
body.appendChild(div); // line 56, IndexOutOfBoundsException here
with Jsoup 1.6.1, gives me an IndexOutOfBoundsException with the following (partial, top 7 lines) stack trace
at java.util.ArrayList.RangeCheck(ArrayList.java:547)
at java.util.ArrayList.remove(ArrayList.java:387)
at org.jsoup.nodes.Node.removeChild(Node.java:394)
at org.jsoup.nodes.Node.reparentChild(Node.java:420)
at org.jsoup.nodes.Node.addChildren(Node.java:402)
at org.jsoup.nodes.Element.appendChild(Element.java:225)
at webfilter.FilterY.<init>(FilterY.java:56)
Here, FilterY is my class containing the code above. If I use body.children().remove() instead of body.empty(), it works fine.
Question is… am I abusing Jsoup here, or is this really a bug?
Yes, you are using the Jsoup library incorrectly. Let's go line by line:
You are saving a reference to the first child:
Element div = body.select("div").first();
You are removing all of the element’s child nodes:
body.empty();
And then you are trying to append the deleted child:
body.appendChild(div);
The problem is in the last step. body.empty() just calls the clear() method of the underlying java.util.List; it does not break the parent-child linkage, so div is left with a dangling reference to body as its parent. When body.appendChild(div) then tries to detach div from that old parent (the reparentChild and removeChild frames in your stack trace), the child is no longer in the list and the removal throws the IndexOutOfBoundsException. That is not the case with body.children().remove(), which detaches each child properly, so appending div afterwards works.
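To make the mechanism concrete, here is a minimal sketch in plain Java that does not use Jsoup at all. SimpleNode, emptyWithoutDetaching, removeAllChildren and DanglingParentDemo are hypothetical names invented for this illustration; the class only models the behaviour described above and is not Jsoup's actual source.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for a DOM node. Hypothetical model, NOT Jsoup's real implementation.
class SimpleNode {
    final String name;
    SimpleNode parent;                                   // back-reference to the owning node
    final List<SimpleNode> children = new ArrayList<>();

    SimpleNode(String name) {
        this.name = name;
    }

    // Appends a child, first detaching it from whatever parent it still points to.
    void appendChild(SimpleNode child) {
        if (child.parent != null) {
            child.parent.removeChild(child);
        }
        child.parent = this;
        children.add(child);
    }

    // Removes a child by index and clears its parent reference.
    void removeChild(SimpleNode child) {
        int index = children.indexOf(child);             // -1 if the list was cleared earlier
        children.remove(index);                          // throws IndexOutOfBoundsException for -1
        child.parent = null;
    }

    // Models the problematic empty(): clears the list but leaves each child's parent set.
    void emptyWithoutDetaching() {
        children.clear();
    }

    // Models body.children().remove(): every child is detached properly.
    void removeAllChildren() {
        for (SimpleNode child : new ArrayList<>(children)) {
            removeChild(child);
        }
    }
}

class DanglingParentDemo {
    // Returns true if re-appending after emptyWithoutDetaching() throws.
    static boolean appendAfterEmptyThrows() {
        SimpleNode body = new SimpleNode("body");
        SimpleNode div = new SimpleNode("div");
        body.appendChild(div);
        body.emptyWithoutDetaching();                    // div.parent still points at body
        try {
            body.appendChild(div);
            return false;
        } catch (IndexOutOfBoundsException e) {
            return true;
        }
    }

    // Returns true if re-appending after removeAllChildren() succeeds.
    static boolean appendAfterRemoveWorks() {
        SimpleNode body = new SimpleNode("body");
        SimpleNode div = new SimpleNode("div");
        body.appendChild(div);
        body.removeAllChildren();                        // div.parent is reset to null
        body.appendChild(div);
        return body.children.size() == 1 && div.parent == body;
    }

    public static void main(String[] args) {
        System.out.println("append after empty throws: " + appendAfterEmptyThrows());
        System.out.println("append after remove works: " + appendAfterRemoveWorks());
    }
}
```

The only difference between the two paths is whether the child's parent reference is reset before the re-append, which is the same distinction as between body.empty() and body.children().remove() in your code.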