I’m wondering if anyone has experience retrieving elements by id from a given string of HTML in a performant way. I’m writing a method that takes two arguments:
public String getFilteredHtml(String html, Set<String> ids)
The method returns the concatenated HTML of the elements matching the ids passed in. Currently I am using JSoup to accomplish this by parsing the HTML into a document, and then either looping through the ids and appending the result of document.getElementById, or using a selector that looks like [id=id1],[id=id2] etc. Both work fine, with comparable performance between the two.
However, I couldn’t help but notice that if a map of String id > Element were built while the HTML document was being parsed, the lookup would be much faster. Does anyone know of a library that has this functionality, or a way to go about implementing it myself? Or any other approach that might accomplish this faster?
You could use a SAX-based HTML parser, such as NekoHTML or TagSoup, and build your map while the document is being parsed. I’m not sure how much faster it would be, though; you’d have to benchmark.
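To make the SAX idea concrete, here is a rough sketch. It uses the SAX parser bundled with the JDK, so it only copes with well-formed, XHTML-style markup; for real-world HTML you would presumably plug in a tolerant parser such as TagSoup or NekoHTML in its place. The SaxIdFilter and IdCapturingHandler names are made up for illustration, and nothing here has been benchmarked against JSoup.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper for illustration; not part of any library.
final class SaxIdFilter {

    /** Records the re-serialized markup of each element whose id is wanted. */
    private static final class IdCapturingHandler extends DefaultHandler {
        private final Set<String> wanted;
        // Buffers for wanted elements that are currently open.
        private final Map<String, StringBuilder> active = new HashMap<>();
        // Finished captures: id -> markup.
        private final Map<String, String> result = new HashMap<>();
        // One entry per open element: its captured id, or "" if not captured.
        private final Deque<String> open = new ArrayDeque<>();

        IdCapturingHandler(Set<String> wanted) {
            this.wanted = wanted;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attrs) {
            String id = attrs.getValue("id");
            // Only the first element carrying a given wanted id is captured.
            boolean capture = id != null && !id.isEmpty() && wanted.contains(id)
                    && !active.containsKey(id) && !result.containsKey(id);
            if (capture) active.put(id, new StringBuilder());
            open.push(capture ? id : "");
            if (active.isEmpty()) return;
            StringBuilder tag = new StringBuilder("<").append(qName);
            for (int i = 0; i < attrs.getLength(); i++) {
                tag.append(' ').append(attrs.getQName(i)).append("=\"")
                   .append(escape(attrs.getValue(i))).append('"');
            }
            appendToActive(tag.append('>').toString());
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (!active.isEmpty()) appendToActive("</" + qName + ">");
            String id = open.pop();
            // A non-empty id means this element was being captured; it is now complete.
            if (!id.isEmpty()) result.put(id, active.remove(id).toString());
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (!active.isEmpty()) appendToActive(escape(new String(ch, start, length)));
        }

        private void appendToActive(String s) {
            for (StringBuilder sb : active.values()) sb.append(s);
        }
    }

    // The parser hands us decoded text, so re-escape it before writing markup.
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    /** Parses once and concatenates the captured markup in the iteration order of ids. */
    static String getFilteredHtml(String html, Set<String> ids) {
        IdCapturingHandler handler = new IdCapturingHandler(ids);
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(html)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse markup", e);
        }
        StringBuilder out = new StringBuilder();
        for (String id : ids) {
            String markup = handler.result.get(id);
            if (markup != null) out.append(markup);
        }
        return out.toString();
    }
}
```

The document is read in a single pass and only the wanted elements are ever buffered, so no full DOM is built. Note that the markup is re-serialized rather than copied from the input, so an empty element such as <br/> would come back as <br></br>, and attribute quoting is normalized to double quotes.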