I’m new to Jsoup and I’m trying to parse an HTML file to find all the elements without an id. So far I only have this code snippet:
Document doc = Jsoup.parse(input, null);
for (Element el : doc.getAllElements()) {
    hasId = el.hasAttr("id");
    if (!hasId) {
        idList.add(el.tagName());
    } else {
        log.info("id:" + el.attr("id"));
    }
}
The elements with an id are found correctly. My problem is that I only want to scan the start elements if they have an id. Can I handle this with Jsoup?
I’m not sure if I’m understanding your question correctly, but I think you just want to select all elements that don’t have an id attribute. If so, selecting with the query "*:not([id])" should work. There’s a full list of selectors on the jsoup website.
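As a rough sketch of how a selector could replace the manual hasAttr check from the question: the class name, method names and sample HTML below are made up for illustration, and "[id]" is used here as the complementary query for elements that do have an id.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

public class NoIdSelectorSketch {

    // Tag names of every element matched by "*:not([id])", i.e. elements without an id attribute.
    public static List<String> tagsWithoutId(Document doc) {
        List<String> tags = new ArrayList<>();
        for (Element el : doc.select("*:not([id])")) {
            tags.add(el.tagName());
        }
        return tags;
    }

    // Values of every id attribute present in the document.
    public static List<String> ids(Document doc) {
        return doc.select("[id]").eachAttr("id");
    }

    public static void main(String[] args) {
        // Hypothetical sample document, not taken from the question.
        String html = "<html><head><title>t</title></head>"
                + "<body><div id=\"main\"><p>text</p></div></body></html>";
        Document doc = Jsoup.parse(html);

        System.out.println("without id: " + tagsWithoutId(doc));
        System.out.println("ids: " + ids(doc));
    }
}
```

Note that the unscoped query also matches structural elements such as html, head and body, since they have no id either.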
Update:
Here’s a full example:
Running the above on my machine gives me this output:
table tbody tr td
Notice that I changed the query slightly, to "body *:not([id])". Adding body at the front excludes the <html><title></title><body> ... </body></html> wrapper that Jsoup automatically adds when parsing the partial document in the data string.