I’m using Jsoup to parse a short HTML document that contains some custom tags, which I need for some logic operations on the result.
Like this:
<table><showif field="xxx"><tr><td>test</td></tr></showif><tr><td>xyz</td></tr></table>
Document doc = Jsoup.parse(html);
Elements showif_fields = doc.select("SHOWIF[field]");
In this case the inner content seems to be lost; the outerHtml() method shows just this:
<showif field="xxx"></showif>
But if the “showif” tag contains simple text like hello, it works as expected.
Any ideas?
Thank you.
The issue you are bumping into is that the HTML spec for table content is pretty strict, so your unknown tags are being foster-parented, that is, moved outside of the table. (Jsoup does this to follow the HTML spec, so that its output matches browser behaviour as closely as possible.)
In this case, you know what you’re doing and you’re creating the HTML, so you can set jsoup to ignore the HTML spec and just process the tags as it sees them. Do this with the XML parser:
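A minimal sketch of that approach, assuming jsoup is on the classpath. The class name `ShowIfExample` and the helper `showIfOuterHtml` are made up for illustration; `Jsoup.parse(html, baseUri, parser)` and `Parser.xmlParser()` are the jsoup calls doing the work.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.parser.Parser;
import org.jsoup.select.Elements;

public class ShowIfExample {

    // Hypothetical helper: parses the markup with the XML parser and
    // returns the outer HTML of every <showif> element that has a "field" attribute.
    static String showIfOuterHtml(String html) {
        // The XML parser keeps tags where they appear in the input instead of
        // applying the HTML table rules, so <showif> stays inside <table>.
        Document doc = Jsoup.parse(html, "", Parser.xmlParser());
        Elements showIfFields = doc.select("showif[field]");
        return showIfFields.outerHtml();
    }

    public static void main(String[] args) {
        String html = "<table><showif field=\"xxx\"><tr><td>test</td></tr></showif>"
                + "<tr><td>xyz</td></tr></table>";
        // The <tr> and <td> children should now be kept inside <showif>.
        System.out.println(showIfOuterHtml(html));
    }
}
```

Keep in mind that with the XML parser jsoup no longer normalises the document the way a browser would (for example, it will not add html, head, or body elements), so this only makes sense when you control the markup.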