I’ve been learning about iText and its beauty for the past few days.

Asked: June 18, 2026

I’ve been learning about iText and its beauty for the past few days.

I have managed to convert HTML source code to PDF successfully. However, I’ve been wondering whether it is possible to convert broken HTML (missing tags, etc.) to PDF without XMLWorker throwing an exception, the way HTMLWorker used to. I know XMLWorker is very sensitive and only works with correctly written HTML or (X)HTML, but I am getting the HTML from another party, and it will most likely be broken.

I would like to know if there is a way to just convert what’s possible and leave the errors floating around just like a browser would do.

1 Answer

Editorial Team · Answer 1 · 2026-06-18T16:05:09+00:00

Use TagSoup before passing the broken HTML to iText. It will clean up the broken HTML and return well-formed X(HT)ML.

TagSoup implements the SAX parser interface. There are some examples of how to use it, but it lacks “real” documentation.
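To make the SAX point concrete, here is a minimal sketch of a handler written against the standard `org.xml.sax.XMLReader` interface. It uses the JDK's own strict parser as a stand-in so that it runs without extra jars; the class and method names (`SaxHandlerSketch`, `countElements`, `parsesCleanly`, `strictReader`) are made up for illustration. Since TagSoup's `Parser` is also an `XMLReader`, the same `countElements` call should accept it unchanged.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class SaxHandlerSketch {

    /** Counts the start tags reported by whatever XMLReader is passed in. */
    static int countElements(XMLReader reader, String markup) throws Exception {
        final int[] count = {0};
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                count[0]++;
            }
        });
        reader.parse(new InputSource(new StringReader(markup)));
        return count[0];
    }

    /** Returns true if the reader gets through the markup without an exception. */
    static boolean parsesCleanly(XMLReader reader, String markup) {
        try {
            countElements(reader, markup);
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    /** The JDK's strict reader (assumption: stand-in for TagSoup's Parser). */
    static XMLReader strictReader() throws Exception {
        return SAXParserFactory.newInstance().newSAXParser().getXMLReader();
    }

    public static void main(String[] args) throws Exception {
        // Well-formed input: html, body and p are each reported once.
        System.out.println(countElements(strictReader(),
                "<html><body><p>Hi</p></body></html>"));
        // Missing </p> and </html>: the strict reader gives up with an exception.
        System.out.println(parsesCleanly(strictReader(),
                "<html><body><p>Hi</body>"));
    }
}
```

The second call is the failure mode from the question: a strict parser rejects the unclosed tags, which is what running the markup through TagSoup first is meant to avoid.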

You will probably have to serialize the XML again and dump it to a file to feed it to iText; I don’t know its interface.

Serializing a SAX stream is possible using XMLWriter. By chance it is already included with TagSoup, so you don’t need to add an extra dependency.

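import java.io.StringWriter;
import java.net.URL;
import org.ccil.cowan.tagsoup.Parser;
import org.ccil.cowan.tagsoup.XMLWriter;
import org.xml.sax.InputSource;

// TagSoup's Parser is a SAX XMLReader that tolerates broken HTML.
final Parser parser = new Parser();
final StringWriter writer = new StringWriter();

// XMLWriter serializes the SAX events back into well-formed markup.
parser.setContentHandler(new XMLWriter(writer));
parser.parse(new InputSource(
        new URL("http://oregonstate.edu/instruct/phl302/texts/hobbes/leviathan-c.html")
                .openConnection().getInputStream()));
System.out.println(writer.toString());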

Decide based on iText’s API whether to dump the writer’s output to a file or pass it another way.
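Both hand-off routes can be sketched with the JDK alone. The class and method names here (`HandOffSketch`, `asReader`, `asTempFile`) are hypothetical, and the hard-coded string stands in for the serialized TagSoup output. As far as I know, iText 5's `XMLWorkerHelper.parseXHtml` has overloads taking a `Reader` or an `InputStream`, so the in-memory route may be enough, but check that against the version you use.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class HandOffSketch {

    /** Option 1: keep everything in memory and expose the XHTML as a Reader. */
    static Reader asReader(String xhtml) {
        return new StringReader(xhtml);
    }

    /** Option 2: write the XHTML to a temporary UTF-8 file and return its path. */
    static Path asTempFile(String xhtml) throws IOException {
        Path file = Files.createTempFile("cleaned-", ".xhtml");
        Files.write(file, xhtml.getBytes(StandardCharsets.UTF_8));
        return file;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: stand-in for the cleaned-up markup produced by TagSoup.
        String xhtml = "<html><body><p>Hello</p></body></html>";

        try (Reader in = asReader(xhtml)) {
            // Hand `in` to the PDF converter at this point.
            System.out.println("reader ready: " + in.ready());
        }

        Path file = asTempFile(xhtml);
        System.out.println("wrote " + Files.size(file) + " bytes to " + file);
        Files.delete(file); // clean up the temporary file
    }
}
```

The in-memory route avoids disk I/O and cleanup; the file route is mainly useful if the converter only accepts a path or if you want to inspect the cleaned markup.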
