I currently have some code that converts a .doc document to html but the

Asked: June 13, 2026

I currently have some code that converts a .doc document to HTML, but the code I am using for converting a .docx file unfortunately doesn’t pick up the text and convert it. Below is my code.

private void convertWordDocXtoHTML(File file) throws ParserConfigurationException, TransformerConfigurationException, TransformerException, IOException {
    XWPFDocument wordDocument = null;
    try {
        wordDocument = new XWPFDocument(new FileInputStream(file));
    } catch (IOException ex) {
        Exceptions.printStackTrace(ex);
    }

    WordToHtmlConverter wordToHtmlConverter = new WordToHtmlConverter(DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument());
    org.w3c.dom.Document htmlDocument = wordToHtmlConverter.getDocument();
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    DOMSource domSource = new DOMSource(htmlDocument);
    StreamResult streamResult = new StreamResult(out);

    TransformerFactory tf = TransformerFactory.newInstance();
    Transformer serializer = tf.newTransformer();
    serializer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
    serializer.setOutputProperty(OutputKeys.INDENT, "yes");
    serializer.setOutputProperty(OutputKeys.METHOD, "html");
    serializer.transform(domSource, streamResult);
    out.close();

    String result = new String(out.toByteArray());
    acDocTextArea.setText(newDocText);
    String htmlText = result;

}

Any ideas as to why this isn’t working would be much appreciated. The ByteArrayOutputStream should contain the entire HTML, but it is empty and has no text.

1 Answer

Editorial Team · Answer 1 · 2026-06-13T13:22:07+00:00

Mark, you’re using WordToHtmlConverter from the HWPF package, which supports only the .doc format; see this description. Your code also never passes the XWPFDocument to the converter, so the HTML document you serialize is still empty. The same description mentions attempts to provide an interface for .docx files through the XWPF package. However, the project seems to lack human resources, and users are encouraged to submit extensions. Limited functionality should be available though, and text extraction should be part of it.

You should also see this question: How to Extract docx (word 2007 above) using apache POI.
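To illustrate why the .doc classes cannot read a .docx: a .docx file is a ZIP archive of XML parts, not the binary format that HWPF parses. Below is a minimal sketch that uses only the JDK, assuming the body text sits in the usual word/document.xml entry and uses the transitional WordprocessingML namespace. DocxTextSketch and extractText are hypothetical names for this example, not part of Apache POI, and the sketch ignores tabs, line breaks, headers, footers and footnotes.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.StringJoiner;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hypothetical helper for illustration; not part of Apache POI.
class DocxTextSketch {

    // Namespace of the transitional WordprocessingML elements such as w:p and w:t.
    static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Returns the paragraph text of a .docx stream, one paragraph per line. */
    static String extractText(InputStream docx)
            throws IOException, ParserConfigurationException, SAXException {
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                // Assumption: the main document body lives in this ZIP entry.
                if ("word/document.xml".equals(entry.getName())) {
                    // readAllBytes (Java 9+) stops at the end of the current entry.
                    return textOf(zip.readAllBytes());
                }
            }
        }
        throw new IOException("word/document.xml not found; is this a .docx file?");
    }

    private static String textOf(byte[] xml)
            throws IOException, ParserConfigurationException, SAXException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        // Refuse DOCTYPE declarations to avoid XXE on untrusted files.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        Document doc = factory.newDocumentBuilder().parse(new ByteArrayInputStream(xml));

        StringJoiner paragraphs = new StringJoiner("\n");
        NodeList ps = doc.getElementsByTagNameNS(W_NS, "p");
        for (int i = 0; i < ps.getLength(); i++) {
            // Concatenate every w:t text run inside this w:p paragraph.
            NodeList runs = ((Element) ps.item(i)).getElementsByTagNameNS(W_NS, "t");
            StringBuilder line = new StringBuilder();
            for (int j = 0; j < runs.getLength(); j++) {
                line.append(runs.item(j).getTextContent());
            }
            paragraphs.add(line);
        }
        return paragraphs.toString();
    }
}
```

You could call it as DocxTextSketch.extractText(new FileInputStream(file)) and put the result in your text area. If you prefer to stay within POI, the XWPF package has an XWPFWordExtractor class that takes an XWPFDocument and returns its text via getText(), which is probably the more robust route.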
