Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • SEARCH
  • Home
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 8764319
In Process

The Archive Base Latest Questions

Editorial Team
  • 0
Editorial Team
Asked: June 13, 20262026-06-13T16:00:24+00:00 2026-06-13T16:00:24+00:00

I have some code that uses the Java Apache POI library to open a

  • 0

I have some code that uses the Java Apache POI library to open a Microsoft word document and convert it to html, using the the Apache POI and it also gets the byte array data of images on the document. But I need to convert this information to html to write out to an html file. Any hints or suggestions would be appreciated. Keep in mind that I am a desktop dev developer and not a web programmer, so when you make suggestions, please remember that. The code below gets the image.

 private void parseWordText(File file) throws IOException {
      FileInputStream fs = new FileInputStream(file);
      doc = new HWPFDocument(fs);
      PicturesTable picTable = doc.getPicturesTable();
      if (picTable != null){
           picList = new ArrayList<Picture>(picTable.getAllPictures());
           if (!picList.isEmpty()) {
           for (Picture pic : picList) {
                byte[] byteArray = pic.getContent();
                pic.suggestFileExtension();
                pic.suggestFullFileName();
                pic.suggestPictureType();
                pic.getStartOffset();
           }
        }
     }

Then the code below this converts the document to html. Is there a way to add the byteArray to the ByteArrayOutputStream in the code below?

private void convertWordDoctoHTML(File file) throws ParserConfigurationException, TransformerConfigurationException, TransformerException, IOException {
    HWPFDocumentCore wordDocument = null;
    try {
        wordDocument = WordToHtmlUtils.loadDoc(new FileInputStream(file));
    } catch (IOException ex) {
        Exceptions.printStackTrace(ex);
    }

    WordToHtmlConverter wordToHtmlConverter = new WordToHtmlConverter(DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument());
    wordToHtmlConverter.processDocument(wordDocument);
    org.w3c.dom.Document htmlDocument = wordToHtmlConverter.getDocument();
    NamedNodeMap node = htmlDocument.getAttributes();


    ByteArrayOutputStream out = new ByteArrayOutputStream();
    DOMSource domSource = new DOMSource(htmlDocument);
    StreamResult streamResult = new StreamResult(out);

    TransformerFactory tf = TransformerFactory.newInstance();
    Transformer serializer = tf.newTransformer();
    serializer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
    serializer.setOutputProperty(OutputKeys.INDENT, "yes");
    serializer.setOutputProperty(OutputKeys.METHOD, "html");
    serializer.transform(domSource, streamResult);
    out.close();

    String result = new String(out.toByteArray());
    acDocTextArea.setText(newDocText);

    htmlText = result;

}
  • 1 1 Answer
  • 0 Views
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Editorial Team
    2026-06-13T16:00:25+00:00Added an answer on June 13, 2026 at 4:00 pm

    Looking at the source code for the org.apache.poi.hwpf.converter.WordToHtmlConverter at

    http://svn.apache.org/viewvc/poi/trunk/src/scratchpad/src/org/apache/poi/hwpf/converter/WordToHtmlConverter.java?view=markup&pathrev=1180740

    It states in the JavaDoc:

    This implementation doesn’t create images or links to them. This can be
    changed by overriding {@link #processImage(Element, boolean, Picture)} method

    If you take a look at that processImage(...) method in AbstractWordConverter.java at line 790, it looks like the method is calling then another method named processImageWithoutPicturesManager(...).

    http://svn.apache.org/viewvc/poi/trunk/src/scratchpad/src/org/apache/poi/hwpf/converter/AbstractWordConverter.java?view=markup&pathrev=1180740

    This method is defined in WordToHtmlConverter again and looks suspiciously exact like the place you want to grow your code (line 317):

    @Override
    protected void processImageWithoutPicturesManager(Element currentBlock,
        boolean inlined, Picture picture)
    {
        // no default implementation -- skip
        currentBlock.appendChild(htmlDocumentFacade.document
        .createComment("Image link to '"
        + picture.suggestFullFileName() + "' can be here"));
    }
    

    I think you have the point where to start inserting the images into the flow.

    Create a subclass of the converter, e.g.

        public class InlineImageWordToHtmlConverter extends WordToHtmlConverter
    

    and then override the method and place whatever code into it.

    I haven’t tested it, but it should be the right way from what I see theoretically.

    • 0
    • Reply
    • Share
      Share
      • Share on Facebook
      • Share on Twitter
      • Share on LinkedIn
      • Share on WhatsApp
      • Report

Sidebar

Related Questions

I have an Android java library that uses some native code. I have added
I have some code that uses Open XML to open up a .docx file,
I have inherited a bit of Java code that uses Hibernate. Some of the
I have some old code (pre Java 1.5) that uses classes to implement type
I have some Java code that uses curly braces in two ways // Curly
I have some code that uses the SQL Server 2005 SMO objects to backup
I have some code that uses ODP.Net using (OracleConnection connection = new OracleConnection(connectionString)) {
I have some code here that uses bitsets to store many 1 bit values
I have some old code that uses qsort to sort an MFC CArray of
We have some code kicking around that uses this old internal Sun package for

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
  • SEARCH

Footer

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.