In OOXML, formatting such as bold, italic, etc. can be (and often annoyingly is)

Asked: May 23, 2026 (2026-05-23T04:15:22+00:00)

In OOXML, formatting such as bold, italic, etc. can be (and often annoyingly is) split up between multiple elements, like so:

<w:p>
    <w:r>
        <w:rPr>
            <w:b/>
        </w:rPr>
        <w:t xml:space="preserve">This is a </w:t>
    </w:r>
    <w:r>
        <w:rPr>
            <w:b/>
        </w:rPr>
        <w:t xml:space="preserve">bold </w:t>
    </w:r>
    <w:r>
        <w:rPr>
            <w:b/>
            <w:i/>
        </w:rPr>
        <w:t>with a bit of italic</w:t>
    </w:r>
    <w:r>
        <w:rPr>
            <w:b/>
        </w:rPr>
        <w:t xml:space="preserve"> </w:t>
    </w:r>
    <w:r>
        <w:rPr>
            <w:b/>
        </w:rPr>
        <w:t>paragr</w:t>
    </w:r>
    <w:r>
        <w:rPr>
            <w:b/>
        </w:rPr>
        <w:t>a</w:t>
    </w:r>
    <w:r>
        <w:rPr>
            <w:b/>
        </w:rPr>
        <w:t>ph</w:t>
    </w:r>
    <w:r>
        <w:t xml:space="preserve"> with some non-bold in it too.</w:t>
    </w:r>
</w:p>

I need to combine these formatting elements to produce this:

<p><b>This is a bold <i>with a bit of italic</i> paragraph</b> with some non-bold in it too.</p>

My initial plan was to write out the start tag when the formatting is first encountered, using:

 <xsl:text disable-output-escaping="yes">&lt;b&gt;</xsl:text>

Then, after processing each <w:r>, I would check the next one to see whether the formatting is still present. If it is not, I would add the end tag in the same way I add the start tag.
I keep thinking there must be a better way to do this, and I’d be grateful for any suggestions.

I should also mention that I am tied to XSLT 1.0.

The reason I need this is that we have to compare an XML file before it is transformed into OOXML with the result after it is transformed back out of OOXML. The extra formatting tags make it appear as though changes were made when they were not.


Editorial Team · Answer 1 · 2026-05-23T04:15:23+00:00

Here is a complete XSLT 1.0 solution. It relies on the EXSLT node-set() extension function, which many XSLT 1.0 processors support:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:ext="http://exslt.org/common" xmlns:w="w"
 exclude-result-prefixes="ext w">
 <xsl:output omit-xml-declaration="yes"/>
 <xsl:strip-space elements="*"/>

 <xsl:template match="w:p">
  <xsl:variable name="vrtfPass1">
   <p>
    <xsl:apply-templates/>
   </p>
  </xsl:variable>

  <xsl:apply-templates mode="pass2"
   select="ext:node-set($vrtfPass1)/*"/>
 </xsl:template>

 <xsl:template match="w:r">
  <xsl:variable name="vrtfProps">
   <xsl:for-each select="w:rPr/*">
    <xsl:sort select="local-name()"/>
    <xsl:copy-of select="."/>
   </xsl:for-each>
  </xsl:variable>

  <xsl:call-template name="toHtml">
   <xsl:with-param name="pProps" select=
       "ext:node-set($vrtfProps)/*"/>
   <xsl:with-param name="pText" select="w:t/text()"/>
  </xsl:call-template>
 </xsl:template>

 <xsl:template name="toHtml">
  <xsl:param name="pProps"/>
  <xsl:param name="pText"/>

  <xsl:choose>
   <xsl:when test="not($pProps)">
     <xsl:copy-of select="$pText"/>
   </xsl:when>
   <xsl:otherwise>
    <xsl:element name="{local-name($pProps[1])}">
      <xsl:call-template name="toHtml">
        <xsl:with-param name="pProps" select=
            "$pProps[position()>1]"/>
        <xsl:with-param name="pText" select="$pText"/>
      </xsl:call-template>
    </xsl:element>
   </xsl:otherwise>
  </xsl:choose>
 </xsl:template>

 <xsl:template match="/*" mode="pass2">
  <xsl:copy>
    <xsl:copy-of select="@*"/>
    <xsl:call-template name="processInner">
     <xsl:with-param name="pNodes" select="node()"/>
    </xsl:call-template>
  </xsl:copy>
 </xsl:template>

 <xsl:template name="processInner">
  <xsl:param name="pNodes"/>

  <xsl:variable name="pNode1" select="$pNodes[1]"/>

  <xsl:if test="$pNode1">
   <xsl:choose>
    <xsl:when test="not($pNode1/self::*)">
     <xsl:copy-of select="$pNode1"/>
     <xsl:call-template name="processInner">
      <xsl:with-param name="pNodes" select=
      "$pNodes[position()>1]"/>
     </xsl:call-template>
    </xsl:when>
    <xsl:otherwise>
      <xsl:variable name="vbatchLength">
        <xsl:call-template name="getBatchLength">
         <xsl:with-param name="pNodes"
              select="$pNodes[position()>1]"/>
         <xsl:with-param name="pName"
             select="name($pNode1)"/>
         <xsl:with-param name="pCount" select="1"/>
        </xsl:call-template>
      </xsl:variable>

      <xsl:element name="{name($pNode1)}">
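        <!-- Copy the attributes of the first element of the batch,
             not those of the context node -->
        <xsl:copy-of select="$pNode1/@*"/>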

        <xsl:call-template name="processInner">
         <xsl:with-param name="pNodes" select=
         "$pNodes[not(position()>$vbatchLength)]
                        /node()"/>
        </xsl:call-template>
      </xsl:element>

      <xsl:call-template name="processInner">
       <xsl:with-param name="pNodes" select=
       "$pNodes[position()>$vbatchLength]"/>
      </xsl:call-template>
    </xsl:otherwise>
   </xsl:choose>
  </xsl:if>
 </xsl:template>

 <xsl:template name="getBatchLength">
  <xsl:param name="pNodes"/>
  <xsl:param name="pName"/>
  <xsl:param name="pCount"/>

  <xsl:choose>
   <xsl:when test=
   "not($pNodes) or not($pNodes[1]/self::*)
    or not(name($pNodes[1])=$pName)">
    <xsl:value-of select="$pCount"/>
   </xsl:when>
   <xsl:otherwise>
    <xsl:call-template name="getBatchLength">
     <xsl:with-param name="pNodes" select=
         "$pNodes[position()>1]"/>
     <xsl:with-param name="pName" select="$pName"/>
     <xsl:with-param name="pCount" select="$pCount+1"/>
    </xsl:call-template>
   </xsl:otherwise>
  </xsl:choose>
 </xsl:template>
</xsl:stylesheet>

When this transformation is applied to the following XML document (based on the one provided, but made more complicated to show that more edge cases are covered):

<w:p xmlns:w="w">
    <w:r>
        <w:rPr>
            <w:b/>
        </w:rPr>
        <w:t xml:space="preserve">This is a </w:t>
    </w:r>
    <w:r>
        <w:rPr>
            <w:b/>
        </w:rPr>
        <w:t xml:space="preserve">bold </w:t>
    </w:r>
    <w:r>
        <w:rPr>
            <w:b/>
            <w:i/>
        </w:rPr>
        <w:t>with a bit of italic</w:t>
    </w:r>
    <w:r>
        <w:rPr>
            <w:b/>
            <w:i/>
        </w:rPr>
        <w:t> and some more italic</w:t>
    </w:r>
    <w:r>
        <w:rPr>
            <w:i/>
        </w:rPr>
        <w:t> and just italic, no-bold</w:t>
    </w:r>
    <w:r>
        <w:rPr>
            <w:b/>
        </w:rPr>
        <w:t xml:space="preserve"></w:t>
    </w:r>
    <w:r>
        <w:rPr>
            <w:b/>
        </w:rPr>
        <w:t>paragr</w:t>
    </w:r>
    <w:r>
        <w:rPr>
            <w:b/>
        </w:rPr>
        <w:t>a</w:t>
    </w:r>
    <w:r>
        <w:rPr>
            <w:b/>
        </w:rPr>
        <w:t>ph</w:t>
    </w:r>
    <w:r>
        <w:t xml:space="preserve"> with some non-bold in it too.</w:t>
    </w:r>
</w:p>

the wanted, correct result is produced:

<p><b>This is a bold <i>with a bit of italic and some more italic</i></b><i> and just italic, no-bold</i><b>paragraph</b> with some non-bold in it too.</p>

Explanation:

1. This is a two-pass transformation. The first pass is relatively simple and converts the source XML document (in our specific case) to the following:

Pass 1 result (indented for readability):

<p>
   <b>This is a </b>
   <b>bold </b>
   <b>
      <i>with a bit of italic</i>
   </b>
   <b>
      <i> and some more italic</i>
   </b>
   <i> and just italic, no-bold</i>
   <b/>
   <b>paragr</b>
   <b>a</b>
   <b>ph</b> with some non-bold in it too.</p>

2. The second pass (executed in mode "pass2") merges any batch of consecutive, identically named elements into a single element with that name. It calls itself recursively on the children of the merged elements, so batches at any depth are merged.

3. Note that we do not (and cannot) use the following-sibling:: or preceding-sibling:: axes, because only the nodes to be merged at the top level are really siblings; at deeper levels they are children of different parents. For this reason we process all nodes simply as a node-set.

4. This solution is completely generic: it merges any batch of consecutive, identically named elements at any depth, and no specific names are hardcoded.
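
The stylesheet above binds the w prefix to the placeholder namespace "w" and turns every child of w:rPr into an output element. For a real document.xml, a variant of the first pass along the following lines may be closer to what is needed. This is only a sketch under assumptions that are not part of the original answer: the transitional WordprocessingML namespace URI is used, only w:b and w:i are treated as formatting of interest, only direct w:r children of w:p are processed, and a toggle whose w:val is "false", "0" or "off" is treated as switched off. Formatting inherited from styles is not considered.

```xml
<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:ext="http://exslt.org/common"
 xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"
 exclude-result-prefixes="ext w">
 <xsl:output omit-xml-declaration="yes"/>
 <xsl:strip-space elements="*"/>

 <!-- Pass 1 only: one <p> per paragraph, runs wrapped individually.
      Assumption: only direct w:r children carry text of interest. -->
 <xsl:template match="w:p">
  <p>
   <xsl:apply-templates select="w:r"/>
  </p>
 </xsl:template>

 <xsl:template match="w:r">
  <xsl:variable name="vrtfProps">
   <!-- Assumption: keep only bold/italic, and skip toggles that are
        explicitly switched off via w:val -->
   <xsl:for-each select="(w:rPr/w:b | w:rPr/w:i)
        [not(@w:val='false' or @w:val='0' or @w:val='off')]">
    <xsl:sort select="local-name()"/>
    <xsl:copy-of select="."/>
   </xsl:for-each>
  </xsl:variable>

  <xsl:call-template name="toHtml">
   <xsl:with-param name="pProps" select="ext:node-set($vrtfProps)/*"/>
   <xsl:with-param name="pText" select="w:t/text()"/>
  </xsl:call-template>
 </xsl:template>

 <!-- Same recursive wrapper as in the answer: one nested element per
      remaining property, with the run text innermost -->
 <xsl:template name="toHtml">
  <xsl:param name="pProps"/>
  <xsl:param name="pText"/>

  <xsl:choose>
   <xsl:when test="not($pProps)">
    <xsl:copy-of select="$pText"/>
   </xsl:when>
   <xsl:otherwise>
    <xsl:element name="{local-name($pProps[1])}">
     <xsl:call-template name="toHtml">
      <xsl:with-param name="pProps" select="$pProps[position()>1]"/>
      <xsl:with-param name="pText" select="$pText"/>
     </xsl:call-template>
    </xsl:element>
   </xsl:otherwise>
  </xsl:choose>
 </xsl:template>
</xsl:stylesheet>
```

With this variant, a run whose properties are `<w:b w:val="0"/><w:i/>` and whose text is x comes out of the first pass as `<i>x</i>`, and properties such as font or size no longer produce stray elements. To use it in the full solution, only the namespace declaration and the xsl:for-each select in the w:r template need to change; the pass2 templates can stay as they are.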
