I’m experimenting with XSLT2, using a stylesheet based on this answer:
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:template match="node()|@*">
<xsl:copy>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:template>
<xsl:template match="source/text()">
<xsl:sequence select="replace(., '<.*?>', '<ph>$0</ph>')"/>
</xsl:template>
</xsl:stylesheet>
which is intended to do multiple replacements, e.g. from:
<?xml version="1.0" encoding="utf-8"?>
<xliff xmlns:xliff="urn:oasis:names:tc:xliff:document:1.1" version="1.1">
<file>
<source>abc &lt;field1&gt; def &lt;field2&gt; ghi</source>
</file>
</xliff>
to:
<?xml version="1.0" encoding="utf-8"?>
<xliff xmlns:xliff="urn:oasis:names:tc:xliff:document:1.1" version="1.1">
<file>
<source>abc <ph>&lt;field1&gt;</ph> def <ph>&lt;field2&gt;</ph> ghi</source>
</file>
</xliff>
However, my transform is not valid; I get this error:
Error on line 12 column 54 of my.xsl:
SXXP0003: Error reported by XML parser: The value of attribute "select" associated with an
element type "null" must not contain the '<' character.
If I use select="replace(., '&lt;(.*?)&gt;', '&lt;ph&gt;$1&lt;/ph&gt;')" then I get ...&lt;ph&gt;... in the output.
If I use DOE, I introduce other problems, because there might be other entities in the field that I want to leave untouched. If I use <xsl:output method="text"/>, I lose most of my XML. Is there some other way of ‘mixing and matching’ like this?
The problem is here:
A well-formed XML document cannot contain the < character in an attribute value. In this particular case, the select attribute above contains the substring <ph>$0</ph>, and this prevents the stylesheet from even being parsed as an XML document.
And, more importantly, elements cannot be generated just by string replacement. The result will be just a string (containing an encoded element representation), not an element.
Here is how to achieve what you want:
when this transformation is applied on the provided XML document:
the wanted result is produced:
Explanation: Appropriate use of the XSLT 2.0 instructions <xsl:analyze-string>, <xsl:matching-substring>, and <xsl:non-matching-substring>, and of the regex-group() function.
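A minimal sketch of how those instructions might fit together, keeping the identity template from the question. It assumes the text inside <source> holds escaped markers such as &lt;field1&gt;, and the regex &lt;(.*?)&gt; is an assumption taken from the question's own attempt, not a confirmed part of the answer:

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output omit-xml-declaration="yes" indent="yes"/>

  <!-- Identity rule: copy every node and attribute not handled below. -->
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="source/text()">
    <!-- The regex is written with &lt; so the stylesheet stays well-formed.
         Assumed pattern: a literal "<", the shortest run of characters, a literal ">". -->
    <xsl:analyze-string select="." regex="&lt;(.*?)&gt;">
      <xsl:matching-substring>
        <!-- <ph> is a literal result element, so a real element is created
             rather than a string that merely looks like markup. -->
        <ph>&lt;<xsl:value-of select="regex-group(1)"/>&gt;</ph>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <!-- Text between markers is copied through unchanged. -->
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>
</xsl:stylesheet>
```

Because the angle brackets are emitted as ordinary text nodes, the serializer escapes them again on output, and no disable-output-escaping is needed. Other entities in the field fall into the non-matching branch and pass through untouched.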