I’m trying to parse HTML files using an XML/HTML parser which contain hidden commented

Question

Asked: June 12, 2026 (2026-06-12T21:19:21+00:00)

I’m trying to use an XML/HTML parser to parse HTML files that contain hidden commented text for translation, namely X and Y below.

<!-- Title: “ X ” Tags: “ Y ” -->

Which XPath would best match X and Y? The //comment() expression matches the whole comment node, but I need to match the two occurrences of text between the “ and ” quotes.

I guess one would need a combination of XPath and regular expressions to do that, but I’m not sure how to tackle it.

1 Answer

Editorial Team · Answer 1 · 2026-06-12T21:19:22+00:00

I assume that the quotes in the comment are the same regular quote character ("), not the typographically distinct opening and closing quotes that appear when this question is displayed.

If this assumption is wrong, simply replace the standard quote in the expressions below with the respective quote character.

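Use the following (it works only if the comment in question is the first one in the document, because XPath 1.0 string functions take the string value of the first node in the node-set):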

substring-before(substring-after(//comment(), '"'), '"')

This produces the string (without the quotes):

" X "
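
To see why this works, the two XPath 1.0 string functions can be mimicked in a few lines of Python. This is only an illustrative sketch: the helper names substring_after and substring_before are hypothetical stand-ins for the XPath functions, and the comment text is hard-coded (with straight quotes assumed) rather than read from a document.

```python
def substring_after(s: str, t: str) -> str:
    """Mimic XPath 1.0 substring-after(): text following the first t, or '' if t is absent."""
    if not t:
        return s  # XPath returns the whole string for an empty second argument
    _, sep, after = s.partition(t)
    return after if sep else ""


def substring_before(s: str, t: str) -> str:
    """Mimic XPath 1.0 substring-before(): text preceding the first t, or '' if t is absent."""
    if not t:
        return ""
    before, sep, _ = s.partition(t)
    return before if sep else ""


# String value of the comment node from the question (straight quotes assumed).
comment = ' Title: " X " Tags: " Y " '

# Drop everything up to the first quote, then keep everything before the next one.
first = substring_before(substring_after(comment, '"'), '"')
print(repr(first))  # ' X '
```

The inner call discards the text up to and including the opening quote, and the outer call cuts the remainder off at the closing quote.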

And for the second string in quotes use:

substring-before(
   substring-after(
        substring-after(
               substring-after(//comment(), '"'), 
               '"'), 
        '"'), 
   '"')
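
Each additional substring-after call skips one more quote character, so three calls land just after the opening quote of the second string. Outside XPath, the same idea generalizes to any number of quoted strings by splitting on the quote character. A minimal Python sketch, assuming straight quotes; quoted_strings is a hypothetical helper name, not part of any library:

```python
def quoted_strings(text: str, quote: str = '"') -> list:
    """Return the pieces of text that sit between successive pairs of quote characters."""
    parts = text.split(quote)
    # An even number of parts means an odd number of quotes: the last
    # quote is unterminated, so drop the dangling piece after it.
    if len(parts) % 2 == 0:
        parts = parts[:-1]
    # Pieces at odd indexes lie between an opening and a closing quote.
    return parts[1::2]


# String value of the comment node from the question.
comment = ' Title: " X " Tags: " Y " '

print(quoted_strings(comment))  # [' X ', ' Y ']
```

The second element corresponds to the result of the nested expression above.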

XSLT-based verification (because an XSLT stylesheet must be a well-formed XML document, we replace the quotes in the expressions with the entity &quot;, just to avoid errors due to nested quotes):

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <xsl:template match="/">
     "<xsl:copy-of select="substring-before(substring-after(//comment(), '&quot;'), '&quot;')"/>"
=============
   "<xsl:copy-of select=
   "substring-before(substring-after(substring-after(substring-after(//comment(), '&quot;'), '&quot;'), '&quot;'), '&quot;')"/>"
 </xsl:template>
</xsl:stylesheet>

When this transformation is applied against this XML document:

<html>
  <body>
    Hello.
<!-- Title: " X " Tags: " Y " -->
  </body>
</html>

the two XPath expressions are evaluated and the results of these two evaluations are copied to the output (surrounded by quotes to show the exact strings copied):

     " X "
=============
   " Y "

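The question also asks about combining a parser with regular expressions. A rough sketch of that route in Python, using only the standard library, might look like the following. It assumes the input is well-formed XML like the sample document (real-world HTML may need a more lenient HTML parser), and the names iter_comments, QUOTED and DOCUMENT are illustrative only. The pattern accepts both straight and typographic quotes.

```python
import re
from xml.dom import Node, minidom

# The sample document used for the XSLT verification.
DOCUMENT = """<html>
  <body>
    Hello.
<!-- Title: " X " Tags: " Y " -->
  </body>
</html>"""

# Accept straight quotes as well as typographic opening/closing quotes.
QUOTED = re.compile(r'["“]([^"“”]*)["”]')


def iter_comments(node):
    """Yield the text of every comment node below node, in document order."""
    for child in node.childNodes:
        if child.nodeType == Node.COMMENT_NODE:
            yield child.data
        else:
            yield from iter_comments(child)


dom = minidom.parseString(DOCUMENT)
first_comment = next(iter_comments(dom))  # like //comment() in a string context
matches = QUOTED.findall(first_comment)
print(matches)  # [' X ', ' Y ']
```

Unlike the XPath-only expressions, this returns every quoted string in the comment at once, at the cost of doing the string matching outside XPath.
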