We are migrating from one content system into another and have tons of HTML


Asked: May 30, 2026


We are migrating from one content system to another and have tons of HTML containing lines like this:

<p style="text-align: justify;"><i> </i></p>

I am looking for a way, using Python, to strip HTML elements that produce no text output on the screen. A line like the one above would be stripped.

This is just one of MANY examples of lines that produce no text output, so I need to find and strip them all. I don't have to worry about images, movies, etc., since only text was possible in our old content management system.

BTW, the vast majority of the lines either start with a p tag or a div tag (ignoring leading whitespace).


1 Answer

Editorial Team · Answer 1 · 2026-05-30T02:51:48+00:00

If the HTML is also a well-formed XML document (it can be converted in a pre-pass with a tool like HTML-Tidy), this XSLT transformation:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>

 <xsl:template match="node()|@*">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
 </xsl:template>

 <xsl:template match="*[not(normalize-space(.))]"/>
</xsl:stylesheet>

when applied to any such XML document, for example:

<html>
 <body>
   Welcome.
   <p style="text-align: justify;"><i> </i></p>
 </body>
</html>

produces the wanted result, in which every element whose string value is empty or all whitespace is deleted:

<html>

   <body>
      Welcome.


   </body>

</html>
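
Since the question asks for Python, the same rule (drop every element whose string value is empty or whitespace-only) could be sketched with the standard library's `xml.etree.ElementTree`. This is only a rough equivalent, not the stylesheet itself: the function names `has_visible_text`, `prune_empty` and `strip_empty_elements` are made up for illustration, the input is assumed to be well-formed XML already, and the root element is never removed.

```python
import xml.etree.ElementTree as ET


def has_visible_text(elem):
    """True if the element's string value has any non-whitespace character."""
    # itertext() yields the element's own text plus all descendant text/tails.
    return any(chunk.strip() for chunk in elem.itertext())


def prune_empty(parent):
    """Recursively remove child elements that contain no visible text."""
    previous = None  # last sibling that was kept
    for child in list(parent):
        if has_visible_text(child):
            prune_empty(child)
            previous = child
            continue
        # ElementTree stores the text following an element in its .tail,
        # so move that text to the previous sibling (or the parent) first.
        tail = child.tail or ""
        if previous is not None:
            previous.tail = (previous.tail or "") + tail
        else:
            parent.text = (parent.text or "") + tail
        parent.remove(child)


def strip_empty_elements(xhtml):
    """Parse well-formed XHTML and return it without text-less elements."""
    root = ET.fromstring(xhtml)
    prune_empty(root)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    sample = (
        '<html><body>Welcome.'
        '<p style="text-align: justify;"><i> </i></p>'
        '</body></html>'
    )
    print(strip_empty_elements(sample))
```

One difference to be aware of: Python's `str.strip()` also treats the non-breaking space (U+00A0) as whitespace, whereas XPath's `normalize-space()` only collapses space, tab, carriage return and line feed, so this sketch removes a few elements the stylesheet would keep.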
