
Question

0

Asked: May 11, 2026 (2026-05-11T11:46:34+00:00)

I’m trying to remove unused spans (i.e. those with no attribute) from HTML files, having already cleaned up all the attributes I didn’t want with other regular expressions.

I’m having a problem with my regex not picking the correct pair of start and end tags to remove.

my $a = "a <span>b <span style='color:red;'>c</span> d</span>e";
$a =~ s/<span\s*>(.*?)<\/span>/$1/g;
print "$a\n";

returns

a b <span style='color:red;'>c d</span>e

but I want it to return

a b <span style='color:red;'>c</span> de

Help appreciated.


1 Answer

score 0 · Answer 1 · 2026-05-11T11:46:35+00:00

Try HTML::Parser:

#!/usr/bin/perl
use strict;
use warnings;

use HTML::Parser;

# One flag per open <span>: true if the tag had attributes and was printed.
my @print_span;
my $p = HTML::Parser->new(
  start_h => [ sub {
    my ($text, $name, $attr) = @_;
    if ( $name eq 'span' ) {
      my $print_tag = %$attr;        # true only if the span has attributes
      push @print_span, $print_tag;
      return if !$print_tag;         # drop a bare <span>
    }
    print $text;
  }, 'text,tagname,attr'],
  end_h => [ sub {
    my ($text, $name) = @_;
    if ( $name eq 'span' ) {
      return if !pop @print_span;    # drop the </span> that closes a bare <span>
    }
    print $text;
  }, 'text,tagname'],
  default_h => [ sub { print shift }, 'text'],
);
$p->parse_file(\*DATA) or die "Err: $!";
$p->eof;

__END__
<html>
<head>
<title>This is a title</title>
</head>
<body>
<h1>This is a header</h1>
a <span>b <span style='color:red;'>c</span> d</span>e
</body>
</html>
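The regex in the question fails because the non-greedy `(.*?)` stops at the first `</span>` it meets, which belongs to the inner span. The parser code above avoids this by keeping a stack with one flag per open span, so each `</span>` is paired with its own start tag. A minimal sketch of that stack logic on its own, without the module, might look like the following. The name `strip_bare_spans` is hypothetical, and the sketch assumes well-nested span tags with no `>` inside attribute values; for real-world HTML a proper parser such as HTML::Parser is the safer choice.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original answer): removes <span> tags
# that carry no attributes, together with their matching </span>.
# Assumes well-nested spans and no '>' inside attribute values.
sub strip_bare_spans {
    my ($html) = @_;
    my @keep;       # one flag per open span: true if its tags are kept
    my $out = '';

    # Split into span tags and everything else; the capture group
    # makes split return the tags as separate tokens.
    for my $tok (split /(<\/?span\b[^>]*>)/i, $html) {
        if ($tok =~ /^<span\b([^>]*)>$/i) {
            my $has_attr = $1 =~ /\S/;      # anything besides whitespace?
            push @keep, $has_attr;
            $out .= $tok if $has_attr;
        }
        elsif ($tok =~ /^<\/span\b/i) {
            # Pair this end tag with the most recently opened span;
            # keep a stray </span> rather than silently dropping it.
            my $kept = @keep ? pop @keep : 1;
            $out .= $tok if $kept;
        }
        else {
            $out .= $tok;                   # text and all other tags pass through
        }
    }
    return $out;
}

print strip_bare_spans("a <span>b <span style='color:red;'>c</span> d</span>e"), "\n";
# prints: a b <span style='color:red;'>c</span> de
```

Tokenizing first and then walking the tokens is what lets the end tag of the outer span be dropped while the inner, attributed span survives intact.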
