I’m trying to write a perl script that removes whitespace from XML tags, but


Asked: May 24, 2026


I’m trying to write a Perl script that removes whitespace from XML tags but leaves the whitespace inside the values alone. For example, let’s say I have:

<Example>This is an example.</Exampl   e>

What I want is to remove only the whitespace inside </Exampl   e>. Since this will run over an entire XML document, I figured I’d use the substitution operator, but I can’t figure out how to match only the whitespace that sits inside the XML tags themselves.

Any help is greatly appreciated!

Edit: here is a real example of the error I’m getting:

not well-formed (invalid token) at line 42, column 25, byte 1456:
                    <Artist>Eminem</Artist>
                    <FileName>eminem feat lil wayne - no love -
hotnewhiphop com(2).mp3</    FileName>
========================^
                    <FileSize>4804478</FileSize>


1 Answer

Editorial Team · Answer 1 · 2026-05-24T10:26:02+00:00

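For tags without attributes, one substitution joins a tag name that was split by whitespace (note \s*, not \s+, before the closing >, so </Exampl   e> matches even though there is no space before the >):

s!(</?\w+)\s+(\w+\s*/?>)!$1$2!g;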
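The real error in the question has the whitespace right after </ (</    FileName>), which the tag-name substitution alone does not touch because it needs a word character straight after the /. A minimal sketch, assuming tags without attributes, that handles both cases; repair_simple_tags is a hypothetical helper name. Both patterns are anchored on < and >, so spaces in the text between tags (such as in the file name) are left alone.

```perl
use strict;
use warnings;

# Hypothetical helper: repairs tags WITHOUT attributes whose names were
# split by whitespace. Text between tags is never touched, because each
# pattern has to start at a "<".
sub repair_simple_tags {
    my ($xml) = @_;
    # Remove whitespace directly after "<" or "</", e.g. "</    FileName>".
    $xml =~ s{<(/?)\s+}{<$1}g;
    # Join a tag name split by whitespace, e.g. "</Exampl   e>".
    $xml =~ s{(</?\w+)\s+(\w+\s*/?>)}{$1$2}g;
    return $xml;
}

my $doc = qq{<Example>This is an example.</Exampl   e>\n}
        . qq{<FileName>eminem feat lil wayne - no love - hotnewhiphop com(2).mp3</    FileName>\n};
print repair_simple_tags($doc);
```

Tags that do carry attributes (such as <a href="x">) are left unchanged by this sketch, since neither pattern matches when an = follows the second word.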

If you want to keep the whitespace in a tag with attributes, it gets more complex, because whitespace legitimately separates the tag name from its attributes and the attributes from each other. You pretty much have to find the “words” that are not followed by = (or by whitespace and then =) and marry them to the previous unquoted word.

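use strict;
use warnings;

# Join pieces of a tag name or attribute name that were split by stray
# whitespace, while keeping the whitespace that separates attributes.
sub marry_inner_splits {
    my $tag = shift;
    # Fix a broken tag name, keeping the leading "/" of a closing tag.
    $tag =~ s|^(/?\w+)\s+(\w+)\b(?!\s*=)|$1$2|;
    # Return early if no whitespace (space, tab, newline) is left.
    return $tag unless $tag =~ /\s/;
    # Position of the first remaining whitespace.
    my $pos = $-[0];
    # Rejoin split attribute names in the rest of the tag (lvalue substr).
    substr( $tag, $pos ) =~ s/(\s*\w+)\s+(\w+\s*=\s*(?:"[^"]*"|'[^']*')\s*)/$1$2/g;
    return $tag;
}

my $tag_str = q{Some stuff before the tag <ta g attr1="val1" att   r2="value #2"     /></Escap   e>};
$tag_str =~ s/<([^>]+)>/'<' . marry_inner_splits($1) . '>'/ge;
print "$tag_str\n";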

The /e flag makes Perl evaluate the replacement part as Perl code (here, the call to marry_inner_splits) instead of treating it as an interpolated string.
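A small standalone comparison of the same substitution with and without /e; the variable names are just for illustration. Without /e, the replacement is a double-quoted string, so only $1 is interpolated and the function name is copied literally.

```perl
use strict;
use warnings;

my $plain = 'a <B> c';

# Without /e: "lc($1)" is treated as an interpolated string.
(my $as_string = $plain) =~ s/<(\w+)>/lc($1)/;

# With /e: lc($1) is run as Perl code on the captured text.
(my $as_code = $plain) =~ s/<(\w+)>/lc($1)/e;

print "$as_string\n";    # a lc(B) c
print "$as_code\n";      # a b c
```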
