Question

Asked: May 12, 2026 (2026-05-12T11:20:18+00:00)

I need to strip all XML tags from an XML document but keep the space the tags occupy, so that the textual content stays at the same offsets as in the original XML. This needs to be done in Java. I thought a regular expression would be the way to go, but I have found no simple way to get the length of the tags that match my regular expression.

Basically what I want is this:

Pattern p = Pattern.compile("<[^>]+>[^<]*]+>"); 
Matcher m = p.matcher(stringWithXMLContent); 
String strippedContent = m.replaceAll("THIS IS A STRING OF WHITESPACES IN THE LENGTH OF THE MATCHED TAG");

Hope somebody can help me to do this in a simple way!

1 Answer

Editorial Team · Answer 1 · 2026-05-12T11:20:19+00:00

Pattern p = Pattern.compile("<[^>]+>[^<]*]+>");

In the spirit of You Can’t Parse XML With Regexp, you do know that’s not an adequate pattern for arbitrary XML, right? (It’s perfectly valid to have a > character in an attribute value, for example, not to mention other non-tag constructs.)
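To make the attribute-value caveat concrete, here is a minimal sketch. It assumes a simplified tag pattern, `<[^>]+>`, chosen only for illustration (it is not the exact pattern from the question), and a hypothetical class name `NaiveTagPatternDemo`. It shows how such a pattern stops at the first `>` it sees, even when that character sits inside a quoted attribute value.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NaiveTagPatternDemo {
    // Simplified tag pattern, assumed for illustration only.
    static final Pattern NAIVE_TAG = Pattern.compile("<[^>]+>");

    /** Returns the first substring the naive pattern treats as a tag, or null if none. */
    static String firstTag(String xml) {
        Matcher m = NAIVE_TAG.matcher(xml);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // A '>' inside an attribute value is legal XML.
        String xml = "<a title=\"x > y\">text</a>";
        // The match ends at the '>' inside the attribute value,
        // so the rest of the real tag is treated as text.
        System.out.println(firstTag(xml)); // prints: <a title="x >
    }
}
```

If the input is known to be simple markup without such constructs, a regex may still be good enough; for arbitrary XML it is not.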

"I have found no simple way to get the length of the tags that match my regular expression."

Instead of using replaceAll, call find() on the Matcher repeatedly. After each successful find(), start() and end() give the indexes of the match, so you can either overwrite that range yourself or use the appendReplacement method to build the result in a buffer. For example:

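// m is the Matcher from the question.
StringBuffer b = new StringBuffer();
while (m.find()) {
    // One space per matched character keeps the remaining text at its original offset.
    String spaces = StringUtils.repeat(" ", m.end() - m.start());
    m.appendReplacement(b, spaces);
}
m.appendTail(b); // copy everything after the last match
stringWithXMLContent = b.toString();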

(StringUtils comes from Apache Commons Lang. For more background and library-free alternatives see this question.)
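As one possible library-free variant, the start/end indexes can be used directly to overwrite each match in a character array. This is only a sketch: the class name `TagBlanker`, the method `blankTags`, and the simplified pattern `<[^>]+>` are assumptions made for illustration, and the pattern has the same limitations on arbitrary XML discussed above.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagBlanker {
    // Simplified tag pattern, assumed for illustration; not adequate for arbitrary XML.
    static final Pattern TAG = Pattern.compile("<[^>]+>");

    /** Returns a copy of xml in which every matched tag is overwritten with spaces. */
    static String blankTags(String xml) {
        char[] chars = xml.toCharArray();
        Matcher m = TAG.matcher(xml);
        while (m.find()) {
            // Overwrite the matched range [start, end) in place; the length never changes.
            Arrays.fill(chars, m.start(), m.end(), ' ');
        }
        return new String(chars);
    }

    public static void main(String[] args) {
        String xml = "<p>Hello <b>world</b></p>";
        String blanked = blankTags(xml);
        System.out.println("[" + blanked + "]");
        // The text keeps its offset: both lines print the same index.
        System.out.println(xml.indexOf("world"));
        System.out.println(blanked.indexOf("world"));
    }
}
```

Because the array is modified in place, no replacement string has to be built per match. On Java 9 and later, Matcher.replaceAll also accepts a function of the match result, which is closer to what the question originally asked for; the array version above works on older versions as well.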
