I want to strip a SOAP envelope from a message to get at the XML in the body.
I attempted the following:
String strippedOfEnvelopedHeader = msg.replaceAll("(?s)(?i)<(.*):Envelope.*<\1:Body>", "");
I thought that this would strip out the SOAP envelope, specifically the header, from a message like:
<soapenv:Envelope xmlns:soapenv='http://schemas.xmlsoap.org/soap/envelope/'>
<env:Header xmlns:env='http://schemas.xmlsoap.org/soap/envelope/' xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'/>
<soapenv:Body>
<myXML> stuff is here</myXML>
</soapenv:Body>
</soapenv:Envelope>
which should result in:
<myXML> stuff is here</myXML>
</soapenv:Body>
</soapenv:Envelope>
However, the group back-reference does not seem to work.
If I replace both the capture group and the back-reference with the literal prefix, the substitution works fine:
String strippedOfEnvelopeHeader = msg.replaceAll("(?i)(?s)<soapenv:Envelope.*<soapenv:Body>", "");
I think I can guess the problem: the capture group is being greedy and grabbing the entire message, so the match fails.
But the solution evades me.
Any ideas?
Try two backslashes: write \\1 instead of \1 in the string literal.
You need two because \1 by itself is already a special escape sequence in a Java string literal (an octal escape). It is therefore decoded into the character U+0001 before the pattern is handed to the regex engine. You need to protect it by adding one more backslash. (And the usual “don’t parse XML with Regex” warning follows…)
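As a minimal sketch of the two-backslash fix, the snippet below applies the question's pattern with the back-reference escaped. The class name StripSoapEnvelope and the method name stripEnvelopeHeader are hypothetical, chosen only for illustration; the sample message is the one from the question.

```java
import java.util.regex.Pattern;

// Hypothetical helper class; the name is illustrative only.
class StripSoapEnvelope {
    // "\\1" in Java source is the two characters \ and 1, which the regex
    // engine reads as a back-reference to capture group 1.
    private static final Pattern ENVELOPE_TO_BODY =
            Pattern.compile("(?s)(?i)<(.*):Envelope.*<\\1:Body>");

    // Removes everything from the opening Envelope tag through the opening Body tag.
    static String stripEnvelopeHeader(String msg) {
        return ENVELOPE_TO_BODY.matcher(msg).replaceAll("");
    }

    public static void main(String[] args) {
        String msg =
                "<soapenv:Envelope xmlns:soapenv='http://schemas.xmlsoap.org/soap/envelope/'>\n"
                + "<env:Header xmlns:env='http://schemas.xmlsoap.org/soap/envelope/'"
                + " xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'/>\n"
                + "<soapenv:Body>\n"
                + "<myXML> stuff is here</myXML>\n"
                + "</soapenv:Body>\n"
                + "</soapenv:Envelope>";

        // With a single backslash the literal holds the control character U+0001.
        System.out.println((int) "\1".charAt(0)); // prints 1

        System.out.println(stripEnvelopeHeader(msg));
    }
}
```

The greedy group is not what breaks the original attempt: once the back-reference is escaped, the engine backtracks until group 1 is just the prefix (soapenv here) and the match succeeds.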