
Question


Asked: May 10, 2026

I have the following string and I would like to remove <bpt *>*</bpt> and


I have the following string, and I would like to remove <bpt *>*</bpt> and <ept *>*</ept> (note that the content between each pair of tags must be removed too) without using an XML parser (the overhead is too large for such tiny strings).

The big <bpt i='1' x='1' type='bold'><b></bpt>black<ept i='1'></b></ept> <bpt i='2' x='2' type='ulined'><u></bpt>cat<ept i='2'></u></ept> sleeps.

Any regex in VB.NET or C# will do.


1 Answer

score 0 · Answer 1 · 2026-05-10T16:11:30+00:00

If you just want to remove all of these elements (tags and their content) from the string, use this (C#):

    using System;
    using System.Text.RegularExpressions;

    try {
        yourstring = Regex.Replace(yourstring, @"(<[be]pt[^>]+>.+?</[be]pt>)", "");
    } catch (ArgumentException) {
        // Syntax error in the regular expression
    }
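To check the pattern against the sample string from the question, here is a minimal sketch written as a C# 9+ program with top-level statements. The helper name StripBptEpt is hypothetical and not part of any library.

```csharp
using System;
using System.Text.RegularExpressions;

// Sample input copied from the question.
string sample = "The big <bpt i='1' x='1' type='bold'><b></bpt>black<ept i='1'></b></ept> "
              + "<bpt i='2' x='2' type='ulined'><u></bpt>cat<ept i='2'></u></ept> sleeps.";

// Hypothetical helper: removes each <bpt ...>...</bpt> or <ept ...>...</ept>
// element, including the content between the opening and closing tags.
string StripBptEpt(string input) =>
    Regex.Replace(input, @"<[be]pt[^>]+>.+?</[be]pt>", "");

Console.WriteLine(StripBptEpt(sample)); // The big black cat sleeps.
```

Because of [^>]+, the opening tag must contain at least one character (here, the attributes) before its '>', so a bare <bpt> would not be matched.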

EDIT:

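I decided to add a better option to my solution. The previous pattern does not pair opening and closing tags: [be] in the closing tag can match either letter, so a <bpt> element that contains an <ept> element is cut off at the first </ept>. This new solution uses a backreference (\1) to the original [be] match, so a <bpt> tag is only closed by </bpt> and an <ept> tag only by </ept>. It also creates a reusable Regex object so that each loop iteration does not have to recompile the pattern (the static Regex.Replace also caches recently used patterns, so the saving is modest), and it keeps replacing until nothing matches. Note that a <bpt> nested directly inside another <bpt> still leaves the outer closing tag behind, because the lazy .+? stops at the first </bpt>: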

    using System;
    using System.Text.RegularExpressions;

    try {
        // \1 must repeat the 'b' or 'e' captured from the opening tag
        Regex regex = new Regex(@"<([be])pt[^>]+>.+?</\1pt>");
        while (regex.IsMatch(yourstring)) {
            yourstring = regex.Replace(yourstring, "");
        }
    } catch (ArgumentException) {
        // Syntax error in the regular expression
    }
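The difference the backreference makes shows up when an <ept> element sits inside a <bpt> element. A minimal sketch, again as C# 9+ top-level statements, with a hypothetical helper StripPaired wrapping the loop above:

```csharp
using System;
using System.Text.RegularExpressions;

// Created once and reused; \1 must repeat the 'b' or 'e' captured from the opening tag.
var paired = new Regex(@"<([be])pt[^>]+>.+?</\1pt>");

// Hypothetical helper: keep replacing until no <bpt>/<ept> element is left.
string StripPaired(string input)
{
    while (paired.IsMatch(input))
    {
        input = paired.Replace(input, "");
    }
    return input;
}

// An <ept> element nested inside a <bpt> element.
string nested = "<bpt i='1'>x<ept i='1'>y</ept>z</bpt>!";

// Without the backreference, the first </ept> ends the <bpt> match too early.
Console.WriteLine(Regex.Replace(nested, @"<[be]pt[^>]+>.+?</[be]pt>", "")); // z</bpt>!
// With the backreference, the match runs on to the matching </bpt>.
Console.WriteLine(StripPaired(nested)); // !
```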

ADDITIONAL NOTES:

In the comments, a user worried that the ‘.’ in the pattern would be CPU-intensive. That concern applies to a greedy ‘.+’, which first consumes everything up to the end of the line (here, the whole string) and then backtracks until the rest of the pattern matches. The lazy ‘.+?’ works the other way: it consumes one character at a time and stops at the first position where the rest of the pattern (here </bpt> or </ept>) matches, so on short strings like these it does little more than scan the text between the tags.

I use RegexBuddy as a regex development tool. It includes a debugger that shows the relative performance of different regex patterns, and it can also comment a regex automatically, so I have included those comments here to explain the regex used above:
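The choice between greedy and lazy also matters for correctness here, not only for speed. This sketch (C# 9+ top-level statements; the variable names are illustrative) runs both versions of the backreference pattern over the question's sample string: the greedy one runs from the first <bpt> to the last </bpt> and swallows the word "black".

```csharp
using System;
using System.Text.RegularExpressions;

string sample = "The big <bpt i='1' x='1' type='bold'><b></bpt>black<ept i='1'></b></ept> "
              + "<bpt i='2' x='2' type='ulined'><u></bpt>cat<ept i='2'></u></ept> sleeps.";

// Greedy: .+ first runs to the end of the line, then backtracks to the LAST matching closing tag.
var greedy = new Regex(@"<([be])pt[^>]+>.+</\1pt>");
// Lazy: .+? grows one character at a time and stops at the FIRST matching closing tag.
var lazy = new Regex(@"<([be])pt[^>]+>.+?</\1pt>");

Console.WriteLine(greedy.Replace(sample, "")); // The big cat sleeps.
Console.WriteLine(lazy.Replace(sample, ""));   // The big black cat sleeps.
```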

    // <([be])pt[^>]+>.+?</\1pt>
    //
    // Match the character '<' literally «<»
    // Match the regular expression below and capture its match into backreference number 1 «([be])»
    //    Match a single character present in the list 'be' «[be]»
    // Match the characters 'pt' literally «pt»
    // Match any character that is not a '>' «[^>]+»
    //    Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
    // Match the character '>' literally «>»
    // Match any single character that is not a line break character «.+?»
    //    Between one and unlimited times, as few times as possible, expanding as needed (lazy) «+?»
    // Match the characters '</' literally «</»
    // Match the same text as most recently matched by backreference number 1 «\1»
    // Match the characters 'pt>' literally «pt>»
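.NET can also carry this kind of annotation inside the pattern itself: with RegexOptions.IgnorePatternWhitespace, unescaped whitespace is ignored and # starts a comment that runs to the end of the line. A sketch of the same pattern written that way (C# 9+ top-level statements; the comment wording is illustrative, not RegexBuddy's output):

```csharp
using System;
using System.Text.RegularExpressions;

// Same pattern as <([be])pt[^>]+>.+?</\1pt>, spread over several lines with comments.
var commented = new Regex(@"
    <          # the character '<'
    ([be])     # 'b' or 'e', captured as group 1
    pt         # the characters 'pt'
    [^>]+      # one or more characters that are not '>' (the attributes)
    >          # the character '>'
    .+?        # one or more non-line-break characters, as few as possible
    </\1pt>    # '</', the same letter as group 1, then 'pt>'
", RegexOptions.IgnorePatternWhitespace);

string input = "<bpt i='1'><b></bpt>black<ept i='1'></b></ept>";
Console.WriteLine(commented.Replace(input, "")); // black
```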
