Input : <tag>Testing different formatting options in </tag><tag class=classA classB>Text</tag><tag class=classC>Class C text</tag> Expected

Question

Asked: June 17, 2026 (2026-06-17T09:15:07+00:00)

Input :

<tag>Testing different formatting options in </tag><tag class="classA classB">Text</tag><tag class="classC">Class C text</tag>

Expected Output :

<tag>Testing different formatting options in </tag><tagA><tagB>Text</tagB></tagA><tagC>Class C text</tagC>

Basically, each tag is replaced by tags derived from the values in its “class” attribute: if the attribute contains classA, the tag is replaced by tagA; if classB is also present, tagB is added as well, and so on.

Attempt made :

    final String TAG_GROUPS = "<tag class=\"(.*)\">(.*)</tag>";
    Pattern pattern = Pattern.compile(TAG_GROUPS);
    Matcher matcher = pattern.matcher(inputString);

The pattern fails to match the individual tags. In particular, the statement

    String classes = matcher.group(1);

gives the string classA classB">Text</tag><tag class="classC">Class C text</tag. I am a beginner with regular expressions and would like to know the right pattern for this problem. Any help is appreciated.

1 Answer

Editorial Team · Answer 1 · 2026-06-17T09:15:08+00:00

A plain * is greedy: it consumes as many characters as it can and gives them back only when the rest of the pattern fails to match.

If you want .* to match as few characters as possible, use the lazy form *?.

So your regex becomes:

    <tag class=\"(.*?)\">(.*?)</tag>
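To see the lazy pattern at work, here is a minimal sketch built on the code from the question. The class name LazyTagDemo and the helper findTags are made up for illustration, and the input is the sample from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LazyTagDemo {
    // Lazy quantifiers (.*?) stop at the first position where the rest of the pattern matches.
    private static final Pattern TAG_GROUPS =
            Pattern.compile("<tag class=\"(.*?)\">(.*?)</tag>");

    // Hypothetical helper: returns "classes|text" for every <tag class="..."> found.
    static List<String> findTags(String input) {
        List<String> found = new ArrayList<>();
        Matcher matcher = TAG_GROUPS.matcher(input);
        while (matcher.find()) { // find() advances to the next match on each call
            found.add(matcher.group(1) + "|" + matcher.group(2));
        }
        return found;
    }

    public static void main(String[] args) {
        String input = "<tag>Testing different formatting options in </tag>"
                + "<tag class=\"classA classB\">Text</tag>"
                + "<tag class=\"classC\">Class C text</tag>";
        // Prints:
        // classA classB|Text
        // classC|Class C text
        findTags(input).forEach(System.out::println);
    }
}
```

Note that the first tag has no class attribute, so this pattern skips it.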

That is the easy way, but not necessarily the most efficient one. A lazy quantifier makes the engine test the rest of the pattern after every character it consumes, which is often slower than a greedy match, so avoid it where you can. If you can assume the input is well formed (no tag left without its closing tag, and so on), it is better to use negated character classes instead of .*?. For example, your regex can be written as:

    <tag class=\"([^\"]*)\">([^<]*)</tag>

which is more efficient for the regex engine (although a lazy match cannot always be converted to a negated class).
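Putting the negated-class pattern to use, the replacement the question asks for could be sketched as follows. This is only one way to do it, under assumptions: the class name ClassTagRewriter and the method rewrite are hypothetical, each class value classX is assumed to map to a tag named tagX, and the generated tags are closed in reverse order so that they nest properly.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ClassTagRewriter {
    // Negated classes: [^"]* cannot run past the closing quote, [^<]* cannot run past the next tag.
    private static final Pattern TAG_GROUPS =
            Pattern.compile("<tag class=\"([^\"]*)\">([^<]*)</tag>");

    // Hypothetical helper: replaces each <tag class="classA classB">text</tag>
    // with <tagA><tagB>text</tagB></tagA>. Tags without a class attribute are left alone.
    static String rewrite(String input) {
        Matcher matcher = TAG_GROUPS.matcher(input);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            StringBuilder open = new StringBuilder();
            StringBuilder close = new StringBuilder();
            for (String cls : matcher.group(1).trim().split("\\s+")) {
                if (cls.isEmpty()) {
                    continue; // empty class attribute: emit the text without wrapper tags
                }
                // Assumption: "classA" maps to "tagA"; any other name is used unchanged.
                String name = cls.startsWith("class") ? "tag" + cls.substring(5) : cls;
                open.append('<').append(name).append('>');
                close.insert(0, "</" + name + ">"); // close in reverse order
            }
            String replacement = open + matcher.group(2) + close;
            // quoteReplacement keeps any '$' or '\' in the text from being read as group references.
            matcher.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(out); // copy whatever follows the last match
        return out.toString();
    }

    public static void main(String[] args) {
        String input = "<tag>Testing different formatting options in </tag>"
                + "<tag class=\"classA classB\">Text</tag>"
                + "<tag class=\"classC\">Class C text</tag>";
        System.out.println(rewrite(input));
    }
}
```

Because [^<]* cannot cross a <, this sketch only handles tags whose content is plain text; nested markup inside a tag would need a parser.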

And of course, if you are trying to parse a complete HTML or XML document in which you must make many different changes, it is better to use an XML (or HTML) parser.
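For comparison, a rough sketch of the parser route using the DOM parser that ships with the JDK. The names ParserTagDemo and listTags are hypothetical, and the fragment is wrapped in an artificial <root> element because an XML document must have a single root; this assumes the input is otherwise well-formed XML.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ParserTagDemo {
    // Hypothetical helper: returns "classes|text" for every <tag> element in the fragment.
    static List<String> listTags(String fragment) {
        try {
            // Assumption: wrapping in <root> turns the fragment into a well-formed document.
            String xml = "<root>" + fragment + "</root>";
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            NodeList tags = doc.getElementsByTagName("tag");
            List<String> result = new ArrayList<>();
            for (int i = 0; i < tags.getLength(); i++) {
                Element tag = (Element) tags.item(i);
                // getAttribute returns an empty string when the attribute is absent.
                result.add(tag.getAttribute("class") + "|" + tag.getTextContent());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("Input is not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        String input = "<tag>Testing different formatting options in </tag>"
                + "<tag class=\"classA classB\">Text</tag>"
                + "<tag class=\"classC\">Class C text</tag>";
        listTags(input).forEach(System.out::println);
    }
}
```

Unlike the regex versions, the parser also sees the first tag, which has no class attribute, and it rejects malformed input instead of silently skipping it.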
