Input :
<tag>Testing different formatting options in </tag><tag class="classA classB">Text</tag><tag class="classC">Class C text</tag>
Expected Output :
<tag>Testing different formatting options in </tag><tagA><tagB>Text</tagB></tagA><tagC>Class C text</tagC>
Basically, each tag is replaced by tags derived from the values in its "class" attribute; i.e., if the attribute contains classA, the tag is replaced by tagA; if classB is also present, the replacement also includes tagB, and so on.
Attempt made :
final String TAG_GROUPS = "<tag class=\"(.*)\">(.*)</tag>";
Pattern pattern = Pattern.compile(TAG_GROUPS);
Matcher matcher = pattern.matcher(inputString);
The pattern fails to match the individual tags. In particular, the statement
String classes = matcher.group(1);
gives the string classA classB">Text</tag><tag class="classC">Class C text</tag. I am a beginner with regular expressions and would like to know the right pattern for this problem. Any help is appreciated.
When you use .* it will try to absorb as many characters as possible (greedy). If you want .* to match as few characters as possible, you must use a lazy match with .*?. So your regex becomes <tag class="(.*?)">(.*?)</tag>.
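To see the difference on the input from the question, here is a minimal sketch that runs the greedy pattern and a lazy variant side by side. The class name GreedyVsLazyDemo and the helper classAttributes are made up for this illustration; only java.util.regex is used.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original question.
class GreedyVsLazyDemo {
    // Collects capture group 1 (the class attribute) for every match of the regex.
    static List<String> classAttributes(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            found.add(matcher.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        String input = "<tag>Testing different formatting options in </tag>"
                + "<tag class=\"classA classB\">Text</tag>"
                + "<tag class=\"classC\">Class C text</tag>";

        // Greedy: a single match whose group 1 runs past the first closing quote.
        System.out.println(classAttributes("<tag class=\"(.*)\">(.*)</tag>", input));

        // Lazy: one match per tag; prints [classA classB, classC]
        System.out.println(classAttributes("<tag class=\"(.*?)\">(.*?)</tag>", input));
    }
}
```

With the lazy quantifiers, matcher.find() can be called in a loop and each iteration yields the class list of exactly one tag.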
Making the quantifiers lazy is the easy way, but it is not necessarily the optimal one. A lazy match is generally slower than a greedy one, so avoid it when you can. For example, if you can assume that your input is well formed (no broken tags, no missing closing tags, etc.), it is better to use negated character classes instead of .*?. For example, your regex can be written as <tag class="([^"]*)">([^<]*)</tag>, which is more efficient for the regex engine (although it is not always possible to convert a lazy match into a negated class).
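Putting the negated-class idea to work, the following is a rough sketch of the whole replacement, not a definitive implementation. It assumes that a class name such as classA maps to a tag name by swapping the "class" prefix for "tag", that the replacement tags are nested in attribute order, and that the tag content holds no nested markup. The class name TagRewriter and the method rewrite are invented for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original question.
class TagRewriter {
    // [^"]* cannot run past the closing quote; [^<]* cannot run past the next tag.
    private static final Pattern TAG_WITH_CLASS =
            Pattern.compile("<tag class=\"([^\"]*)\">([^<]*)</tag>");

    static String rewrite(String input) {
        Matcher matcher = TAG_WITH_CLASS.matcher(input);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            StringBuilder open = new StringBuilder();
            StringBuilder close = new StringBuilder();
            for (String cls : matcher.group(1).trim().split("\\s+")) {
                if (cls.isEmpty()) {
                    continue; // empty class attribute: nothing to wrap with
                }
                // Assumption: "classA" becomes "tagA", "classB" becomes "tagB", ...
                String name = cls.replaceFirst("^class", "tag");
                open.append('<').append(name).append('>');
                // Closing tags go in reverse order so the result is properly nested.
                close.insert(0, "</" + name + ">");
            }
            String replacement = open + matcher.group(2) + close;
            // quoteReplacement keeps '$' and '\' in the text from being read as group references.
            matcher.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String input = "<tag>Testing different formatting options in </tag>"
                + "<tag class=\"classA classB\">Text</tag>"
                + "<tag class=\"classC\">Class C text</tag>";
        System.out.println(rewrite(input));
    }
}
```

Tags without a class attribute do not match the pattern, so appendTail and appendReplacement copy them through unchanged.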
And of course, if you are trying to parse a complete HTML or XML document in which you must make many different changes, it is better to use an XML (or HTML) parser.
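For comparison, here is a sketch of the parser route using the DOM classes that ship with the JDK. It rests on several assumptions: the input is well-formed XML, the fragment is wrapped in a synthetic <root> element because it has several top-level elements, and classA maps to tagA by swapping the "class" prefix for "tag". The class name DomTagRewriter is hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of the original question.
class DomTagRewriter {
    static String rewrite(String fragment) {
        try {
            // Wrap the fragment so the parser sees a single root element.
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader("<root>" + fragment + "</root>")));

            // getElementsByTagName returns a live list; copy it before changing the tree.
            NodeList live = doc.getElementsByTagName("tag");
            List<Element> tags = new ArrayList<>();
            for (int i = 0; i < live.getLength(); i++) {
                tags.add((Element) live.item(i));
            }

            for (Element tag : tags) {
                String classes = tag.getAttribute("class").trim();
                if (classes.isEmpty()) {
                    continue; // no class attribute: leave the element alone
                }
                Element outer = null;
                Element inner = null;
                for (String cls : classes.split("\\s+")) {
                    // Assumption: "classA" becomes "tagA".
                    Element next = doc.createElement(cls.replaceFirst("^class", "tag"));
                    if (outer == null) {
                        outer = next;
                    } else {
                        inner.appendChild(next);
                    }
                    inner = next;
                }
                // Move the original children into the innermost new element.
                while (tag.hasChildNodes()) {
                    inner.appendChild(tag.getFirstChild());
                }
                tag.getParentNode().replaceChild(outer, tag);
            }

            // Serialize the children of the synthetic root, without an XML declaration.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            NodeList children = doc.getDocumentElement().getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                transformer.transform(new DOMSource(children.item(i)), new StreamResult(out));
            }
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not rewrite fragment", e);
        }
    }
}
```

Unlike the regex versions, this approach does not depend on attribute order or on the tag content being free of nested elements, at the cost of more code and a full parse.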