I’m trying to write a regexp that will help find non-translated texts in HTML code.
Translated texts are those that go through a special tag, <fmt:message>, or through the construction ${…}.
Example of non-translated text:
<h1>Hello</h1>
Translated texts are:
<h1><fmt:message key="hello" /></h1>
<button>${expression}</button>
I’ve written the following expression:
\<(\w+[^>])(?:.*)\>([^\s]+?)\</\1\>
It correctly finds strings like:
<p>text</p>
It correctly skips:
<a><fmt:message key="common.delete" /></a>
But it also catches:
<li><p><fmt:message key="common.delete" /></p></li>
And I can’t figure out how to add an exception for ${…} strings to this expression.
Can anybody help me?
If I understand you properly, you want to ensure the data inside the “tag” doesn’t contain fmt:message or ${....}.
You might be able to use a negative lookahead in conjunction with a dot (.) to assert that the characters matched by the dot are not one of those cases.
If you want to avoid capturing any “tags” inside the tag, you can ignore the <fmt:message portion and just use [^<] instead of the dot, to match only non-< characters.
Added from comment: If you also want to exclude “empty” tags, add another negative lookahead, this time (?!\s*<), to ensure that the stuff inside the tag is not empty or only whitespace.
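The patterns themselves are not shown above, so here is a minimal sketch in Python of what the three variants might look like. The pattern strings and the names OPEN_TAG, CLOSE_TAG, tempered, no_nested, non_empty and find_untranslated are illustrative assumptions, not the expressions from the original answer. They also assume each element sits on one line, since the dot does not match newlines unless re.DOTALL is passed.

```python
import re

# Hypothetical building blocks (assumptions for illustration only).
OPEN_TAG = r"<(\w+)\b[^>]*>"  # opening tag; the tag name is captured as group 1
CLOSE_TAG = r"</\1>"          # closing tag with the same name as group 1

# 1. Negative lookahead in front of the dot: at every position inside the tag,
#    the text must not start with "<fmt:message" or "${".
tempered = re.compile(OPEN_TAG + r"(?:(?!<fmt:message|\$\{).)*" + CLOSE_TAG)

# 2. [^<] instead of the dot: no nested tag can be matched at all,
#    so only "${" still needs a lookahead.
no_nested = re.compile(OPEN_TAG + r"(?:(?!\$\{)[^<])*" + CLOSE_TAG)

# 3. Same as 2, plus (?!\s*<) right after the opening tag to reject
#    content that is empty or whitespace-only.
non_empty = re.compile(OPEN_TAG + r"(?!\s*<)(?:(?!\$\{)[^<])*" + CLOSE_TAG)


def find_untranslated(pattern, html):
    """Return every substring of html matched by pattern."""
    return [m.group(0) for m in pattern.finditer(html)]


samples = [
    '<h1>Hello</h1>',
    '<h1><fmt:message key="hello" /></h1>',
    '<button>${expression}</button>',
    '<li><p><fmt:message key="common.delete" /></p></li>',
    '<a></a>',
]

for html in samples:
    print(html, "->", find_untranslated(non_empty, html))
```

With the third pattern, only <h1>Hello</h1> from the samples is reported. The first pattern reports a whole <li><p>text</p></li> as one match, while the second reports just the inner <p>text</p>, which is the difference between the dot and [^<]. Matching HTML with a regexp stays a heuristic, so treat the results as candidates to review rather than a definitive list.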