I’m scrubbing through a large number of XML-based files in a JSF project, and would like to find certain components that are missing an ID attribute. For example, let’s say I want to find all of the <h:inputText /> elements that do not have an id attribute specified.
I’ve tried the following in RAD (Eclipse), but something’s not quite right because I still get some components that do have a valid ID.
<([hf]|ig):(?!output)\w+\s+(?!\bid\b)[^>]*?\s+(?!\bid\b)[^>]*?>
Not sure if my negative-lookahead is correct or not?
The desired result is to find the following (or similar) in any JSP in the project:
<h:inputText value="test" />
… but not:
<h:inputText id="good_id" value="test" />
I’m just using <h:inputText/> as an example. I was trying to be broader than that, but definitely excluding <h:outputText/>.
Disclaimer:
As others correctly point out, it is best to use a dedicated parser when working with non-regular markup languages such as XML/HTML. There are many ways for a regex solution to fail with either false positives or missed matches.
That said…
This particular problem is a one-shot editing problem and the target text (an open tag) is not a nested structure. Although there are ways for the following regex solution to fail, it should still do a pretty good job.
I don’t know Eclipse’s regex syntax, but if it provides negative lookahead, the following is a regex solution that will match a list of specific target elements which do not have an ID attribute: (First, presented in PHP/PCRE free-spacing mode commented syntax for readability)
And here is the same regex in bare-bones native format which may be suitable for copy and paste into an Eclipse search box:
<(?:h:inputText|h:otherTag|h:anotherTag)(?:\s+(?!id\b)[\w\-.:]+(?:\s*=\s*(?:"[^"]*"|'[^']*'|[\w\-.:]+))?)*\s*/?>
Note the group of target element names to be matched at the beginning of this expression. You can add or remove target elements in this ORed list. Note also that this expression is designed to work pretty well for HTML (which may have value-less attributes, unquoted attribute values, and quoted attribute values containing <> angle brackets) as well as XML.
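As a quick sanity check, here is a minimal sketch that runs the same expression against the sample tags from the question. It is written in Python with re.VERBOSE purely for convenience (an assumption; Eclipse's own search uses its own regex engine), and the helper name find_tags_without_id is made up for illustration. h:otherTag and h:anotherTag are just the placeholders from the expression above.

```python
import re

# Same expression as above, spread out in free-spacing mode with comments.
# The tag names in the alternation are placeholders; edit the list as needed.
NO_ID_TAG = re.compile(r"""
    <(?:h:inputText|h:otherTag|h:anotherTag)  # "<" plus one of the target element names
    (?:                                       # zero or more attributes:
        \s+ (?!id\b) [\w\-.:]+                #   an attribute name that is not "id"
        (?:                                   #   optionally followed by a value:
            \s*=\s*
            (?: "[^"]*"                       #     double-quoted value
              | '[^']*'                       #     single-quoted value
              | [\w\-.:]+                     #     unquoted value
            )
        )?
    )*
    \s*/?>                                    # optional whitespace, optional "/", then ">"
""", re.VERBOSE)


def find_tags_without_id(text):
    """Return every target open tag in text that has no id attribute (hypothetical helper)."""
    return [m.group(0) for m in NO_ID_TAG.finditer(text)]


if __name__ == "__main__":
    sample = "\n".join([
        '<h:inputText value="test" />',
        '<h:inputText id="good_id" value="test" />',
        '<h:outputText value="test" />',
    ])
    for tag in find_tags_without_id(sample):
        print(tag)
```

Only the first sample tag should be reported: the second has an id, and h:outputText is not in the list of target names. Because the lookahead is written as id\b, an attribute such as id-foo would also be treated as an id, which is one of the ways a regex approach can miss a match.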