Scenario
I need to write a function that validates XML tag names (or attribute names).
E.g.:
"div" is valid
"d<iv" is not valid
"d\iv" is not valid
If a string is not valid, I should find the characters that make it invalid and replace them with some arbitrary character (or remove them).
E.g.:
"d<iv" is not valid -> I replace it with "div".
These functions will be called heavily, so I need to take code efficiency into consideration.
My problem(s)
- What are the rules that describe a valid XML tag/attribute name? Is it safe to assume that a valid XML tag/attribute name follows the same rules as a Java variable name? Or are those rules too restrictive?
- Should I use the Java regex package, or should I write my own specialized method? (As I said, speed is important.)
- Do you have any suggestions ?
Thank you!
The rules are defined in the XML spec (look at the definition of Name: a NameStartChar followed by zero or more NameChars).
If speed matters, then don’t use regular expressions. Do it more like this:
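A minimal sketch of that kind of hand-written loop follows. The class and method names are hypothetical, and as a simplifying assumption it accepts only an ASCII subset of the characters the spec allows in a name (the spec permits many non-ASCII characters too). Invalid characters are simply dropped.

```java
final class XmlNameFilter {
    private XmlNameFilter() {}

    // ASCII-only subset of the XML NameChar production.
    // Assumption: non-ASCII name characters are not needed here.
    static boolean isAsciiNameChar(char c) {
        return (c >= 'a' && c <= 'z')
            || (c >= 'A' && c <= 'Z')
            || (c >= '0' && c <= '9')
            || c == '_' || c == '-' || c == '.' || c == ':';
    }

    // Returns the input with every character outside the allowed set removed.
    // The StringBuilder is allocated only once an invalid character is seen,
    // so already-clean names are returned without copying.
    static String stripInvalidChars(String name) {
        StringBuilder sb = null;
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            if (isAsciiNameChar(c)) {
                if (sb != null) {
                    sb.append(c);
                }
            } else if (sb == null) {
                sb = new StringBuilder(name.length());
                sb.append(name, 0, i); // copy the clean prefix seen so far
            }
        }
        return sb == null ? name : sb.toString();
    }
}
```

For example, `XmlNameFilter.stripInvalidChars("d<iv")` yields `"div"`, and an already clean name such as `"div"` comes back as the same String instance.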
Note: the code above is a simple guideline. It does not cover one small annoyance: the first character of an XML name has a different value range. If you want to correct illegal tags like $%&div, it's a bit more complicated (more magic needed).
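One possible way to handle the first-character rule is sketched below, using the NameStartChar and NameChar ranges from the XML 1.0 spec. The class name, method names and the `fallback` parameter are hypothetical. The repair policy is also just an assumption: invalid characters are dropped, anything before the first valid start character is discarded, and a caller-supplied fallback is returned if nothing survives.

```java
final class XmlNames {
    private XmlNames() {}

    // NameStartChar ranges from the XML 1.0 specification.
    static boolean isNameStartChar(int c) {
        return c == ':' || c == '_'
            || (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
            || (c >= 0xC0 && c <= 0xD6) || (c >= 0xD8 && c <= 0xF6)
            || (c >= 0xF8 && c <= 0x2FF) || (c >= 0x370 && c <= 0x37D)
            || (c >= 0x37F && c <= 0x1FFF) || (c >= 0x200C && c <= 0x200D)
            || (c >= 0x2070 && c <= 0x218F) || (c >= 0x2C00 && c <= 0x2FEF)
            || (c >= 0x3001 && c <= 0xD7FF) || (c >= 0xF900 && c <= 0xFDCF)
            || (c >= 0xFDF0 && c <= 0xFFFD) || (c >= 0x10000 && c <= 0xEFFFF);
    }

    // NameChar = NameStartChar plus '-', '.', digits and a few extra ranges.
    static boolean isNameChar(int c) {
        return isNameStartChar(c)
            || c == '-' || c == '.' || (c >= '0' && c <= '9')
            || c == 0xB7 || (c >= 0x300 && c <= 0x36F)
            || (c >= 0x203F && c <= 0x2040);
    }

    // True if s is a NameStartChar followed by zero or more NameChars.
    static boolean isValidName(String s) {
        if (s.isEmpty()) {
            return false;
        }
        int first = s.codePointAt(0);
        if (!isNameStartChar(first)) {
            return false;
        }
        for (int i = Character.charCount(first); i < s.length(); ) {
            int c = s.codePointAt(i);
            if (!isNameChar(c)) {
                return false;
            }
            i += Character.charCount(c);
        }
        return true;
    }

    // Assumed repair policy: drop invalid characters; skip everything until
    // a valid start character appears; use fallback if nothing is left.
    static String sanitize(String s, String fallback) {
        if (isValidName(s)) {
            return s; // fast path: no allocation for valid names
        }
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); ) {
            int c = s.codePointAt(i);
            i += Character.charCount(c);
            boolean ok = sb.length() == 0 ? isNameStartChar(c) : isNameChar(c);
            if (ok) {
                sb.appendCodePoint(c);
            }
        }
        return sb.length() == 0 ? fallback : sb.toString();
    }
}
```

With this policy, `XmlNames.sanitize("$%&div", "_")` gives `"div"`. The Java identifier rules are not a substitute for these checks: Java identifiers allow `$` but not `-`, `.` or `:`, whereas XML names are the other way round.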