I’m trying to create a regex that verifies that an XML entity name is valid (see related issue: here).
(:|[A-Z]|_|[a-z]|[\xC0-\xD6]|[\xD8-\xF6]|[\xF8-\x2FF]|[\x370-\x37D]|[\x37F-\x1FFF]|[\x200C-\x200D]|[\x2070-\x218F]|[\x2C00-\x2FEF]|[\x3001-\xD7FF]|[\xF900-\xFDCF]|[\xFDF0-\xFFFD]|[\x10000-\xEFFFF])
Basically, it verifies that the first character is a valid name-start character. However, the token [\xF8-\x2FF] fails regex validation. Any idea why? I can’t figure it out.
UPDATE
The .NET parser throws an exception saying the range is in reverse order.
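You can only use one character per range endpoint in a regex, and most regex parsers don’t understand values wider than one byte written with the \x notation. Use the \u notation instead.
The .NET regex documentation states that \x takes exactly two hex digits, so \x2FF is read as \x2F followed by a literal F. The range [\xF8-\x2F...] then runs backwards, which is what the exception is complaining about.
And for Unicode, \u takes exactly four hex digits.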
So I’ve used both of the above: \x for the 2-char hex values and \u for the larger ones.
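To make the mixed \x / \u approach concrete, here is a minimal sketch in Python rather than .NET. It assumes Python's `re` module, which (like .NET) reads `\x` as exactly two hex digits and `\u` as exactly four, so it fails on the original token for the same reason. The names `compiles`, `NAME_START_CHAR`, and `is_name_start_char` are made up for illustration, and the ranges are copied from the regex in the question.

```python
import re


def compiles(pattern: str) -> bool:
    """Return True if the regex engine accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


# \x2FF is parsed as \x2F ('/') plus a literal 'F', so the range
# \xF8-\x2F runs from 0xF8 down to 0x2F and is rejected.
BROKEN_TOKEN = r"[\xF8-\x2FF]"

# With \u the upper bound really is U+02FF.
FIXED_TOKEN = r"[\xF8-\u02FF]"

# Same ranges as in the question: \x for two-digit values, \u for
# four-digit values. The final range uses Python's eight-digit \U escape
# (an assumption specific to this sketch; .NET has no such escape).
NAME_START_CHAR = re.compile(
    r"[:A-Z_a-z"
    r"\xC0-\xD6\xD8-\xF6"
    r"\xF8-\u02FF\u0370-\u037D\u037F-\u1FFF"
    r"\u200C-\u200D\u2070-\u218F\u2C00-\u2FEF"
    r"\u3001-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD"
    r"\U00010000-\U000EFFFF]"
)


def is_name_start_char(ch: str) -> bool:
    """Check whether ch is a single valid first character of a name."""
    return NAME_START_CHAR.fullmatch(ch) is not None


if __name__ == "__main__":
    print(compiles(BROKEN_TOKEN))   # the original token is rejected
    print(compiles(FIXED_TOKEN))    # the \u version is accepted
    for ch in ["a", "_", ":", "1", "-", "\u00e9", "\u00d7"]:
        print(repr(ch), is_name_start_char(ch))
```

One caveat for .NET specifically: because \u stops at four digits and .NET strings are UTF-16, the last range ([\x10000-\xEFFFF] in the question) probably has to be written as a surrogate pair, something like [\uD800-\uDB7F][\uDC00-\uDFFF], rather than as a single range.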