I need some help figuring out the regex for XML character references to control characters, in decimal or hex.
These sequences look like the following:
&#0;
&#31;
&#x0;
&#x1F;
&#x001F;
In other words, they are an ampersand, followed by a pound sign (#), followed by an optional ‘x’ to denote hexadecimal mode, followed by 1 to 4 decimal (or hexadecimal) digits, followed by a semicolon.
I’m specifically trying to identify those sequences whose numeric value falls in the range decimal 0 to 31 (hexadecimal 0 to 1F), inclusive.
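To pin down the target set, here is a minimal Python sketch of a non-regex reference check that any candidate pattern could be compared against. The names ANY_REF and is_control_ref are made up for illustration, and it assumes the 1-to-4-digit limit described above.

```python
import re

# Loose pattern: any reference of the shape described above (1 to 4 digits).
ANY_REF = re.compile(r"&#(x?)([0-9A-Fa-f]{1,4});")

def is_control_ref(ref):
    """Return True if ref is a character reference with a value from 0 to 31."""
    m = ANY_REF.fullmatch(ref)
    if m is None:
        return False
    is_hex = m.group(1) == "x"
    digits = m.group(2)
    if not is_hex and not digits.isdigit():
        return False  # hex letters are not allowed in decimal mode
    return int(digits, 16 if is_hex else 10) <= 31
```

This parses the number instead of encoding the range in the pattern, so it is only a yardstick for testing, not the single regex being asked for.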
Can anyone figure out the regex for this??
If you use a zero-width lookahead assertion to restrict the number of digits, you can write the rest of the pattern without worrying about the length restriction. Try this:
Explanation:
This pattern allows leading zeroes after the x, but the (?=x?[0-9A-Fa-f]{1,4}) part prevents them from occurring before an x.
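As a minimal sketch of the lookahead approach in Python (not necessarily the exact pattern intended above): the names CONTROL_CHAR_REF and find_control_refs are hypothetical, and this variant appends the semicolon to the lookahead as an assumption, so that the 1-to-4-digit limit is enforced all the way up to the terminator.

```python
import re

CONTROL_CHAR_REF = re.compile(
    r"&#"
    r"(?=x?[0-9A-Fa-f]{1,4};)"      # lookahead: optional x, 1 to 4 digits, then ';'
    r"(?:"
    r"0*(?:[12]?[0-9]|3[01])"       # decimal 0 to 31, leading zeroes allowed
    r"|"
    r"x0*1?[0-9A-Fa-f]"             # hex 0 to 1F, leading zeroes allowed after the x
    r");"
)

def find_control_refs(text):
    """Return every control-character reference found in text."""
    return [m.group(0) for m in CONTROL_CHAR_REF.finditer(text)]
```

Because the lookahead handles the length check, the two alternatives only need to describe the numeric range. For example, find_control_refs("a&#31;b&#32;c&#x1F;d&#x20;") picks out &#31; and &#x1F; and skips the other two.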