I have a method that checks whether a string is a valid hex string:
    public bool IsHex(string value)
    {
        if (string.IsNullOrEmpty(value) || value.Length % 2 != 0)
            return false;

        return
            value.Substring(0, 2) == "0x" &&
            value.Substring(2)
                .All(c => (c >= '0' && c <= '9') ||
                          (c >= 'a' && c <= 'f') ||
                          (c >= 'A' && c <= 'F'));
    }
The rules are:
The expression must be composed of an even number of hexadecimal digits (0-9, A-F, a-f).
The characters 0x must be the first two characters in the expression.
I’m sure it can be rewritten as a regex in a much cleaner and more efficient way.
Could you help me out with that?
After you updated your question, the new regex that works for you should be:
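As a minimal sketch, assuming .NET's System.Text.RegularExpressions, a pattern that follows the rules in the question (the 0x prefix, then one or more pairs of hex digits) could be wrapped like this. The class name HexValidator is hypothetical and only used for illustration:

```csharp
using System.Text.RegularExpressions;

// Hypothetical wrapper class; the name is an assumption, not from the post.
public static class HexValidator
{
    // ^0x        : the literal prefix at the start of the string
    // (?: ... )  : group without capturing
    // {2}        : exactly two hex digits per group
    // +          : at least one such pair, so a bare "0x" is rejected
    // Note: in .NET, $ also matches just before a final newline;
    // use \z instead if that must be rejected as well.
    private static readonly Regex HexPattern =
        new Regex(@"^0x(?:[0-9A-Fa-f]{2})+$", RegexOptions.Compiled);

    public static bool IsHex(string value)
    {
        // Regex.IsMatch throws on null, so guard first.
        return value != null && HexPattern.IsMatch(value);
    }
}
```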
Where I use (?: for non-capturing grouping, for efficiency. The {2} means that you want two of the previous expression (i.e., two hex chars); the + means you want one or more of those pairs. Note that this disallows 0x as a valid value.

Efficiency
“Oded” mentioned something about efficiency. I don’t know your requirements, so I consider this more an exercise for the mind than anything else. A regex engine advances through the input in steps the size of the smallest repeated unit of the pattern. For instance, when I try my own regex on 10,000 input strings of varying length (50 to 5,000 characters), all of them correct, it runs in 1.1 seconds.
When I try the following regex:
it runs about 40% faster, in 0.67 seconds. But be careful: knowing your input is the key to writing efficient regexes. For instance, if the regex fails, it will do a lot of backtracking. If half of my input strings have the incorrect length, the running time explodes to approximately 34 seconds, or 3000% (!), for the same input.
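A rough sketch of how such timings could be reproduced, assuming a simple Stopwatch loop. The class RegexTiming and its methods are hypothetical names, the pattern under test is passed in by the caller, and the numbers you get will depend on your machine and input mix:

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical timing helper; not the harness used for the figures above.
public static class RegexTiming
{
    // Builds "0x" followed by digitCount random hex digits.
    public static string MakeHex(Random rng, int digitCount)
    {
        const string digits = "0123456789abcdefABCDEF";
        var sb = new StringBuilder("0x", digitCount + 2);
        for (int i = 0; i < digitCount; i++)
            sb.Append(digits[rng.Next(digits.Length)]);
        return sb.ToString();
    }

    // Runs the pattern over all inputs. Returns how many inputs matched
    // and reports the elapsed wall-clock time through the out parameter.
    public static int Time(string pattern, IReadOnlyList<string> inputs, out TimeSpan elapsed)
    {
        var regex = new Regex(pattern);
        int matches = 0;
        var sw = Stopwatch.StartNew();
        foreach (var s in inputs)
        {
            if (regex.IsMatch(s)) matches++;
        }
        sw.Stop();
        elapsed = sw.Elapsed;
        return matches;
    }
}
```

Feeding it one list of all-valid strings and one list where half the strings have an odd number of digits is enough to see how failing inputs change the picture.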
It becomes even trickier if most input strings are large. If 99% of your input is of valid length and longer than 4130 characters, and only a few strings are not, writing
is efficient and improves the running time even more. However, if many strings fail the
length % 2 = 0 check, this is counterproductive because of backtracking.

Finally, if most of your strings satisfy the even-number rule, and only some or many strings contain a wrong character, the speed goes up: the more input that contains a wrong character, the better the performance. That is because the regex can stop as soon as it finds an invalid character.
Conclusion: if your input is a mix of small and large strings, some with a wrong character and some with a wrong count, your fastest approach would be to combine a check of the string's length (instantaneous in .NET) with an efficient regex.
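A minimal sketch of that combination, under the assumption that a bare 0x should be rejected (as the regex in this answer does, unlike the method in the question). The class name FastHexValidator and the simplified pattern are illustrative choices, not measured against the timings above:

```csharp
using System.Text.RegularExpressions;

// Hypothetical class name; a sketch of "length check first, then regex".
public static class FastHexValidator
{
    // The pair rule is enforced by the length check below, so the pattern
    // only needs to verify the prefix and the character set.
    // \z anchors at the very end of the string (no trailing newline allowed).
    private static readonly Regex PrefixAndDigits =
        new Regex(@"^0x[0-9A-Fa-f]+\z", RegexOptions.Compiled);

    public static bool IsHex(string value)
    {
        // string.Length is a stored value in .NET, so this check is O(1).
        // "0x" is two characters, so an even total length means an even
        // number of digits; Length < 4 rejects null-like short input and bare "0x".
        if (value == null || value.Length < 4 || value.Length % 2 != 0)
            return false;

        return PrefixAndDigits.IsMatch(value);
    }
}
```

Strings of the wrong length never reach the regex engine, so the costly failure case described above is handled by a single integer comparison.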