I’m trying to use the range pattern [01-12] in regex to match a two-digit month (mm), but this doesn’t work as expected.
You seem to have misunderstood how character class definitions work in regex.
To match any of the strings 01, 02, 03, 04, 05, 06, 07, 08, 09, 10, 11, or 12, something like this works:
0[1-9]|1[0-2]
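As a minimal sketch in Python, assuming the pattern 0[1-9]|1[0-2] (a 0 followed by 1 to 9, or a 1 followed by 0 to 2), the check could look like this. The helper name is_month is hypothetical and used only for illustration.

```python
import re

# Assumed pattern: "0" then 1-9, or "1" then 0-2.
MONTH = re.compile(r"0[1-9]|1[0-2]")


def is_month(text: str) -> bool:
    """Return True only if the whole string is a two-digit month 01..12."""
    return MONTH.fullmatch(text) is not None


# Every two-digit month from 01 to 12 should be accepted.
valid_months = [f"{n:02d}" for n in range(1, 13)]
all_valid_accepted = all(is_month(m) for m in valid_months)

# None of these should be accepted.
wrongly_accepted = [s for s in ["00", "13", "1", "012"] if is_month(s)]

print(all_valid_accepted, wrongly_accepted)
```

fullmatch is used here so that longer strings such as 012 are rejected. With search or match, the pattern would need explicit anchors instead.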
Explanation
A character class, by itself, attempts to match one and exactly one character from the input string. [01-12] actually defines [012], a character class that matches one character from the input against any of the 3 characters 0, 1, or 2.
The - range definition goes from 1 to 1, which includes just 1. On the other hand, something like [1-9] includes 1, 2, 3, 4, 5, 6, 7, 8, 9.
Beginners often make the mistake of defining things like [this|that]. This doesn’t “work”. This character class defines [this|a], i.e. it matches one character from the input against any of the 6 characters t, h, i, s, |, or a. More than likely, (this|that) is what is intended.
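The behaviour described above can be checked with a short Python sketch. The helper class_members is a hypothetical name introduced here. It simply lists which candidate characters a given class accepts.

```python
import re


def class_members(char_class: str, candidates: str) -> str:
    """Return the characters from `candidates` that the class matches."""
    pattern = re.compile(char_class)
    return "".join(ch for ch in candidates if pattern.fullmatch(ch))


# [01-12] accepts exactly the same single characters as [012].
from_range_form = class_members(r"[01-12]", "0123456789")
from_plain_form = class_members(r"[012]", "0123456789")

# [this|that] is a class of single characters, so it cannot match a word,
# but it does match a lone "|".
class_matches_word = re.fullmatch(r"[this|that]", "this") is not None
class_matches_pipe = re.fullmatch(r"[this|that]", "|") is not None

# (this|that) is a group with alternation, which matches either word.
group_matches_word = re.fullmatch(r"(this|that)", "that") is not None

print(from_range_form, from_plain_form)
print(class_matches_word, class_matches_pipe, group_matches_word)
```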
How ranges are defined
So it’s obvious now that a pattern like between [24-48] hours doesn’t “work”. The character class in this case is equivalent to [248].
That is, - in a character class definition doesn’t define a numeric range in the pattern. Regex engines don’t really “understand” numbers in the pattern, with the exception of finite repetition syntax (e.g. a{3,5} matches between 3 and 5 occurrences of a).
Range definition instead uses the ASCII/Unicode encoding of the characters to define ranges. The character 0 is encoded in ASCII as decimal 48; 9 is 57. Thus, the character class [0-9] includes all characters whose values are between decimal 48 and 57 in the encoding. Rather sensibly, by design these are the characters 0, 1, …, 9.
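A small Python sketch can make the encoding-based view of ranges concrete. The last pattern, 2[4-9]|3[0-9]|4[0-8], is not from the text above. It is one assumed way to spell out the numeric range 24 to 48 digit by digit, in the same spirit as the month pattern.

```python
import re

# Code points behind the range [0-9]: 48 through 57.
first_digit_code = ord("0")
last_digit_code = ord("9")
zero_to_nine = "".join(chr(c) for c in range(first_digit_code, last_digit_code + 1))

# [24-48] is just the set {2, 4, 8}: "2", the range 4-4, and "8".
in_class = [ch for ch in "0123456789" if re.fullmatch(r"[24-48]", ch)]

# Assumed digit-by-digit pattern for the numbers 24..48.
HOURS = re.compile(r"2[4-9]|3[0-9]|4[0-8]")
matched_numbers = [n for n in range(100) if HOURS.fullmatch(str(n))]

# Finite repetition is the one place where numbers in a pattern are counted.
repetition_ok = [len(s) for s in ("aa", "aaa", "aaaaa", "aaaaaa")
                 if re.fullmatch(r"a{3,5}", s)]

print(first_digit_code, last_digit_code, zero_to_nine)
print(in_class, matched_numbers[0], matched_numbers[-1], repetition_ok)
```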
Another example: A to Z
Let’s take a look at another common character class definition, [a-zA-Z].
In ASCII:
A = 65, Z = 90
a = 97, z = 122
This means that:
[a-zA-Z] and [A-Za-z] are equivalent
[a-Z] is likely to be an illegal character range, because a (97) is “greater than” Z (90)
[A-z] is legal, but also includes these six characters: [ (91), \ (92), ] (93), ^ (94), _ (95), ` (96)
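These three observations can be verified with a short sketch, here using Python’s re module as one example engine. Other engines may report a reversed range differently. The helper members is a hypothetical name that collects every ASCII character a class accepts.

```python
import re

ALL_ASCII = "".join(chr(c) for c in range(128))


def members(char_class: str) -> set:
    """Return the set of ASCII characters matched by the given class."""
    pattern = re.compile(char_class)
    return {ch for ch in ALL_ASCII if pattern.fullmatch(ch)}


# The order of the two ranges inside the class does not matter.
same_letters = members(r"[a-zA-Z]") == members(r"[A-Za-z]")

# [A-z] spans code points 65..122, so it picks up the six in between.
extras = sorted(members(r"[A-z]") - members(r"[A-Za-z]"))
extra_codes = [ord(ch) for ch in extras]

# In Python, a range whose start is greater than its end is rejected.
try:
    re.compile(r"[a-Z]")
    reversed_range_accepted = True
except re.error:
    reversed_range_accepted = False

print(same_letters, extras, extra_codes, reversed_range_accepted)
```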