I am writing a program to validate and correct a date given as a string. Let's take 04121987 as the date, in the format ddmmyyyy. A regular expression for such a date:
(0[1-9]|[12][0-9]|3[01])(0[1-9]|1[012])(19\d\d|20\d\d)
If I match my string with the regular expression it works well. In Python:
>>> import re
>>> regex = re.compile(r'(0[1-9]|[12][0-9]|3[01])(0[1-9]|1[012])(19\d\d|20\d\d)')
>>> regex.findall('04121987')
[('04', '12', '1987')]
If I have the string 04721987, 72 is clearly not a valid month, so the string does not match the regex.
>>> regex.findall('04721987')
[]
What I would like to find out is the character which causes the regex to fail and its position. In this case it is 7. How could I do this in Python?
I believe what you want isn’t possible, because the _sre module is implemented in C ;(. You could try using this package instead (by monkey patching sre_compile, modifying the path and importing your new _sre first, etc.), but I don’t think it’s worth it. It is an implementation of the _sre package written entirely in Python, so you’ll be able to see the source code, edit it, and do something at the point where the next character doesn’t match.
You could do a similar thing by either:
You may not get the exact digit where the error is, but I don’t think that matters much in this scenario, as long as you tell the user which part is wrong (day, month or year).
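One way to report which part is wrong, and the first offending character as well, is to check each field separately instead of matching the whole string at once. The following is only a minimal sketch, assuming the fixed ddmmyyyy layout and the same value ranges as the regex in the question. `FIELDS` and `first_invalid` are hypothetical names made up for this example, not part of `re`:

```python
# Hypothetical field table: (name, width, valid values), mirroring the
# ranges accepted by the regex in the question (ddmmyyyy, years 1900-2099).
FIELDS = [
    ("day", 2, [f"{d:02d}" for d in range(1, 32)]),
    ("month", 2, [f"{m:02d}" for m in range(1, 13)]),
    ("year", 4, [str(y) for y in range(1900, 2100)]),
]


def first_invalid(date_string):
    """Return (field, index, char) for the first offending character.

    Returns None if all three fields are valid. The index is 0-based.
    If the string ends before a field is complete, char is None.
    Only the first eight characters are inspected.
    """
    offset = 0
    for name, width, valid in FIELDS:
        chunk = date_string[offset:offset + width]
        if chunk not in valid:
            # Walk the field left to right; the first prefix that no valid
            # value starts with ends at the offending character.
            for i in range(len(chunk)):
                prefix = chunk[:i + 1]
                if not any(v.startswith(prefix) for v in valid):
                    return name, offset + i, chunk[i]
            # Every character fits so far, so the field is simply too short.
            return name, offset + len(chunk), None
        offset += width
    return None
```

With this sketch, first_invalid('04721987') returns ('month', 2, '7'), and first_invalid('04121987') returns None. Like the regex, it does not check calendar validity (it accepts 31 February, for example).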