I have this regular expression that matches the following strings:
<!-- 09-02-2009 --->
<!-- 09-02-2009 12:00:00 --->
<!-- 09-02-2009 12:00:00 A --->
<!-- 09-02-2009 12:00:00 AM --->
Here is the pattern:
<!-- (?<month>\d{2}?)-(?<day>\d{2}?)-(?<year>\d{4}?)(?:(?: ?\d{2}:?){3}?(?: ?[aApP][mM]?)?)? --->
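Here is a quick way to check the pattern against the four sample strings. This is a sketch, assuming Python's re module. Python spells named groups (?P<name>...) rather than (?<name>...), so that is the only change made to the pattern.

```python
import re

# The question's pattern, transcribed for Python's re module.
# Assumption: (?<name>...) is rewritten as Python's (?P<name>...);
# every other token is kept exactly as in the original.
ORIGINAL = re.compile(
    r"<!-- (?P<month>\d{2}?)-(?P<day>\d{2}?)-(?P<year>\d{4}?)"
    r"(?:(?: ?\d{2}:?){3}?(?: ?[aApP][mM]?)?)? --->"
)

SAMPLES = [
    "<!-- 09-02-2009 --->",
    "<!-- 09-02-2009 12:00:00 --->",
    "<!-- 09-02-2009 12:00:00 A --->",
    "<!-- 09-02-2009 12:00:00 AM --->",
]

for s in SAMPLES:
    # fullmatch requires the whole string to match, not just a prefix.
    m = ORIGINAL.fullmatch(s)
    print(s, "->", m.group("month", "day", "year") if m else None)
```

Note that the time is matched but not captured here, because all of its groups are non-capturing.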
Updated pattern, per twistol:
<!-- (?<month>\d{2}?)-(?<day>\d{2}?)-(?<year>\d{4}?)(?<time>(?: ?(?:\d{2}:){2}\d{2})?(?: ?[aApP][mM]?)?)? --->
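The updated pattern puts the whole time suffix in a named time group. It also requires the two colons through (?:\d{2}:){2}\d{2}. The sketch below checks the same samples with Python's re. As before, the one assumed change is writing the named groups as (?P<name>...).

```python
import re

# twistol's updated pattern, transcribed for Python's re
# ((?<name>...) becomes (?P<name>...); nothing else changes).
UPDATED = re.compile(
    r"<!-- (?P<month>\d{2}?)-(?P<day>\d{2}?)-(?P<year>\d{4}?)"
    r"(?P<time>(?: ?(?:\d{2}:){2}\d{2})?(?: ?[aApP][mM]?)?)? --->"
)

for s in [
    "<!-- 09-02-2009 --->",
    "<!-- 09-02-2009 12:00:00 --->",
    "<!-- 09-02-2009 12:00:00 A --->",
    "<!-- 09-02-2009 12:00:00 AM --->",
]:
    m = UPDATED.fullmatch(s)
    # The captured time keeps its leading space, e.g. " 12:00:00 AM".
    print(repr(m.group("time")) if m else None)
```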
Is there anything I can do to simplify this pattern?
Thanks!
EDIT
Here is the pattern I came up with from all the comments/answers, with validation built in. It is a bit ugly, but who said regex needs to be pretty? 😛
<!-- (?<month>(?:0[1-9]|1[0-2]))-(?<day>(?:0[1-9]|1[0-9]|2[0-9]|3[01]))-(?<year>\d{4})(?<time> (?:0[0-9]|1[0-9]|2[0-3]):(?:[0-5][0-9])(?::[0-5][0-9])?(?: [aApP][mM]?)?)? --->
It will match valid dates in the following formats:
<!-- 09-02-2009 --->
<!-- 09-02-2009 12:00 --->
<!-- 09-02-2009 12:00 A --->
<!-- 09-02-2009 12:00 AM --->
<!-- 09-02-2009 12:00:00 --->
<!-- 09-02-2009 12:00:00 A --->
<!-- 09-02-2009 12:00:00 AM --->
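The sketch below runs the validated pattern against the formats listed above and against a few out-of-range values. It assumes Python's re, with the named groups rewritten as (?P<name>...). The last test case shows a limit of the pattern: it checks each field's range on its own and does not check days per month. So a date like 02-31-2009 still matches.

```python
import re

# The validated pattern, transcribed for Python's re:
# (?<name>...) becomes (?P<name>...); the pieces are split across lines
# only for readability and are concatenated into one pattern.
VALIDATED = re.compile(
    r"<!-- (?P<month>(?:0[1-9]|1[0-2]))"
    r"-(?P<day>(?:0[1-9]|1[0-9]|2[0-9]|3[01]))"
    r"-(?P<year>\d{4})"
    r"(?P<time> (?:0[0-9]|1[0-9]|2[0-3]):(?:[0-5][0-9])(?::[0-5][0-9])?(?: [aApP][mM]?)?)?"
    r" --->"
)

FORMATS = [
    "<!-- 09-02-2009 --->",
    "<!-- 09-02-2009 12:00 --->",
    "<!-- 09-02-2009 12:00 A --->",
    "<!-- 09-02-2009 12:00 AM --->",
    "<!-- 09-02-2009 12:00:00 --->",
    "<!-- 09-02-2009 12:00:00 A --->",
    "<!-- 09-02-2009 12:00:00 AM --->",
]

REJECTED = [
    "<!-- 13-32-0000 --->",        # month 13 and day 32 are out of range
    "<!-- 09-02-2009 24:00 --->",  # hour 24 is out of range
    "<!-- 09-02-2009 12:60 --->",  # minute 60 is out of range
]

for s in FORMATS + REJECTED:
    print(s, bool(VALIDATED.fullmatch(s)))
```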
This is as simple as I can think of. Note that this regex isn't exactly equivalent to yours: in the original, all the timestamp colons were optional, so it would also match 01:0203 or 0102:03:, etc. I think my version may be more correct.
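The point about optional colons is easy to check. This sketch compares the question's original pattern with twistol's updated pattern, which also requires the colons. Both are transcribed for Python's re, with named groups written (?P<name>...). The odd time strings are illustrative inputs chosen for this check.

```python
import re

# Both patterns transcribed for Python's re ((?P<name>...) named groups).
ORIGINAL = re.compile(
    r"<!-- (?P<month>\d{2}?)-(?P<day>\d{2}?)-(?P<year>\d{4}?)"
    r"(?:(?: ?\d{2}:?){3}?(?: ?[aApP][mM]?)?)? --->"
)
UPDATED = re.compile(
    r"<!-- (?P<month>\d{2}?)-(?P<day>\d{2}?)-(?P<year>\d{4}?)"
    r"(?P<time>(?: ?(?:\d{2}:){2}\d{2})?(?: ?[aApP][mM]?)?)? --->"
)

# Times with missing or misplaced colons. In the original, each of the
# three repetitions of " ?\d{2}:?" makes both the space and the colon optional.
ODD_TIMES = ["01:0203", "0102:03:", "01 02 03"]

for t in ODD_TIMES:
    s = f"<!-- 09-02-2009 {t} --->"
    print(t, "original:", bool(ORIGINAL.fullmatch(s)),
          "updated:", bool(UPDATED.fullmatch(s)))
```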
Basically, I removed all the non-capturing groups and quantifiers I could, since a quantifier that merely doubles a digit makes the pattern less readable, not more. I also removed the ? after the fixed-count quantifiers ({2}?, {4}?). That ? makes a quantifier lazy instead of greedy, but a fixed count such as {2} or {4} always matches exactly that many characters either way, so it has no effect.
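A small Python check shows the difference. With a fixed count, the lazy modifier changes nothing. With a range such as {2,4}, it changes how much is matched. The sample string is illustrative.

```python
import re

TEXT = "09-02-2009"

# Fixed counts: lazy {2}? / {4}?, plain {2} / {4}, and spelled-out \d\d
# all have exactly one possible length, so they capture the same text.
lazy = re.fullmatch(r"(\d{2}?)-(\d{2}?)-(\d{4}?)", TEXT)
plain = re.fullmatch(r"(\d{2})-(\d{2})-(\d{4})", TEXT)
spelled_out = re.fullmatch(r"(\d\d)-(\d\d)-(\d{4})", TEXT)
print(lazy.groups(), plain.groups(), spelled_out.groups())

# Ranges: here the lazy modifier does matter. With re.match (no anchor
# at the end), the lazy form stops at the minimum of 2 digits.
print(re.match(r"\d{2,4}?", "2009").group())  # lazy: shortest match
print(re.match(r"\d{2,4}", "2009").group())   # greedy: longest match
```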
And of course, this will match invalid dates, such as 13-32-0000. To fix that, you will have to decide whether a complex but correct solution is more desirable than a simple, more understandable one. It depends on how much you trust the text you will be running this over: if false positives are likely and you want to filter them out, go for the more correct solution, even if it is slightly less readable.
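Another option, not taken from the answers above, is to keep the regex simple and let a date library reject impossible dates. The sketch below assumes Python. SHAPE and parse_comment_date are hypothetical names, and SHAPE is an illustrative shape-only pattern, not the answer's own. The regex checks only the layout, and datetime.date raises ValueError for month 13, day 32, year 0, or a February 29 outside a leap year. The time portion is only shape-checked here.

```python
import re
from datetime import date

# Hypothetical shape-only pattern: MM-DD-YYYY plus an optional
# " HH:MM[:SS][ A|AM|P|PM]" suffix inside the comment wrapper.
# It checks layout only; calendar validity is left to datetime.date.
SHAPE = re.compile(
    r"<!-- (?P<month>\d\d)-(?P<day>\d\d)-(?P<year>\d{4})"
    r"(?P<time> \d\d:\d\d(?::\d\d)?(?: [aApP][mM]?)?)? --->"
)


def parse_comment_date(text):
    """Return a datetime.date for a well-formed, real calendar date, else None.

    Note: the time digits are not range-checked by this sketch.
    """
    m = SHAPE.fullmatch(text)
    if m is None:
        return None
    try:
        # date() raises ValueError for impossible dates (and year 0).
        return date(int(m["year"]), int(m["month"]), int(m["day"]))
    except ValueError:
        return None
```

With this split, the regex stays readable and the calendar rules live in one place. The trade-off is an extra step after matching.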