What’s the best regex to match an RFC 2822 date?
Basically, I would like to match headers such as "Date: Sun, 19 Feb 2012 16:25:02 +0000" that appear in some emails I receive, ideally in a language-independent way.
I found the regex below online, but I'm not sure how to make the month part language-independent while still matching the rest. I believe the spec requires the month to be three characters, but I'm not totally sure.
/^(?:(Sun|Mon|Tue|Wed|Thu|Fri|Sat),\s+)?(0[1-9]|[1-2]?[0-9]|3[01])\s+(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s+(19[0-9]{2}|[2-9][0-9]{3})\s+(2[0-3]|[0-1][0-9]):([0-5][0-9])(?::(60|[0-5][0-9]))?\s+([-\+][0-9]{2}[0-5][0-9]|(?:UT|GMT|(?:E|C|M|P)(?:ST|DT)|[A-IK-Z]))(\s+|\(([^\(\)]+|\\\(|\\\))*\))*$/
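To see where that pattern stops working, here is a minimal Python sketch (Python is an assumption; the question does not name a language). It compiles the regex above unchanged. Because the pattern is anchored with ^, it can only match the field value, not the "Date:" name, so the hypothetical helper match_date_header splits the name off first. A localized name such as "So" (German for Sunday) makes the match fail.

```python
import re

# The regex from the question, unchanged, split across raw-string pieces.
RFC2822_DATE = re.compile(
    r"^(?:(Sun|Mon|Tue|Wed|Thu|Fri|Sat),\s+)?"
    r"(0[1-9]|[1-2]?[0-9]|3[01])\s+"
    r"(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s+"
    r"(19[0-9]{2}|[2-9][0-9]{3})\s+"
    r"(2[0-3]|[0-1][0-9]):([0-5][0-9])(?::(60|[0-5][0-9]))?\s+"
    r"([-\+][0-9]{2}[0-5][0-9]|(?:UT|GMT|(?:E|C|M|P)(?:ST|DT)|[A-IK-Z]))"
    r"(\s+|\(([^\(\)]+|\\\(|\\\))*\))*$"
)

def match_date_header(line):
    # Hypothetical helper. The pattern is anchored with ^, so split off the
    # "Date" field name at the first colon and match only the value.
    name, sep, value = line.partition(":")
    if not sep or name.strip().lower() != "date":
        return None
    return RFC2822_DATE.match(value.strip())

m = match_date_header("Date: Sun, 19 Feb 2012 16:25:02 +0000")
print(m.group(1), m.group(3), m.group(8))  # weekday, month, zone
```

With the English header, groups 1, 3 and 8 are "Sun", "Feb" and "+0000". The same line with "So" in place of "Sun" returns None.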
As @tripleee pointed out, an RFC 2822 date will always be in English. But if you are parsing dates from a source that does not strictly follow RFC 2822 and might use a different language, you will have to identify the set of languages that might be used and build a single regex that matches any month or day-of-week name from any of those languages. Afterwards, you can use a hash to convert the captured month and day-of-week names to whatever internal representation you want to use.
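Here is a minimal Python sketch of that approach. It assumes a hypothetical language set of English plus German, and the German abbreviations are assumed forms, not taken from any spec. The names from all languages are joined into one alternation, and Python dicts play the role of the hash that converts captured names to numbers. MONTHS, WEEKDAYS, DATE_RE and parse_date are illustrative names. Zone handling is simplified to numeric offsets only.

```python
import re
from datetime import datetime, timedelta, timezone

# Hypothetical language set: English (as in RFC 2822) plus German.
# The German spellings are assumptions; use whatever your sources emit.
MONTHS = {
    "Jan": 1, "Feb": 2, "Mar": 3, "Apr": 4, "May": 5, "Jun": 6,
    "Jul": 7, "Aug": 8, "Sep": 9, "Oct": 10, "Nov": 11, "Dec": 12,
    "Mär": 3, "Mai": 5, "Okt": 10, "Dez": 12,  # German-only spellings
}
# Day-of-week names mapped to datetime.weekday() numbers (Monday == 0).
WEEKDAYS = {
    "Mon": 0, "Tue": 1, "Wed": 2, "Thu": 3, "Fri": 4, "Sat": 5, "Sun": 6,
    "Mo": 0, "Di": 1, "Mi": 2, "Do": 3, "Fr": 4, "Sa": 5, "So": 6,
}

def alternation(names):
    # Longest names first, so e.g. "Sat" is tried before "Sa".
    return "|".join(re.escape(n) for n in sorted(names, key=len, reverse=True))

# One regex accepting a month/weekday name from any configured language.
# Simplification (assumption): only numeric zone offsets such as +0000.
DATE_RE = re.compile(
    r"(?:(?P<wday>" + alternation(WEEKDAYS) + r"),\s+)?"
    r"(?P<day>\d{1,2})\s+"
    r"(?P<month>" + alternation(MONTHS) + r")\s+"
    r"(?P<year>\d{4})\s+"
    r"(?P<hour>\d{2}):(?P<minute>\d{2})(?::(?P<second>\d{2}))?\s+"
    r"(?P<zone>[+-]\d{4})"
)

def parse_date(text):
    m = DATE_RE.search(text)
    if m is None:
        return None
    # The zone offset is converted arithmetically; MONTHS maps the name.
    sign = 1 if m["zone"][0] == "+" else -1
    offset = timedelta(hours=int(m["zone"][1:3]), minutes=int(m["zone"][3:5]))
    try:
        dt = datetime(
            int(m["year"]), MONTHS[m["month"]], int(m["day"]),
            int(m["hour"]), int(m["minute"]), int(m["second"] or 0),
            tzinfo=timezone(sign * offset),
        )
    except ValueError:  # e.g. 31 Feb, hour 25, or an out-of-range offset
        return None
    # Optional sanity check: the stated weekday must agree with the date.
    if m["wday"] and WEEKDAYS[m["wday"]] != dt.weekday():
        return None
    return dt
```

For example, parse_date("Date: So, 19 Feb 2012 16:25:02 +0000") gives the same instant as the English header, 2012-02-19 16:25:02 UTC. Because search() is used, the "Date:" prefix needs no special handling here.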