I have dates in text format, like:
6 weeks ago, 2012 April 18 15:08:18
13 weeks ago, 2012 March 01 17:33:52
The main problem is that these texts are actually in Russian, so instead of “weeks ago” there is the equivalent Russian phrase. The same goes for the month names (it looks like I should create a dictionary of possible values).
I don’t know how to start. Should I use regular expressions? Something else?
Not Russian, but Polish:
Firefox has no problem extracting Unicode characters (quick and dirty regular expression):
Parsing:
The result is:
I don’t know Russian, but you might need to do some extra linguistic work. For example, in Polish it is “1 tydzień” but “2 tygodnie” and even “5 tygodni” (note the different forms).
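To make the idea concrete, here is a minimal sketch of the dictionary-plus-regex approach, using Polish as above. The names `PL_MONTHS`, `DATE_RE`, `AGO_RE` and `parseEntry` are made up for illustration, and the input format (`<n> <unit> temu, <year> <month> <day> <hh>:<mm>:<ss>`) is an assumption based on the samples in the question. It relies on the `u` flag and `\p{L}`, so it needs a reasonably recent JavaScript engine.

```javascript
// Hypothetical dictionary: Polish month names (nominative) -> zero-based month index.
// For Russian, swap in the Russian names, including any inflected forms that occur.
const PL_MONTHS = new Map([
  ["styczeń", 0], ["luty", 1], ["marzec", 2], ["kwiecień", 3],
  ["maj", 4], ["czerwiec", 5], ["lipiec", 6], ["sierpień", 7],
  ["wrzesień", 8], ["październik", 9], ["listopad", 10], ["grudzień", 11]
]);

// "<year> <month word> <day> <hh>:<mm>:<ss>"; \p{L} matches any Unicode letter.
const DATE_RE = /(\d{4})\s+(\p{L}+)\s+(\d{1,2})\s+(\d{1,2}):(\d{2}):(\d{2})/u;

// "<n> weeks ago", listing every plural form explicitly ("temu" = "ago").
const AGO_RE = /(\d+)\s+(?:tydzień|tygodnie|tygodni)\s+temu/u;

function parseEntry(text, months = PL_MONTHS) {
  const d = DATE_RE.exec(text);
  if (d === null) return null;                 // no timestamp found
  const month = months.get(d[2].toLowerCase());
  if (month === undefined) return null;        // unknown month name
  const ago = AGO_RE.exec(text);
  return {
    // The relative part is optional; null when it is absent or unrecognised.
    weeksAgo: ago === null ? null : Number(ago[1]),
    // Built in the local time zone, since the samples carry no zone information.
    date: new Date(Number(d[1]), month, Number(d[3]),
                   Number(d[4]), Number(d[5]), Number(d[6]))
  };
}
```

Since the absolute timestamp is always present in the samples, the “weeks ago” part is redundant and could be ignored entirely, which avoids the plural-form problem altogether.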