I’m dealing with some very non-uniform data and I’m using Ruby regular expressions to parse it. I have to parse times out of strings, and in my data time ranges are written as 9-10:30AM, 9:30AM-9:00PM, or 9-10AM. The minutes are not always given, and the AM/PM is omitted from the first time when it is the same for both times.
I’m trying to create Ruby Time objects from these strings using regular expressions, but I’m having trouble writing an expression that matches all of these formats.
This is what I’ve tried:
results = rule.scan(/no parking(sanitation broom symbol)(\d+:\d+((a|p)m))-(\d+:\d+((a|p)m)/)
With this I get no results at all, even though the input contains times written exactly as shown above. The \d+ is meant to match the digits of each number, and I suspect something is wrong with the way I have used parentheses to group the expressions.
You’ll likely want to parse the string in multiple parts, then use a natural language parser like Chronic (https://github.com/mojombo/chronic) to handle the date/time.
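As a rough sketch of the "multiple parts" idea, assuming the ranges always look like the three examples in the question: the pattern below makes the minutes and the first AM/PM optional and is case-insensitive. For comparison, the pattern in the question only matches lowercase am/pm, always requires a colon and minutes, and its last group is missing a closing parenthesis. The names TIME_RANGE, normalize_ranges and parse_ranges are made up for illustration, and Time.strptime from the standard library stands in for Chronic here so the snippet runs without extra gems.

```ruby
require 'time'

# Hypothetical pattern for ranges like "9-10:30AM", "9:30AM-9:00PM", "9-10AM".
TIME_RANGE = /
  (\d{1,2})(?::(\d{2}))?\s*([ap]m)?   # start: hour, optional minutes, optional am or pm
  \s*-\s*
  (\d{1,2})(?::(\d{2}))?\s*([ap]m)    # end: hour, optional minutes, required am or pm
/xi

# Step 1: pull out each range and fill in the parts that were left out.
# Returns pairs of strings such as ["9:00AM", "10:30AM"].
def normalize_ranges(text)
  text.scan(TIME_RANGE).map do |h1, m1, mer1, h2, m2, mer2|
    mer1 ||= mer2 # assumption: a missing AM or PM is the same as the end time's
    ["#{h1}:#{m1 || '00'}#{mer1.upcase}", "#{h2}:#{m2 || '00'}#{mer2.upcase}"]
  end
end

# Step 2: turn the normalized strings into Time objects (dated today).
def parse_ranges(text)
  normalize_ranges(text).map do |pair|
    pair.map { |s| Time.strptime(s, '%I:%M%p') }
  end
end

# normalize_ranges("no parking 9-10:30AM")  # => [["9:00AM", "10:30AM"]]
```

The normalized strings could be handed to Chronic.parse instead of Time.strptime if you prefer that gem. Because a missing AM/PM is copied from the end time, a range that crosses noon, such as 11-1PM, would be read as 11PM to 1PM, so that case needs separate handling if it occurs in your data.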