I want to parse input that may contain time ranges in mixed formats, such as:
1-4pm
1pm-5pm
noon to 11pm
noon to midnight
etc.
I want to extract the start and end times. How can I do this with a regex? I know I can’t support every possible input format, but how can I support as many as possible?
This is my expression:
^((?<dayPart>[a-z]+)?)\s*(?<startTime>[0-9]{1,2}[:]?[0-9]{0,2}\s*[am|pm|a.m|p.m]*[.]*)?\s*[-|to|\\|/|=]*\s*((?<endPart>[a-z]+)?|(?<endTime>[0-9]{1,2}[:]?[0-9]{0,2}\s*[am|pm|a.m|p.m]*[.]*))?$
It covers almost all combinations.
I just want to know whether this regex can be optimized.
Here dayPart consumes any leading non-digit characters, which handles a time span that starts with noon, midnight, etc., or with a value we can ignore, such as Sunday.
startTime tries to consume a time in any format, if one is present. The same goes for endPart and endTime.
First, define a pattern that matches a single point in time. Given your examples it might be something like:
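For instance, here is a minimal sketch in Python. The name `TIME_POINT` and the exact set of accepted forms (the words noon and midnight, or an hour with optional minutes and an optional am/pm marker) are my assumptions based on the examples in the question, so adjust them to your data. Note that Python spells named groups `(?P<name>...)` rather than `(?<name>...)`.

```python
import re

# Hypothetical pattern for a single point in time:
#   - the words "noon" or "midnight", or
#   - an hour (1-2 digits), optional ":MM", optional am/pm/a.m./p.m.
TIME_POINT = r"(?:noon|midnight|\d{1,2}(?::\d{2})?(?:\s*[ap]\.?m\.?)?)"

time_point_re = re.compile(TIME_POINT, re.IGNORECASE)

for sample in ["1pm", "11:30 a.m.", "noon", "4", "pm"]:
    # fullmatch requires the whole string to be a time point
    print(sample, bool(time_point_re.fullmatch(sample)))
```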
Next, define the separator. Perhaps:
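As a rough sketch in Python, assuming the separators you care about are a hyphen, a slash, or the word "to" (the name `SEPARATOR` and this particular set are illustrative, not exhaustive):

```python
import re

# Hypothetical separator: "-", "/", or "to", with optional whitespace
# on either side.
SEPARATOR = r"\s*(?:-|/|to)\s*"

separator_re = re.compile(SEPARATOR, re.IGNORECASE)

for sample in ["1pm-5pm", "noon to midnight", "9 / 10"]:
    # Split on the first separator only
    print(separator_re.split(sample, maxsplit=1))
```

Used on its own like this, "to" would also match inside longer words, so the separator is more reliable when it sits between two time patterns in one anchored expression.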
Finally, combine two of the first with one of the second. Assuming your language supports variables, something like:
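Putting the pieces together might look like the following Python sketch. `TIME_POINT`, `SEPARATOR`, `RANGE_RE`, and `split_range` are hypothetical names, and the sub-patterns are assumptions drawn from the examples in the question.

```python
import re
from typing import Optional, Tuple

# Assumed building blocks: one point in time, and a separator.
TIME_POINT = r"(?:noon|midnight|\d{1,2}(?::\d{2})?(?:\s*[ap]\.?m\.?)?)"
SEPARATOR = r"\s*(?:-|/|to)\s*"

# Two time points joined by one separator, anchored to the whole string.
RANGE_RE = re.compile(
    rf"^\s*(?P<start>{TIME_POINT}){SEPARATOR}(?P<end>{TIME_POINT})\s*$",
    re.IGNORECASE,
)

def split_range(text: str) -> Optional[Tuple[str, str]]:
    """Return the raw (start, end) strings, or None if the text does not match."""
    match = RANGE_RE.match(text)
    if match is None:
        return None
    return match.group("start"), match.group("end")

for sample in ["1-4pm", "1pm-5pm", "noon to 11pm", "noon to midnight"]:
    print(sample, "->", split_range(sample))
```

The inner groups are already non-capturing here, so only `start` and `end` are captured.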
Once you pass that through the engine you should be able to get at the parts of the expression that matched. You might need to make some groups non-capturing but I’ll leave that as an exercise for the reader. You’ll then likely have to parse the individual parts to figure out the time. For example, parse “1pm” as a 1 and “pm” and calculate a time based on that.
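Parsing one matched part could be sketched like this in Python. The helper name `to_time` and its rules are assumptions: a value with no am/pm marker is read as a 24-hour clock hour, which you may want to change (for example, letting the "1" in "1-4pm" inherit "pm" from the end time).

```python
import re
from datetime import time
from typing import Optional

# Hour, optional ":MM", optional am/pm/a.m./p.m. marker (first letter captured).
PART_RE = re.compile(
    r"(?P<hour>\d{1,2})(?::(?P<minute>\d{2}))?(?:\s*(?P<meridiem>[ap])\.?m\.?)?"
)

def to_time(text: str) -> Optional[time]:
    """Convert a string such as '1pm' or 'noon' to a time, or None if invalid."""
    text = text.strip().lower()
    if text == "noon":
        return time(12, 0)
    if text == "midnight":
        return time(0, 0)
    match = PART_RE.fullmatch(text)
    if match is None:
        return None
    hour = int(match["hour"])
    minute = int(match["minute"] or 0)
    meridiem = match["meridiem"]
    if meridiem is not None:
        if not 1 <= hour <= 12:
            return None
        # 12am -> 0, 12pm -> 12, 1pm -> 13, and so on
        hour = hour % 12 + (12 if meridiem == "p" else 0)
    if hour > 23 or minute > 59:
        return None
    return time(hour, minute)

for sample in ["1pm", "12 a.m.", "11:30pm", "noon", "25"]:
    print(sample, "->", to_time(sample))
```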
Once you have it broken down like that, it is easier to fiddle with the constituent parts, and the expression becomes a bit more comprehensible. The same thing can also be accomplished in languages that support multiline expressions with comments.
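In Python, for example, the `re.VERBOSE` flag allows whitespace and `#` comments inside the pattern. The sketch below is one possible layout, with the accepted formats again assumed from the examples in the question and `RANGE_RE` a hypothetical name.

```python
import re

RANGE_RE = re.compile(
    r"""
    ^\s*
    (?P<start>                      # first point in time
        noon | midnight             #   a named time
      | \d{1,2} (?: : \d{2} )?      #   hour, optional :minutes
        (?: \s* [ap] \.? m \.? )?   #   optional am, pm, a.m. or p.m.
    )
    \s* (?: - | / | to ) \s*        # separator
    (?P<end>                        # second point in time, same shape
        noon | midnight
      | \d{1,2} (?: : \d{2} )?
        (?: \s* [ap] \.? m \.? )?
    )
    \s*$
    """,
    re.VERBOSE | re.IGNORECASE,
)

match = RANGE_RE.match("noon to 11pm")
if match:
    print(match.group("start"), match.group("end"))
```

The trade-off is that the time pattern is written out twice, whereas building the expression from variables keeps a single definition.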