I have the following pattern definition:
private static final String D = "(Mon|Tue|Wed|Thu|Fri|Sat|Sun)";
private static final String DD = "(" + D + "-" + D + ")";
private static final String m_daysPattern = "(" + D + "|" + DD + ")(?:,(" + D + "|" + DD + "))*";
private static final Pattern m_daysRegex = Pattern.compile(m_daysPattern);
This regex should match expressions like these:
- “Mon-Fri”
- “Tue-Fri,Sun”
- “Mon-Wed,Fri-Sun”
I did it pretty easily in .NET using named groups, but here in Java 6 I am a bit lost. Can anyone show me example code that would let me extract the days and tell exactly which ranges and single days are mentioned in a given expression?
Thanks.
EDIT1
OK, here is what I get examining the groups of m_daysRegex.matcher("Mon-Fri"):
matchDays.groupCount() = 10
matchDays.groups = [0,7,0,7,-1,-1,0,7,0,3,4,7,-1,...,-1] (22 values)
matchDays.group(0) = "Mon-Fri"
matchDays.group(1) = "Mon-Fri"
matchDays.group(2) = null
matchDays.group(3) = "Mon-Fri"
matchDays.group(4) = "Mon"
matchDays.group(5) = "Fri"
matchDays.group(6) = null
matchDays.group(7) = null
matchDays.group(8) = null
matchDays.group(9) = null
matchDays.group(10) = null
Can someone explain to me the logic of all this? I mean not only should I get “Mon” and “Fri”, but I also have to know that they are part of a range subexpression “Mon-Fri”, rather than just “Mon,Fri”.
BTW, matching “Mon,Fri” gives us this:
matchDays.groupCount() = 10
matchDays.groups = [0,7,0,3,0,3,-1,-1,-1,-1,-1,-1,4,7,4,7,-1,...,-1] (22 values)
matchDays.group(0) = "Mon,Fri"
matchDays.group(1) = "Mon"
matchDays.group(2) = "Mon"
matchDays.group(3) = null
matchDays.group(4) = null
matchDays.group(5) = null
matchDays.group(6) = "Fri"
matchDays.group(7) = "Fri"
matchDays.group(8) = null
matchDays.group(9) = null
matchDays.group(10) = null
It is different, which is good. Still I find it hard to understand the algorithm.
This is quite easy in .NET, so I was expecting something as easy in Java. Fair, no?
EDIT2
Is there a Java regex guide explaining this stuff? All the guides I have seen so far examine really simple regular expressions.
EDIT3
OK, I begin to get it. My regex can be depicted like this:
( // 1 = D1|D2-D3
(D1)| // 2 = D1
( // 3 = D2-D3
(D2)- // 4 = D2
(D3) // 5 = D3
)
)
(?:,
( // 6 = D4|D5-D6
(D4)| // 7 = D4
( // 8 = D5-D6
(D5)- // 9 = D5
(D6) // 10 = D6
)
)
)*
It explains the group values for “Mon-Fri” and “Mon,Fri”. But how are the repetitions handled? For instance, “Mon,Wed,Fri” ? Still trying to figure it out.
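A small experiment helps with the repetition question. The sketch below reuses the pattern from above and matches “Mon,Wed,Fri”; the class name RepeatedGroupDemo and the helper groupOf are made up for illustration. A capturing group inside a repeated part holds only the text from its last iteration, so “Wed” cannot be retrieved from the matcher after the match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; only the pattern itself comes from the question.
class RepeatedGroupDemo {
    private static final String D = "(Mon|Tue|Wed|Thu|Fri|Sat|Sun)";
    private static final String DD = "(" + D + "-" + D + ")";
    private static final Pattern DAYS =
        Pattern.compile("(" + D + "|" + DD + ")(?:,(" + D + "|" + DD + "))*");

    // Returns the text captured by the given group,
    // or null if the whole input does not match the pattern.
    static String groupOf(String input, int group) {
        Matcher m = DAYS.matcher(input);
        return m.matches() ? m.group(group) : null;
    }

    public static void main(String[] args) {
        // Group 1 is outside the repetition: it holds the first part.
        System.out.println(groupOf("Mon,Wed,Fri", 1)); // Mon
        // Group 6 is inside (?:,...)*: it holds only the last iteration.
        System.out.println(groupOf("Mon,Wed,Fri", 6)); // Fri
    }
}
```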
EDIT4
I now get it. I can match a complex regex with repetitions, but I cannot extract the respective matches easily – must iterate using the find() method and a simpler subregex. So, in my case I have decided to:
- Match the entire expression to make sure it is valid
- Split it by ‘,’ (with StringUtils.split from Apache)
- Iterate over the split parts, splitting by ‘-’ if necessary.
Thanks to all the good Samaritans out there for all the help.
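For anyone landing here later, the three steps above can be sketched roughly as follows. This is only an illustration: it uses the plain String.split from the JDK instead of StringUtils.split so that it has no dependencies, and the class name DaySpecParser and the method parse are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper; only the pattern comes from the question.
class DaySpecParser {
    private static final String D = "(Mon|Tue|Wed|Thu|Fri|Sat|Sun)";
    private static final String DD = "(" + D + "-" + D + ")";
    private static final Pattern DAYS =
        Pattern.compile("(" + D + "|" + DD + ")(?:,(" + D + "|" + DD + "))*");

    // Returns one array per comma-separated part:
    // {day} for a single day, {from, to} for a range.
    static List<String[]> parse(String spec) {
        // Step 1: validate the entire expression.
        if (!DAYS.matcher(spec).matches()) {
            throw new IllegalArgumentException("Invalid day expression: " + spec);
        }
        List<String[]> parts = new ArrayList<String[]>();
        // Step 2: split by ','. Step 3: split each part by '-'.
        for (String part : spec.split(",")) {
            parts.add(part.split("-"));
        }
        return parts;
    }

    public static void main(String[] args) {
        for (String[] part : parse("Mon-Wed,Fri")) {
            System.out.println(part.length == 2
                ? "range " + part[0] + " to " + part[1]
                : "day " + part[0]);
        }
    }
}
```

Because the input is validated first, every part is known to contain either one day or exactly two days, so the array length is enough to tell a range from a single day.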
For a description of capturing groups etc, go here: http://docs.oracle.com/javase/6/docs/api/java/util/regex/Pattern.html
As for your pattern, if you write out all parentheses you get:
where d is
Listing the groups, using M to show which parts belong to a given group:
Given the string “Mon-Fri” you’ll get the strings you see in the output, i.e.
For multiple matches you’ll do something like
EDIT: come to think of it, you probably don’t need the last part, i.e.
in your pattern. Just use the while loop with the
pattern to find each part (i.e. the stuff between the commas).
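A minimal sketch of such a find() loop, assuming a simplified per-part pattern of one day optionally followed by “-” and a second day. The pattern PART, the class DayPartFinder and the method describe are illustrative names, not taken from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper illustrating the find() loop.
class DayPartFinder {
    private static final String D = "(Mon|Tue|Wed|Thu|Fri|Sat|Sun)";
    // Assumed sub-pattern: group 1 = first day, group 2 = end of range (or null).
    private static final Pattern PART = Pattern.compile(D + "(?:-" + D + ")?");

    static List<String> describe(String spec) {
        List<String> result = new ArrayList<String>();
        Matcher m = PART.matcher(spec);
        // Each find() call advances to the next part in the input.
        while (m.find()) {
            if (m.group(2) != null) {
                result.add("range " + m.group(1) + " to " + m.group(2));
            } else {
                result.add("day " + m.group(1));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(describe("Tue-Fri,Sun"));
    }
}
```

Note that find() silently skips text that does not match, so this loop on its own does not validate the input. If validation matters, check the whole string with the full pattern and matches() first.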