Consider the following regex:
(([^\|])*\|)*([^\|]*)
This matches repetitive string patterns of the type
("whatever except |" |) {0 to any times} ("whatever except |") {1 time}
So it should match the following string, which has 17 substrings (16 repeated, plus “ z” as the last one).
"abcd | e | fg | hijk | lmnop | | | qrs | t| uv| w |||||x y| z"
Indeed, RegexPal verifies that the given regex does match the above string.
Now, I want to get each of the substrings (i.e., “abcd |”, “e |”, “fg |”, etc.), whose number and lengths are not known in advance.
According to a similarly-titled previous StackOverflow post and the documentation of the Matcher class find() method, I just need to do something like
Pattern pattern = Pattern.compile(regex); // regex is the above regex
Matcher matcher = pattern.matcher(input); // input is the above string
while (matcher.find())
{
    System.out.println(matcher.group(1));
}
However, when I do this I just get 2 strings printed out: the last repeated substring (“x y|”) and a null value; definitely not the 16 substrings I expect.
It would also be nice to check that the string actually matched before running the find() loop. I am not sure whether to use matches(), groupCount() > 0, or some other condition, and I would like to avoid doing the matching work twice, given that find() also performs matching.
So, questions:
- How can I get all the 16 repeated substrings?
- How can I get the last substring?
- How do I check that the string matched?
If you must use the regular expression…
See below. When looping over matches with find(), the pattern does not need to match the whole input, only the section you want on each iteration. (I get 17 matches; is this correct?)
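A minimal sketch of that idea, not necessarily the code this answer originally showed: the pattern describes a single segment, and find() is called until it runs out. The class name `PipeSegments`, the method `segments`, and the exact pattern are assumptions made for illustration. The second alternative picks up a non-empty tail after the last pipe.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PipeSegments {
    // One segment per find(): either non-pipe characters followed by a pipe,
    // or a non-empty pipe-free tail at the end of the input.
    // (Hypothetical pattern, chosen for this sketch.)
    private static final Pattern SEGMENT = Pattern.compile("[^|]*\\||[^|]+$");

    static List<String> segments(String input) {
        List<String> result = new ArrayList<>();
        Matcher matcher = SEGMENT.matcher(input);
        while (matcher.find()) {
            // group() is the whole text matched by this call to find()
            result.add(matcher.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "abcd | e | fg | hijk | lmnop | | | qrs | t| uv| w |||||x y| z";
        List<String> parts = segments(input);
        System.out.println(parts.size() + " segments");
        for (String part : parts) {
            System.out.println("[" + part + "]");
        }
    }
}
```

On the string from the question this should yield 17 segments, the first being “abcd |” and the last “ z”. Note that an input ending in a pipe produces no trailing empty segment with this pattern.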
Alternatively, move the delimiter to the start of the regex and also allow ‘^’ (start of input) in its place.
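One way this could look, as a sketch under assumptions: the leading delimiter is written here as a lookbehind, so that each match is just the field content without the pipe. The lookbehind form, the class name `LeadingDelimiterFields`, and the method `fields` are illustrative choices and are not taken from the original answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LeadingDelimiterFields {
    // A field is a run of non-pipe characters (possibly empty) that is
    // preceded either by the start of the input or by a pipe.
    // (Hypothetical pattern, chosen for this sketch.)
    private static final Pattern FIELD = Pattern.compile("(?<=^|\\|)[^|]*");

    static List<String> fields(String input) {
        List<String> result = new ArrayList<>();
        Matcher matcher = FIELD.matcher(input);
        while (matcher.find()) {
            result.add(matcher.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "abcd | e | fg | hijk | lmnop | | | qrs | t| uv| w |||||x y| z";
        List<String> parts = fields(input);
        System.out.println(parts.size() + " fields");
        for (String part : parts) {
            System.out.println("[" + part + "]");
        }
    }
}
```

Because the delimiter is only looked at and never consumed, empty fields between adjacent pipes come out as empty strings, and the pipes themselves are not part of the results.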
What would qualify as a non-match? With this regex, any string will match.
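To illustrate the point, here is a small sketch that runs the regex from the question through matches() on a few inputs. Every part of the pattern is allowed to match zero times and every character is either a pipe or not, so matches() succeeds each time. The class and method names (`OriginalRegexCheck`, `matchesWhole`, `lastSubstring`) are hypothetical. The sketch also shows that after a successful matches(), group(3) holds the text after the last pipe.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OriginalRegexCheck {
    // The regex from the question, unchanged.
    static final Pattern ORIGINAL = Pattern.compile("(([^\\|])*\\|)*([^\\|]*)");

    // True if the whole input matches the original regex.
    static boolean matchesWhole(String input) {
        return ORIGINAL.matcher(input).matches();
    }

    // Text after the last pipe (group 3), or null if the input does not match.
    static String lastSubstring(String input) {
        Matcher matcher = ORIGINAL.matcher(input);
        return matcher.matches() ? matcher.group(3) : null;
    }

    public static void main(String[] args) {
        String sample = "abcd | e | fg | hijk | lmnop | | | qrs | t| uv| w |||||x y| z";
        String[] inputs = { "", "no pipes at all", "|||", sample };
        for (String input : inputs) {
            System.out.println("[" + input + "] matches: " + matchesWhole(input));
        }
        System.out.println("last substring: [" + lastSubstring(sample) + "]");
    }
}
```

Since the check can never fail for this pattern, a meaningful validation step would need a stricter pattern, for example one that requires at least one character per field.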
Here is a solution using regular expressions:
Output