I was trying to answer a regex question for someone and came across something that made me scratch my head. Given the following code…
    // Requires: import java.util.regex.Pattern;
    public static void main(String[] args) {
        String test = "Hello, how are you today?";
        // + requires at least one non-word character per delimiter
        Pattern p = Pattern.compile("(\\W)+");
        String[] words = p.split(test);
        System.out.println("--" + words[0] + "--");
        System.out.println("--" + words[1] + "--");
    }
I get the expected results:
--Hello--
--how--
However, when I use …
    // Requires: import java.util.regex.Pattern;
    public static void main(String[] args) {
        String test = "Hello, how are you today?";
        // * also allows zero non-word characters, i.e. an empty match
        Pattern p = Pattern.compile("(\\W)*");
        String[] words = p.split(test);
        System.out.println("--" + words[0] + "--");
        System.out.println("--" + words[1] + "--");
    }
I get these results:
----
--H--
Is there a reason * doesn’t work exactly like the + in this situation?
* matches zero or more occurrences, so the pattern also matches the empty string. As a result, every position between characters becomes a delimiter (a zero-width delimiter).
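To make the zero-width matches visible, here is a minimal sketch that counts how many matches each pattern finds and how many of them are empty. The class and method names (ZeroWidthCount, count) and the short input "Hi, yo" are made up for illustration; only Pattern and Matcher come from java.util.regex.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ZeroWidthCount {
    // Returns {total matches, zero-width matches} for regex over input.
    static int[] count(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int total = 0;
        int empty = 0;
        while (m.find()) {
            total++;
            if (m.start() == m.end()) {
                empty++; // matched the empty string at this position
            }
        }
        return new int[] { total, empty };
    }

    public static void main(String[] args) {
        int[] plus = count("(\\W)+", "Hi, yo");
        int[] star = count("(\\W)*", "Hi, yo");
        System.out.println("+ : " + plus[0] + " matches, " + plus[1] + " empty");
        System.out.println("* : " + star[0] + " matches, " + star[1] + " empty");
    }
}
```

With + there is a single match (the ", ") and none are empty. With * there are six matches and five of them are empty, one at each position where no non-word character is present, including the end of the input. One caveat if you try to reproduce the output from the question: Java 8 changed split so that a zero-width match at the very start no longer produces a leading empty string, so on Java 8 and later the second snippet should print --H-- and --e-- rather than ---- and --H--.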
By the way, that doesn’t mean it’s acting non-greedily. If you look at the characters returned you get this:
Notice how there are not two empty elements between “o” and “h”; just one. Below, each delimiter is surrounded by {}.
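The brace view described above can be reproduced with a small sketch like the following. It walks the matches with Matcher.find() and wraps each one in {}. The names DelimiterView and render are hypothetical helpers for this illustration, and the input is just the first two words of the test string.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DelimiterView {
    // Rebuilds input with every match of regex wrapped in braces.
    static String render(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        StringBuilder out = new StringBuilder();
        int index = 0; // end of the previous delimiter
        while (m.find()) {
            out.append(input, index, m.start()); // element before this delimiter
            out.append('{').append(m.group()).append('}');
            index = m.end();
        }
        out.append(input, index, input.length()); // text after the last delimiter
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(render("(\\W)*", "Hello, how"));
        // prints {}H{}e{}l{}l{}o{, }{}h{}o{}w{}
    }
}
```

The {, } is a single delimiter that consumed both the comma and the space, which is the greedy behaviour. The {} right after it is a zero-width match at the position just before “h”, and the empty text between those two delimiters is the one empty element.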