How do you match more than one space character in Java regex?
I have a regex I am trying to match. The regex fails when I have two or more space characters.
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        String pattern = "\\b(fruit)\\s+([^a]+\\w+)\\b"; // Match 'fruit' not followed by a word that begins with 'a'
        String str = "fruit apple"; // One space character will not be matched
        String str_fail = "fruit  apple"; // Two space characters will be matched
        System.out.println(preg_match(pattern, str)); // False (That's what I want)
        System.out.println(preg_match(pattern, str_fail)); // True (Regex fail)
    }

    public static boolean preg_match(String pattern, String subject) {
        Pattern regex = Pattern.compile(pattern);
        Matcher regexMatcher = regex.matcher(subject);
        return regexMatcher.find();
    }
}
The problem is actually caused by backtracking. Your regex:
\b(fruit)\s+([^a]+\w+)\b
says “fruit, followed by one or more whitespace characters, followed by one or more non-‘a’ characters, followed by one or more ‘word’ characters”. The reason this fails with two spaces is that
\s+ first matches both spaces, but [^a]+ then fails on the ‘a’ of ‘apple’. The engine backtracks, and \s+ gives back the second space. That satisfies the [^a]+ (with the second space) and the \s+ portion (with the first), so the overall match succeeds.
I think you can fix it by simply using the possessive quantifier instead, which would be \s++. This tells \s+ not to give back the second space character. Java’s quantifiers, including the possessive ones, are documented in the java.util.regex.Pattern Javadoc.
As an illustration, here are two examples at Rubular: one using the possessive quantifier on \s (gives expected results, from what you describe), and one using your current regex with separate groupings around [^a]+ and \w+. Notice that the second match group (representing the [^a]+) is capturing the second space character.
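For anyone who wants to check this in Java rather than at Rubular, here is a minimal sketch of both points. The class name PossessiveSpaceDemo, the helpers group and found, and the "fruit  banana" sample are made up for illustration and are not part of the question's code. The first pattern is the original regex with [^a]+ and \w+ split into separate groups. The second is the original regex with \s++ in place of \s+.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PossessiveSpaceDemo {
    // Original pattern, with [^a]+ and \w+ in separate groups so each can be inspected.
    static final Pattern GREEDY = Pattern.compile("\\b(fruit)\\s+([^a]+)(\\w+)\\b");
    // Original pattern with a possessive \s++, which never gives back spaces it has consumed.
    static final Pattern POSSESSIVE = Pattern.compile("\\b(fruit)\\s++([^a]+\\w+)\\b");

    // Returns the text captured by the given group, or null if the pattern is not found.
    static String group(Pattern pattern, String subject, int group) {
        Matcher m = pattern.matcher(subject);
        return m.find() ? m.group(group) : null;
    }

    // Returns true if the pattern is found anywhere in the subject.
    static boolean found(Pattern pattern, String subject) {
        return pattern.matcher(subject).find();
    }

    public static void main(String[] args) {
        String twoSpaces = "fruit  apple"; // two spaces between the words

        // Greedy \s+ backtracks, so group 2 ([^a]+) ends up holding the second space.
        System.out.println("[" + group(GREEDY, twoSpaces, 2) + "]"); // prints "[ ]"

        // Possessive \s++ keeps every space, so [^a]+ has to start at the 'a' and fails.
        System.out.println(found(POSSESSIVE, "fruit apple"));   // false
        System.out.println(found(POSSESSIVE, twoSpaces));       // false
        // A word that does not begin with 'a' still matches.
        System.out.println(found(POSSESSIVE, "fruit  banana")); // true
    }
}
```

The possessive form only changes what happens after \s has consumed the spaces. It matches the same run of whitespace as \s+, but the engine is not allowed to retry with a shorter run.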