I am trying to create a regex in Java to match the pattern of a particular word to find other words with the same pattern. For example, the word “tooth” has the pattern 12213 since both the ‘t’ and ‘o’ repeat. I would want the regex to match other words like “teeth”.
So here’s my attempt using backreferences. In this particular example, it should fail if the second letter is the same as the first letter. Also, the last letter should be different from all the rest.
String regex = "([a-z])([a-z&&[^\1]])\\2\\1([a-z&&[^\1\2]])";
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher("tooth");
//This works as expected
assertTrue(m.matches());
m.reset("tooto");
//This should return false, but instead returns true
assertFalse(m.matches());
I have verified that it works on examples like “toot” if I remove the last group, i.e. the following, so I know the backreferences are working up to this point:
String regex = "([a-z])([a-z&&[^\1]])\\2\\1";
But if I add back the last group to the end of the pattern, it’s like it doesn’t recognize the backreferences inside the square brackets anymore.
Am I doing something wrong, or is this a bug?
If you print your regex you get a clue about what is wrong: the single-backslash backreferences (\1 and \2) inside your square brackets are treated by Java as string escapes and turned into weird control characters, so the pattern doesn't work as expected. For example:
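A minimal sketch of that check, using the regex from the question. The class name EscapeDemo and the helper visible() are made up for illustration; the helper only makes control characters readable. In a Java string literal, \1 is an octal escape, so the compiler replaces it with the character U+0001 before the regex engine ever sees it, while \\2 survives as the two characters \2.

```java
class EscapeDemo {
    // Hypothetical helper: show control characters as <U+XXXX> so they are visible.
    static String visible(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c < 0x20) {
                sb.append(String.format("<U+%04X>", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Same literal as in the question: single backslashes inside the brackets.
        String regex = "([a-z])([a-z&&[^\1]])\\2\\1([a-z&&[^\1\2]])";
        // prints ([a-z])([a-z&&[^<U+0001>]])\2\1([a-z&&[^<U+0001><U+0002>]])
        System.out.println(visible(regex));
    }
}
```

So the character classes exclude the control characters U+0001 and U+0002 rather than the captured letters, which is why "tooto" still matches.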
also prints
Also, && won't help here. It is valid character-class intersection in Java, but backreferences are not resolved inside square brackets, so you will have to use a lookahead instead. This expression works for your example above:
The expression (?!\\1) looks ahead to check that the next character isn't the one captured by the first group, without moving the regex cursor forward.
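A minimal sketch of the lookahead approach for the 12213 pattern. The class and method names are made up for illustration, and this regex is one plausible way to write it, not necessarily the exact expression from the answer. Each new letter is preceded by a negative lookahead that rules out the letters already captured.

```java
import java.util.regex.Pattern;

class PatternDemo {
    // Pattern 12213: group 1, a different letter as group 2, then \2, \1,
    // and finally a letter that is neither group 1 nor group 2.
    static final Pattern TOOTH_SHAPE =
            Pattern.compile("([a-z])(?!\\1)([a-z])\\2\\1(?!\\1|\\2)[a-z]");

    // Hypothetical helper: true if the whole word has the 12213 shape.
    static boolean sameShape(String word) {
        return TOOTH_SHAPE.matcher(word).matches();
    }

    public static void main(String[] args) {
        for (String w : new String[] {"tooth", "teeth", "tooto", "toott"}) {
            System.out.println(w + " -> " + sameShape(w));
        }
    }
}
```

With this sketch, "tooth" and "teeth" match, while "tooto" and "toott" are rejected because the last letter repeats an earlier one.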