How would you translate this Perl regex into Java?
/pattern/i
While this compiles, it does not match “PattErn” for me:
Pattern p = Pattern.compile("/pattern/i");
Matcher m = p.matcher("PattErn");
System.out.println(m.matches()); // prints "false"
You can’t.
There are a lot of reasons for this. Here are a few:
Java doesn’t support as expressive a regex language as Perl does. It lacks grapheme support (like \X) and full property support (like \p{Sentence_Break=SContinue}), is missing Unicode named characters, doesn’t have a (?|...|...|) branch reset operator, doesn’t have named capture groups or a logical \x{...} escape before Java 7, has no recursive regexes, and so on. I could write a book on what Java is missing here. Get used to going back to a very primitive and awkward regex engine compared with what you’re used to.
An even worse problem is the lookalike faux amis such as \w, \b, and \s, and even \p{alpha} and \p{lower}, which behave differently in Java than in Perl; in some cases the Java versions are completely unusable and buggy. That’s because Perl follows UTS#18 but, before Java 7, Java did not. You must add the UNICODE_CHARACTER_CLASS flag, available from Java 7, to get these to stop being broken. If you can’t use Java 7, give up now, because Java had many, many other Unicode bugs before Java 7 and it just isn’t worth the pain of dealing with them.
Java handles linebreaks via ^ and $ and ., but Perl expects Unicode linebreaks to be \R. You should look at UNIX_LINES to understand what is going on there.
Java does not by default apply any Unicode casefolding whatsoever. Make sure to add the UNICODE_CASE flag to your compilation. Otherwise you won’t get things like the various Greek sigmas all matching one another.
Finally, it is different because at best Java only does simple casefolding, while Perl always does full casefolding. That means that you won’t get \xDF to match "SS" case insensitively in Java, and similar related issues.
In summary, the closest you can get is to compile with the flags
CASE_INSENSITIVE | UNICODE_CASE | UNICODE_CHARACTER_CLASS
which is equivalent to an embedded "(?iuU)" in the pattern string.
And remember that match in Java doesn’t mean match, perversely enough: Matcher.matches() succeeds only if the entire input matches, whereas a Perl match succeeds if the pattern occurs anywhere in the string, which in Java is Matcher.find().
EDIT
And here’s the rest of the story…
You shouldn’t have slashes around the pattern.
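To see why the slashes matter, here is a minimal sketch (the class and helper names are made up for illustration). In Java the slashes and the trailing i are ordinary literal characters rather than delimiters and a modifier, so the pattern from the question only matches the literal text "/pattern/i".

```java
import java.util.regex.Pattern;

class SlashSketch {
    // Hypothetical helper: whole-string match, which is what Matcher.matches() does.
    static boolean wholeMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        // The slashes and the trailing "i" are matched literally.
        System.out.println(wholeMatch("/pattern/i", "PattErn"));    // prints "false"
        System.out.println(wholeMatch("/pattern/i", "/pattern/i")); // prints "true"
        // Drop the delimiters and express the modifier as an embedded flag.
        System.out.println(wholeMatch("(?i)pattern", "PattErn"));   // prints "true"
    }
}
```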
The best you can do is to translate
this way
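As a rough sketch of such a translation (an illustration under assumptions, not necessarily the exact listing meant here), suppose the Perl side is a plain test like if ($line =~ /pattern/i) { print "matched.\n" }. The input string, class name, and helper name below are hypothetical.

```java
import java.util.regex.Pattern;

class TranslationSketch {
    // Approximate counterpart of Perl's /pattern/i, compiled once and reused.
    static final Pattern PATTERN = Pattern.compile("pattern",
            Pattern.CASE_INSENSITIVE
            | Pattern.UNICODE_CASE
            | Pattern.UNICODE_CHARACTER_CLASS);

    // Perl's =~ succeeds if the pattern occurs anywhere in the string,
    // so the Java counterpart is find(), not matches().
    static boolean perlStyleMatch(String line) {
        return PATTERN.matcher(line).find();
    }

    public static void main(String[] args) {
        String line = "I have your PaTTerN right here"; // hypothetical input
        if (perlStyleMatch(line)) {
            System.out.println("matched.");
        }
        // matches() requires the whole string to match the pattern.
        System.out.println(PATTERN.matcher(line).matches()); // prints "false"
    }
}
```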
There, see how much easier that isn’t? 🙂
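The flag and casefolding points from the list above can be checked with a small sketch. The class and method names are invented for illustration; the Pattern constants are the standard java.util.regex ones, and the code assumes Java 7 or later.

```java
import java.util.regex.Pattern;

class CaseFoldingSketch {
    // The three flags from the summary, combined with bitwise OR.
    static final int FLAGS = Pattern.CASE_INSENSITIVE
            | Pattern.UNICODE_CASE
            | Pattern.UNICODE_CHARACTER_CLASS;

    // Whole-string match with no flags at all.
    static boolean matchesPlain(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    // Whole-string match with the flags passed to compile().
    static boolean matchesWithFlags(String regex, String input) {
        return Pattern.compile(regex, FLAGS).matcher(input).matches();
    }

    // Whole-string match with the same flags embedded in the pattern string.
    static boolean matchesEmbedded(String regex, String input) {
        return Pattern.compile("(?iuU)" + regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        String sigma = "\u03C3+";              // one or more small sigmas
        String sigmas = "\u03A3\u03C3\u03C2";  // capital, small, and final sigma

        // CASE_INSENSITIVE alone only folds US-ASCII letters.
        System.out.println(Pattern.compile(sigma, Pattern.CASE_INSENSITIVE)
                .matcher(sigmas).matches());                    // prints "false"
        System.out.println(matchesWithFlags(sigma, sigmas));    // prints "true"
        System.out.println(matchesEmbedded(sigma, sigmas));     // prints "true"

        // Simple casefolding only: sharp s does not match "SS".
        System.out.println(matchesWithFlags("\u00DF", "SS"));   // prints "false"

        // \w is ASCII-only unless UNICODE_CHARACTER_CLASS is set.
        System.out.println(matchesPlain("\\w", "\u00E9"));      // prints "false"
        System.out.println(matchesEmbedded("\\w", "\u00E9"));   // prints "true"
    }
}
```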