I'm trying to match Unicode characters in Java.
Input string: informa
String to match: informátion
So far I've tried this:
Pattern p = Pattern.compile("informa[\u0000-\uffff].*", Pattern.UNICODE_CASE | Pattern.CANON_EQ | Pattern.CASE_INSENSITIVE);
String s = "informátion";
Matcher m = p.matcher(s);
if (m.matches()) {
    System.out.println("Match!");
} else {
    System.out.println("No match");
}
It comes out as “No match”. Any ideas?
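One thing worth checking first: if the á in the string is the single precomposed code point U+00E1, then the literal prefix "informa" (with a plain a) never occurs in it, so a pattern starting with that literal cannot match the string as it stands. Here is a minimal sketch of that check, assuming the precomposed form. The class name DecomposeDemo and its helper methods are made up for illustration. It decomposes the string with java.text.Normalizer (form NFD), after which the accent is a separate combining mark that \p{M} can match.

```java
import java.text.Normalizer;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original post.
final class DecomposeDemo {

    // Assumption: the accented letter is the precomposed code point U+00E1.
    static final String PRECOMPOSED = "inform\u00E1tion";

    // Matches "informa", one combining mark, then anything.
    private static final Pattern AFTER_NFD = Pattern.compile("informa\\p{M}.*");

    // NFD splits U+00E1 into 'a' followed by U+0301 (combining acute accent).
    static String decompose(String input) {
        return Normalizer.normalize(input, Normalizer.Form.NFD);
    }

    static boolean matchesAfterDecomposition(String input) {
        return AFTER_NFD.matcher(decompose(input)).matches();
    }

    public static void main(String[] args) {
        System.out.println(PRECOMPOSED.startsWith("informa"));            // false
        System.out.println(decompose(PRECOMPOSED).startsWith("informa")); // true
        System.out.println(matchesAfterDecomposition(PRECOMPOSED));       // true
    }
}
```

Whether normalizing the input is acceptable depends on what the match result is used for, so treat this as a diagnostic rather than a fix.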
The term “Unicode characters” is not specific enough. Taken literally, it covers every character in the Unicode range, and thus also “normal” ASCII characters. The term is, however, very often used when one actually means “characters which are not in the printable ASCII range”.
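To make the distinction concrete, here is a small Java sketch. It shows that a class spanning the whole \u0000-\uffff range accepts a plain a just as readily as an á, whereas a check for "outside printable ASCII" (code points below 0x20 or above 0x7E) separates the two. The class and method names are hypothetical, chosen only for this illustration.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not taken from the original post.
final class UnicodeRangeDemo {

    // Spans the entire Basic Multilingual Plane, ASCII included.
    private static final Pattern WHOLE_BMP = Pattern.compile("[\\u0000-\\uffff]");

    static boolean inWholeRange(String singleChar) {
        return WHOLE_BMP.matcher(singleChar).matches();
    }

    // True for anything that is not a printable ASCII character (0x20 to 0x7E).
    static boolean isOutsidePrintableAscii(int codePoint) {
        return codePoint < 0x20 || codePoint > 0x7E;
    }

    public static void main(String[] args) {
        System.out.println(inWholeRange("a"));                 // true
        System.out.println(inWholeRange("\u00E1"));            // true
        System.out.println(isOutsidePrintableAscii('a'));      // false
        System.out.println(isOutsidePrintableAscii('\u00E1')); // true
        System.out.println(isOutsidePrintableAscii('\t'));     // true
    }
}
```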
In regex terms that would be:
[^\x20-\x7E]
Depending on what you’d like to do with this information, here are some useful follow-up answers: