What regular expression can I use to detect a multi-byte string?
For example, here is an expression that detects a string in English:
Pattern p=Pattern.compile("[a-zA-Z/]");
Similarly, I want a pattern that matches multi-byte text such as
コメント_1050_固-減価償却費
You may want to have a look at Unicode Support in Java
I think basically you want the Unicode property \p{L}. This matches any code point that has the property “letter”. So your regex could use the character class [\p{L}/]; I just replaced the character ranges a-zA-Z with \p{L}.
Since Java 7 you can also use the flag Pattern.UNICODE_CHARACTER_CLASS. It turns the predefined class \w into its Unicode version, meaning it matches all Unicode letters and digits (and connector characters like _).
So to match your string コメント_1050_固-減価償却費, you could use \w+ with that flag. This matches any string consisting of letters, digits and _; the hyphen in your sample is not covered by \w, so it has to be added to the character class, for example [\w-]+.
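The two approaches described above could be sketched in Java roughly as follows. The class and method names are hypothetical and exist only for illustration, and the [\w-]+ pattern is an assumption about what the sample string needs (word characters plus a hyphen), not necessarily the exact pattern from the original answer.

```java
import java.util.regex.Pattern;

// Hypothetical class and method names, used only for illustration.
final class UnicodeRegexSketch {

    // The question's character class with a-zA-Z replaced by \p{L}:
    // matches a single Unicode letter (from any script) or a slash.
    static final Pattern LETTER_OR_SLASH = Pattern.compile("[\\p{L}/]");

    // With UNICODE_CHARACTER_CLASS (Java 7+), \w covers Unicode letters,
    // digits and connector punctuation such as '_'. The hyphen is added
    // explicitly (an assumption for the sample string) because \w excludes it.
    static final Pattern WORD_CHARS_OR_HYPHEN =
            Pattern.compile("[\\w-]+", Pattern.UNICODE_CHARACTER_CLASS);

    // True if the input contains at least one letter or slash anywhere.
    static boolean containsLetterOrSlash(String s) {
        return LETTER_OR_SLASH.matcher(s).find();
    }

    // True if the whole input consists of word characters and hyphens.
    static boolean isWordCharsOrHyphen(String s) {
        return WORD_CHARS_OR_HYPHEN.matcher(s).matches();
    }

    public static void main(String[] args) {
        String sample = "コメント_1050_固-減価償却費";
        System.out.println(containsLetterOrSlash(sample));    // true
        System.out.println(isWordCharsOrHyphen(sample));      // true
        System.out.println(isWordCharsOrHyphen("foo bar"));   // false (space)
    }
}
```

Note that find() only checks whether a matching character occurs somewhere in the input, while matches() requires the entire string to fit the pattern; which one is appropriate depends on whether you want to detect or validate.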
See here for more details, and here on regular-expressions.info for an overview of the Unicode scripts, properties and blocks.
See also tchrist's well-known answer about the caveats of regex in Java, including an update on what has changed with Java 7 (or will change in Java 8).