Is there any other way to tell whether a Java String contains UTF-8 characters, such as Arabic words?
I tried this code, but is it accurate, and does it do the job?
char c = 'أ';
int num = (int) c;
if (num > 128) {
    // then UTF-8 characters exist
}
(Assuming UTF-8 == non-ASCII)
What you could do is encode then decode the string in ASCII and compare the result of that with the original. If they’re not equal, there are non-ASCII characters.
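A minimal sketch of that round trip, assuming "UTF-8" really means "non-ASCII" here. The class and method names (`AsciiRoundTrip`, `isPureAscii`) are made up for illustration. It relies on `String.getBytes(Charset)` replacing every character the charset cannot represent with `?`, so the decoded copy differs from the original whenever a non-ASCII character is present.

```java
import java.nio.charset.StandardCharsets;

class AsciiRoundTrip {
    // Hypothetical helper: true if the string survives an ASCII encode/decode round trip.
    static boolean isPureAscii(String s) {
        // Characters that ASCII cannot represent are replaced with '?' here.
        byte[] ascii = s.getBytes(StandardCharsets.US_ASCII);
        String decoded = new String(ascii, StandardCharsets.US_ASCII);
        return decoded.equals(s);
    }

    public static void main(String[] args) {
        // Unicode escapes are used so the result does not depend on the source file encoding.
        System.out.println(isPureAscii("hello"));                          // prints true
        System.out.println(isPureAscii("\u0645\u0631\u062D\u0628\u0627")); // Arabic word, prints false
    }
}
```

A string that already contains a literal `?` still compares equal, because `?` is itself ASCII.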
However, your own sample would work too (almost: the comparison should be >= 128), because all chars < 128 are indeed ASCII (see “UTF-16” and “ASCII”, Wikipedia), and chars are UTF-16 “code units”.
However, judging from the question in its entirety, you might be better off reading The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) first.
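For completeness, the per-char check from the question, with the `>= 128` correction and applied to a whole String, might look like the sketch below. The names `NonAsciiScan` and `containsNonAscii` are hypothetical. Again, this only detects non-ASCII characters; a Java String is always held as UTF-16 code units, so it has no "UTF-8" state to detect.

```java
class NonAsciiScan {
    // Hypothetical helper: true if any UTF-16 code unit lies outside the ASCII range 0..127.
    static boolean containsNonAscii(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) >= 128) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(containsNonAscii("abc"));    // prints false
        System.out.println(containsNonAscii("\u0623")); // Arabic letter alef with hamza above, prints true
    }
}
```

Iterating over code units rather than code points is enough here, since surrogate code units are all well above 127.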