I am reading a string from an Oracle database into a C++ program. The string may or may not contain Unicode (UTF-8) characters. Is there a way to check whether the extracted string contains any Unicode characters? If any are present, they should be converted to hexadecimal format and displayed.
There are two aspects to this question.
Distinguish UTF-8-encoded characters from ordinary ASCII characters.
UTF-8 encodes any code point higher than 127 as a series of two or more bytes. Values at 127 and lower remain untouched. The resultant bytes from the encoding are also higher than 127, so it is sufficient to check a byte’s high bit to see whether it qualifies.
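The high-bit check described above can be sketched as follows. The function names is_non_ascii_byte and contains_non_ascii are hypothetical, chosen only for illustration, and the sketch assumes the string really is UTF-8 (or plain ASCII) rather than some other 8-bit encoding.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: true if the byte has its high bit set, meaning it
// belongs to a multi-byte UTF-8 sequence rather than a plain ASCII character.
bool is_non_ascii_byte(char c) {
    // Go through unsigned char so the test does not depend on whether
    // plain char is signed on this platform.
    return (static_cast<unsigned char>(c) & 0x80) != 0;
}

// Hypothetical helper: true if any byte of the string lies outside ASCII.
bool contains_non_ascii(const std::string& s) {
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (is_non_ascii_byte(s[i])) {
            return true;
        }
    }
    return false;
}
```

This only reports whether non-ASCII bytes are present; it does not validate that they form well-formed UTF-8 sequences.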
Display the encoded characters in hexadecimal.
C++ has std::hex to tell streams to format numeric values in hexadecimal. You can use std::showbase to make the output look pretty. A char isn’t treated as numeric, though; streams will just print the character. You’ll have to force the value to another numeric type, such as int. Beware of sign-extension, though.
Here’s some code to demonstrate:
You could call it like this:
Output on Solaris 10 with Sun C++ 5.8:
The code detects UTF-8-encoded characters, but it makes no effort to decode them; you didn’t mention needing to do that.
I used *pc & 0xff to convert the expression to an integral type and to mask out the sign-extended bits. Without that, the output on my computer was 0xffffffbb, for instance.
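To see the difference the mask makes, here is a small sketch that formats the same byte both ways. The helper names hex_unmasked and hex_masked are hypothetical. The unmasked result depends on the platform: where plain char is signed and int is 32 bits wide, a byte such as 0xbb comes out as 0xffffffbb, while the masked version gives 0xbb regardless.

```cpp
#include <ios>
#include <sstream>
#include <string>

// Hypothetical helper: converts the char straight to int. If plain char is
// signed, bytes above 127 become negative and print with leading ff digits.
std::string hex_unmasked(char c) {
    std::ostringstream os;
    os << std::hex << std::showbase << static_cast<int>(c);
    return os.str();
}

// Hypothetical helper: masks with 0xff, which both promotes to int and
// clears the sign-extended bits, leaving only the low eight bits.
std::string hex_masked(char c) {
    std::ostringstream os;
    os << std::hex << std::showbase << (c & 0xff);
    return os.str();
}
```

Casting through unsigned char before widening to an integer type is another way to get the same effect as the mask.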