I have a Java application that uses a C++ DLL via JNI. A few of the DLL’s methods take string arguments, and some of them return objects that contain strings as well.
Currently the DLL does not support Unicode, so the string handling is rather easy:
- Java calls String.getBytes() and passes the resulting array to the DLL, which simply treats the data as a char*.
- DLL uses NewStringUTF() to create a jstring from a const char*.
I’m now in the process of modifying the DLL to support Unicode, switching to the TCHAR type (which, when UNICODE is defined, maps to Windows’ WCHAR data type). Modifying the DLL is going well, but I’m not sure how to modify the JNI portion of the code.
The only thing I can think of right now is this:
- Java calls String.getBytes(String charsetName) and passes the resulting array to the DLL, which treats the data as a wchar_t*.
- DLL no longer creates Strings, but instead passes jbyteArrays with the raw string data. Java uses the String(byte[] bytes, String charsetName) constructor to actually create the String.
The only problem with this method is that I’m not sure what charset name to use. WCHARs are 2 bytes long, so I’m pretty sure it’s UTF-16, but there are 3 possibilities on the Java side: UTF-16, UTF-16BE, and UTF-16LE. I haven’t found any documentation that tells me what the byte order is, but I can probably figure it out from some quick testing.
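A quick test of the three charsets might look like the Java sketch below. The class name Utf16ByteOrderDemo and the hex helper are made up for illustration, and the sketch uses the Charset overloads of getBytes and the String constructor rather than the charset-name overloads mentioned above. It only shows what Java emits for each charset; which one the DLL needs depends on the byte order of the machine running it. Windows on x86/x64 is little-endian, so UTF-16LE would be the expected match there, but that is an assumption about the target platform.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Utf16ByteOrderDemo {
    // Hypothetical helper: encode text with the given charset and
    // return the resulting bytes as space-separated hex pairs.
    public static String hex(String text, Charset charset) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(charset)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String sample = "A"; // U+0041

        // Java's plain "UTF-16" encoder writes a byte-order mark and uses big-endian order.
        System.out.println("UTF-16   : " + hex(sample, StandardCharsets.UTF_16));   // FE FF 00 41
        // The BE/LE variants write no byte-order mark.
        System.out.println("UTF-16BE : " + hex(sample, StandardCharsets.UTF_16BE)); // 00 41
        System.out.println("UTF-16LE : " + hex(sample, StandardCharsets.UTF_16LE)); // 41 00

        // Round trip: the bytes Java would hand to the DLL, and a String rebuilt
        // from bytes the DLL would hand back (assuming little-endian data).
        byte[] raw = sample.getBytes(StandardCharsets.UTF_16LE);
        String back = new String(raw, StandardCharsets.UTF_16LE);
        System.out.println("round trip ok: " + sample.equals(back));
    }
}
```

Note that getBytes does not append a terminator, so if the native side treats the array as a null-terminated wchar_t*, it would need either the length passed separately or two zero bytes appended.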
Is there a better way? If possible I’d like to continue constructing the jstring objects within the DLL, as that way I won’t have to modify any of the usages of those methods. However, the NewString JNI method doesn’t take a charset identifier.
This answer suggests that the byte-ordering of WCHARS is not guaranteed…
Since you are on Windows you could try WideCharToMultiByte to convert the WCHARs to UTF-8 and then use your existing JNI code. (Note that NewStringUTF expects JNI’s modified UTF-8, which matches standard UTF-8 except for embedded NUL characters and characters outside the Basic Multilingual Plane.)
You will need to be careful using WideCharToMultiByte due to the possibility of buffer overruns in the lpMultiByteStr parameter. To get round this you should call the function twice, first with lpMultiByteStr set to NULL and cbMultiByte set to zero – this will return the length of the required lpMultiByteStr buffer without attempting to write to it. Once you have the length you can allocate a buffer of the required size and call the function again.
Example code: