Related to this question: "Fix" String encoding in Java
My project encoding is UTF-8.
I need to make a query to a DB that uses a particular varchar encoding (apparently EUC-KR).
I take the input as UTF-8, and I want to make the DB query with the EUC-KR encoded version of that string.
First of all, I can select and display the encoded strings using the following:
ResultSet rs = stmt.executeQuery("SELECT name FROM mytable");
while (rs.next()) {
    System.out.println(new String(rs.getBytes(1), "EUC-KR"));
}
I want to do something like:
PreparedStatement ps = conn.prepareStatement("SELECT * FROM MYTABLE WHERE NAME=?");
ps.setString(1,input);
ResultSet rs = ps.executeQuery();
Which obviously won’t work, because my Java program is not using the same encoding as the DB. So, I’ve tried replacing the middle line with each of the following, to no avail:
ps.setString(1,new String(input.getBytes("EUC-KR")));
ps.setString(1,new String(input.getBytes("EUC-KR"), "EUC-KR"));
ps.setString(1,new String(input.getBytes("UTF-8"), "EUC-KR"));
ps.setString(1,new String(input.getBytes("EUC-KR"), "UTF-8"));
I am using Oracle 10g 10.1.0
More details of my attempts follow:
What does seem to work is saving the name from the first query into a string without any other manipulation, and passing that back as a parameter. It matches itself.
That is,
ResultSet rs = stmt.executeQuery("SELECT name FROM mytable");
rs.next();
String myString = rs.getString(1);
PreparedStatement ps = conn.prepareStatement("SELECT * FROM mytable WHERE name=?");
ps.setString(1, myString);
rs = ps.executeQuery();
… returns the one correct entry in rs. Great, so now I just need to convert my input to whatever format that string is in.
However, nothing I have tried matches the “correct” string when I compare the bytes using
byte[] mybytearray = myString.getBytes();
for (byte b : mybytearray) {
    System.out.print(b + " ");
}
In other words, I can turn °í»ê into 고산 but I can’t seem to turn 고산 into °í»ê.
The byte array given by
rs.getBytes(1)
is different from the byte array given by any of the following:
rs.getString(1).getBytes()
rs.getString(1).getBytes("UTF8")
rs.getString(1).getBytes("EUC-KR")
Unhappiness: it turns out that for my DB, NLS_CHARACTERSET = US7ASCII
Which means that what I’m trying to do is unsupported. Thanks for playing everyone 🙁
You can’t accomplish anything with a String constructor. String is always UTF-16 inside. Converting UTF-16 chars to EUC-KR and back again won’t help you.
Putting invalid Unicode into String values in the hopes that they will then be converted to EUC-KR is a really bad idea.
What you are doing is supposed to ‘just work’. The Oracle driver is supposed to talk to the server, find out the desired charset, and go from there.
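To illustrate why the String constructor attempts cannot change what the driver sends, here is a minimal sketch. The class and method names are made up for this example, and the \u escapes spell the sample name 고산 from the question. Encoding and decoding with the same charset hands back the original String, while decoding with a different charset just produces different characters.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of any library.
class StringRoundTripDemo {
    static final Charset EUC_KR = Charset.forName("EUC-KR");

    // Encode to EUC-KR and decode with the same charset:
    // the result is the original String again, still UTF-16 inside.
    static String sameCharsetRoundTrip(String input) {
        return new String(input.getBytes(EUC_KR), EUC_KR);
    }

    // Encode as UTF-8 but decode as EUC-KR: the bytes are misread,
    // so the result is a different (garbled) String.
    static String mismatchedRoundTrip(String input) {
        return new String(input.getBytes(StandardCharsets.UTF_8), EUC_KR);
    }

    public static void main(String[] args) {
        String input = "\uACE0\uC0B0"; // the sample name from the question
        System.out.println(sameCharsetRoundTrip(input).equals(input)); // true
        System.out.println(mismatchedRoundTrip(input).equals(input));  // false
    }
}
```

Either way the value handed to setString is an ordinary String, and the driver still applies its own conversion to it.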
What, however, is the database charset? If someone is storing EUC-KR without having set the charset to EUC-KR, you are more or less up a creek.
What you need to do is tell your JDBC driver what charset to use to communicate with the server. You haven’t mentioned whether you are using the Thin or the OCI driver; the answer might be different.
Judging from http://download.oracle.com/docs/cd/E14072_01/appdev.112/e13995/oracle/jdbc/OracleDriver.html, you might want to try turning on defaultNChar.
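As a rough sketch of turning that on: as far as I recall from Oracle's JDBC documentation, defaultNChar can be set through the system property oracle.jdbc.defaultNChar, which is the same as passing -Doracle.jdbc.defaultNChar=true on the command line. Treat the property name as an assumption to check against the docs for your driver version. The class name is made up, and I have not verified that this helps against a US7ASCII database.

```java
// Hypothetical helper class, not part of the Oracle driver.
class DefaultNCharConfig {
    // Assumed property name; verify against your driver's documentation.
    static final String DEFAULT_NCHAR = "oracle.jdbc.defaultNChar";

    // Call this before the first connection is opened so the driver can see it.
    static void enableDefaultNChar() {
        System.setProperty(DEFAULT_NCHAR, "true");
    }

    public static void main(String[] args) {
        enableDefaultNChar();
        System.out.println(System.getProperty(DEFAULT_NCHAR));
    }
}
```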
In general, it’s the job of the JDBC driver to transcode String to what the Oracle server wants. You may need tnsnames.ora options if you are using ‘OCI’.
Edit
OP reports that the NLS_CHARACTERSET of the database is US7ASCII. That means that all JDBC drivers will think that it is their job to convert Unicode String values to ASCII. Korean characters will be reduced to ? at best. Officially, then, you are up a creek.
There are some possible tricks to try. One is the very dangerous trick of building a String of Unicode chars that just so happen to have the EUC-KR byte values in their low bytes. My belief is that this will corrupt data, but you could experiment.
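One way to build such a string, as a sketch only: encode to EUC-KR, then decode those bytes as ISO-8859-1, which maps each byte 0x00-0xFF to the char with the same value. ISO-8859-1 is my choice for this illustration, and the class and method names are made up. For the sample name in the question (고산, written with \u escapes in the code) this produces exactly the °í»ê form shown there. Whether the driver passes those chars through unchanged to a US7ASCII database is a separate question that I have not tested.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; an experiment, not a recommended practice.
class LowByteTrick {
    static final Charset EUC_KR = Charset.forName("EUC-KR");

    // Pack each EUC-KR byte into the low byte of one char.
    static String toLowByteString(String real) {
        return new String(real.getBytes(EUC_KR), StandardCharsets.ISO_8859_1);
    }

    // Reverse direction: recover the real text from a low-byte string.
    static String fromLowByteString(String packed) {
        return new String(packed.getBytes(StandardCharsets.ISO_8859_1), EUC_KR);
    }

    public static void main(String[] args) {
        String real = "\uACE0\uC0B0"; // the sample name from the question
        String packed = toLowByteString(real);
        // Prints the four char values b0 ed bb ea, i.e. the EUC-KR bytes.
        for (char c : packed.toCharArray()) {
            System.out.print(Integer.toHexString(c) + " ");
        }
        System.out.println();
        System.out.println(fromLowByteString(packed).equals(real)); // true
    }
}
```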
Or, perhaps, ps.setBytes(n, string.getBytes("EUC-KR")), but I myself do not know if Oracle defines the conversion of bytes to chars as a binary copy. It might. Or, perhaps, adding a stored proc that takes a blob as an argument.
Really, what’s called for here is to repair the database to use an NLS_CHARACTERSET of UTF-8 or EUC-KR, but that’s a whole other job.
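For anyone who wants to experiment with the setBytes idea, a minimal sketch follows. The class and method names are invented for this example, and whether Oracle compares the bound bytes against the VARCHAR column as a plain binary copy is exactly the open question mentioned above.

```java
import java.nio.charset.Charset;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Hypothetical helper class, not part of JDBC or the Oracle driver.
class EucKrBinder {
    static final Charset EUC_KR = Charset.forName("EUC-KR");

    // The raw EUC-KR bytes for the given text.
    static byte[] eucKrBytes(String value) {
        return value.getBytes(EUC_KR);
    }

    // Bind the parameter as raw bytes instead of as a String, so the
    // driver does not try to convert the characters to ASCII.
    static void bindAsEucKrBytes(PreparedStatement ps, int index, String value)
            throws SQLException {
        ps.setBytes(index, eucKrBytes(value));
    }
}
```

Usage would be EucKrBinder.bindAsEucKrBytes(ps, 1, input) in place of ps.setString(1, input).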