I want to use BCP to load into a SQL Server 2005 table with an nvarchar field using a loader control file. As I understand it, SQL Server 2005 only supports UTF-16 (and I believe it is UTF-16 LE). The file is being output by a Java program. The way I have it currently set up is as follows:
- An XML format BCP loader file, created using the following command:

      bcp test_table format nul -c -x -T -f test_table.xml -S server

- A Java program using the following code to write the output:

      File f = new File("from_java.txt");
      String encoding = "x-UTF-16LE-BOM";
      OutputStream os = new FileOutputStream(f);
      OutputStreamWriter outputStreamWriter = new OutputStreamWriter(os, encoding);
      String theString = "áááááLittle Endian, BOM\r\n";
      outputStreamWriter.append(theString);
      outputStreamWriter.flush();
      outputStreamWriter.close();

Then I load the file using the following bcp command:

      bcp test_table in from_java.txt -T -f test_table.xml -S server -e error.txt
What I get in the table is ÿþá. and not áááááLittle Endian, BOM
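Those leading characters look suspiciously like the byte order mark: ÿþ is what the UTF-16LE BOM (bytes 0xFF 0xFE) looks like when decoded as a single-byte code page such as windows-1252. A quick sanity check of that interpretation (this snippet is just an illustration, not part of the load pipeline):

```java
import java.nio.charset.Charset;

public class BomDemo {
    public static void main(String[] args) {
        // The UTF-16LE byte order mark (FF FE), followed by the
        // UTF-16LE low byte of 'á' (U+00E1).
        byte[] fileStart = { (byte) 0xFF, (byte) 0xFE, (byte) 0xE1 };
        // Decoding those bytes as the single-byte windows-1252 code page
        // reproduces the garbage that ends up in the table.
        String garbled = new String(fileStart, Charset.forName("windows-1252"));
        System.out.println(garbled); // prints "ÿþá"
    }
}
```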
I’ve tried a few different permutations of changing parameters:
- changing the way I generate the loader control file (using -n for native data instead of -c for character data; I thought this might be relevant, but I saw no improvement in the inserted data)
- tried several different forms of the UTF-16 encoding, including big endian and little endian with no BOM, to no avail
- tried to output the BOM manually at the start of the file, as I read somewhere that Microsoft tools really like to make use of BOM information
- looked into trying to output the file as UCS-2 (instead of UTF-16) as that is (apparently) what BCP is actually reading the file in as
- tried -w on the bcp import, this does work, but not in conjunction with a loader format file (is there a way to incorporate whatever magic tells BCP that the file is encoded in UTF-16 into the format file?)
- I can get it to work if I output the file in windows-1252 and pass that code page to bcp with a -C 1252 option when I load the file (but I don't want to do this, as I would lose information: UTF-16 can represent far more characters than code page 1252 can)
Has anyone managed to get bcp to load into an nvarchar field using UTF-16 data in conjunction with a loader format configuration file?
Thanks in advance,
-James
I have been literally underwhelmed with responses, but I have cracked it.
The loader file needs to be generated with the -w flag, so the command to generate it is:

      bcp <table> format nul -w -x -T -f loader-control-w-format.xml -S <server> -t "||"

This leads to a loader control file that looks a little different; you get entries such as:
<FIELD ID="1" xsi:type="NCharTerm" TERMINATOR="|\0|\0" MAX_LENGTH="1000" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
Note that the delimiter is listed as |\0|\0; the zero bytes correspond to the extra byte per character in the file, since UTF-16 (or just "Unicode", as Microsoft somewhat misleadingly calls it) is a double-byte character encoding.

Some notes for the sanity of anyone else dealing with BCP in this way:
The Java code for writing out a file in UTF-16 that can be loaded in this way is as follows:
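A minimal version, reconstructed from the snippet earlier in the question (the file name, column count, and values are illustrative; the "||" field terminator matches the -t "||" used when generating the format file):

```java
import java.io.FileOutputStream;
import java.io.OutputStream;
import java.io.OutputStreamWriter;

public class WriteUtf16BcpFile {
    public static void main(String[] args) throws Exception {
        // "x-UTF-16LE-BOM" is the Sun JDK charset name for UTF-16
        // little-endian with a leading byte order mark (bytes FF FE).
        OutputStream os = new FileOutputStream("from_java.txt");
        OutputStreamWriter writer = new OutputStreamWriter(os, "x-UTF-16LE-BOM");
        // Fields are separated by "||", which on disk becomes the bytes
        // 7C 00 7C 00 -- the |\0|\0 terminator seen in the format file.
        // Rows are terminated with \r\n.
        writer.append("áááááLittle Endian, BOM||second column\r\n");
        writer.flush();
        writer.close();
    }
}
```

The key point is that the encoding writes the BOM for you, and the format file generated with -w already expects the double-byte terminators, so the two ends agree.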