Is there any way in SQL Server of determining what a character in a code page would represent without actually creating a test database of that collation?
Example. If I create a test database with collation SQL_Ukrainian_CP1251_CS_AS and then do CHAR(255) it returns я.
However, if I try the following on a database with SQL_Latin1_General_CP1_CS_AS collation:
SELECT CHAR(255) COLLATE SQL_Ukrainian_CP1251_CS_AS
It returns y
SELECT CHAR(255)
Returns ÿ, so it is evidently going via the database’s default collation first and then trying to find the closest equivalent to that in the explicit collation. Can this be avoided?
While MS SQL supports both code pages and Unicode, it unhelpfully doesn’t provide any functions to convert between the two, so figuring out what character is represented by a value in a different code page is a pig.
There are two potential methods I’ve seen to handle conversions. One is detailed here
http://www.codeguru.com/cpp/data/data-misc/values/article.php/c4571
and involves bolting a custom conversion program onto the database and using that for conversions.
The other is to construct a db table consisting of the code page, the character's value in that code page, and the matching Unicode value, with the Unicode value stored either as the int representing the Unicode character (to be converted using nchar()) or as the nchar itself.
You're using the collation SQL_Ukrainian_CP1251_CS_AS, which is code page 1251 (the CP1251 in the centre of the string). You can grab its translation table here:
http://unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP1251.TXT
It's a TSV, so after trimming the top off, the raw data should import fairly cleanly.
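The lookup-table approach could be sketched roughly as follows. The table and column names (dbo.CodePageMap, CodePage, ByteValue, UnicodeValue) are made up for illustration, and only two rows are typed in by hand rather than imported; in the CP1251 mapping, 0xFF corresponds to U+044F and 0xC0 to U+0410.

```sql
-- Hypothetical lookup table: one row per (code page, byte value) pair.
CREATE TABLE dbo.CodePageMap (
    CodePage     int     NOT NULL,  -- e.g. 1251
    ByteValue    tinyint NOT NULL,  -- value you would pass to CHAR()
    UnicodeValue int     NOT NULL,  -- code point you can pass to NCHAR()
    CONSTRAINT PK_CodePageMap PRIMARY KEY (CodePage, ByteValue)
);

-- Two sample rows taken from the CP1251 mapping.
INSERT INTO dbo.CodePageMap (CodePage, ByteValue, UnicodeValue)
VALUES (1251, 255, 1103),  -- 0xFF -> U+044F (я)
       (1251, 192, 1040);  -- 0xC0 -> U+0410 (А)

-- Look up what CHAR(255) would mean under code page 1251.
-- NCHAR() returns Unicode, so the database's default collation
-- does not take part in the conversion.
SELECT NCHAR(UnicodeValue) AS Ch
FROM dbo.CodePageMap
WHERE CodePage = 1251
  AND ByteValue = 255;
```

Storing the code point as an int keeps the table independent of any column collation; storing an nchar(1) instead would save the NCHAR() call at query time.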
Personally I’d lean more towards the latter than the former, especially for a production server, as the former may introduce instability.
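One wrinkle with the import: the mapping file stores both columns as hex strings such as 0xFF and 0x044F, so they need converting to integers once loaded. A minimal sketch, assuming the rows have already been imported as text into a staging table (the @Staging table variable and its column names are hypothetical) and that entries without a mapping arrive with a blank Unicode column. It relies on TRY_CONVERT, which needs SQL Server 2012 or later.

```sql
-- Hypothetical staging data, as it might look straight after import.
DECLARE @Staging TABLE (ByteHex varchar(10), UnicodeHex varchar(10));

INSERT INTO @Staging (ByteHex, UnicodeHex)
VALUES ('0xFF', '0x044F'),  -- a normal mapping line
       ('0x98', '      ');  -- assumed shape of an entry with no mapping

-- Style 1 tells CONVERT/TRY_CONVERT to parse a '0x...' hex string
-- into varbinary; converting that to int gives the numeric value.
-- TRY_CONVERT returns NULL instead of raising an error on the blank
-- entry, so unmapped rows can simply be filtered out.
SELECT
    CONVERT(int, TRY_CONVERT(varbinary(2), ByteHex, 1))    AS ByteValue,    -- 0xFF   -> 255
    CONVERT(int, TRY_CONVERT(varbinary(2), UnicodeHex, 1)) AS UnicodeValue  -- 0x044F -> 1103
FROM @Staging
WHERE TRY_CONVERT(varbinary(2), UnicodeHex, 1) IS NOT NULL;
```

The result of this SELECT is in the shape needed for a lookup table keyed on code page and byte value.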