The website I’m currently working on collects human-entered data from various sources. The data is stored in nvarchar fields in the database. Currently the site specifies that the charset is UCS-2 through a meta tag. Until now the site has required answers in English. Soon, though, we will be allowing or requiring at least some of the fields to be entered in the user’s native language (Chinese, in this case). Based on some research and other posts on the site, it seems that UCS-2 and UTF-16 are pretty much the same thing with some minor technical differences. If it matters, this is an ASP.NET website running on a SQL Server database. So my questions are:
Is there a reason for me to change the meta tag to specify UTF-16?
Will I have any issues with the way characters are displayed if I change the encoding? (I think the current data should display the same, since it’s mostly or entirely English, but I’d like to confirm that.)
UCS-2 is a strict subset of UTF-16: it can encode only the characters in the Basic Multilingual Plane (U+0000 through U+FFFF). Characters in the supplementary planes (which include some relatively rare Chinese characters) must be encoded as pairs of 16-bit code units (“surrogates”). Data containing such characters is not valid UCS-2 and must be declared as UTF-16.
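To make the difference concrete, here is a small Python sketch (not specific to ASP.NET or SQL Server) that lists the 16-bit code units UTF-16 produces for a string. The helper names `utf16_code_units` and `fits_in_ucs2` are made up for this illustration, and the sample characters are my own choice: U+4E2D is a common Chinese character in the BMP, and U+20000 is a rarer one from a supplementary plane.

```python
def utf16_code_units(text: str) -> list:
    """Return the 16-bit code units of `text` encoded as UTF-16 (big-endian, no BOM)."""
    data = text.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def fits_in_ucs2(text: str) -> bool:
    """Hypothetical helper: True if every character lies in the BMP (U+0000..U+FFFF)."""
    return all(ord(ch) <= 0xFFFF for ch in text)


english = "Hello"
bmp_chinese = "\u4e2d"             # common character, inside the BMP
supplementary = "\U00020000"       # rare character, outside the BMP

# BMP text: one code unit per character, identical in UCS-2 and UTF-16.
print([hex(u) for u in utf16_code_units(english)])
print([hex(u) for u in utf16_code_units(bmp_chinese)])    # ['0x4e2d']

# Supplementary-plane text: two code units (a surrogate pair) per character.
print([hex(u) for u in utf16_code_units(supplementary)])  # ['0xd840', '0xdc00']

print(fits_in_ucs2(english), fits_in_ucs2(bmp_chinese), fits_in_ucs2(supplementary))
```

English text and common Chinese characters come out as one code unit each, so they look the same under either label. Only the supplementary-plane character needs a surrogate pair, and that is the case UCS-2 cannot represent.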
If you can easily switch the encoding specification to UTF-16, there is little reason not to do so immediately, unless your data is consumed by ancient software that doesn’t know what “UTF-16” means.