Some columns of my tables need to support Unicode characters (Let’s say 1% of all my columns).
I think I have the following two options:
- Implement the Unicode columns as NVARCHAR2; or
- Change the character set of the whole database to one that supports Unicode (this way I can use VARCHAR2).
I’m inclined to the second option (in order to not have to change my already existing VARCHAR2 scripts).
My question is: what are the disadvantages and advantages of this second option, when compared with the first one? Is it less performant?
I would strongly lean toward changing the character set of the database.
There are potential downsides of doing so
VARCHAR2. By default,VARCHAR2(50)allocates 50 bytes of storage which will allow you to store between 16 and 50 characters if you’re using an AL32UTF8 character set rather than being a simple 1:1 mapping like it is if you’re using a single-byte character set. That will require either that you increase the size of your columns (i.e. triple them) to ensure that they store the appropriate number of characters or that you specify character-length semantics when you declare the column (i.e.VARCHAR2(50 CHAR)) or that you set yourNLS_LENGTH_SEMANTICStoCHARbefore creating the objects in order to change the default to character-length semantics. There is a discussion on the Oracle Globalization forum about whether it is appropriate to change the NLS_LENGTH_SEMANTICS at the instance level— Sergiusz Wolicki, one of Oracle’s top globalization gurus suggests strongly against it though I am personally much more willing to consider it under the right circumstances. You can also set theNLS_LENGTH_SEMANTICSat the session level which is something Sergiusz does not object to but does require that you do so every time you run a script which may be a concern.CHAR_LENGTHandDATA_LENGTHcolumns where they want the length in characters vs. the length in bytes. This may be a minor issue for you or a serious pain if you have existing tools/ scripts/ etc. that run queries against the data dictionary to generate DDL or to determine how much memory needs to be allocated or some other situation where you end up getting funky results.However, those downsides are more than outweighed by the advantages of having a single character set for all your data
NVARCHAR2columns generally requires application code changes. Since you’re going to have bothVARCHAR2andNVARCHAR2columns, those code changes and configuration settings can be non-trivial and are often a major annoyance. Inevitably, you’ll find that you mapped a particular column incorrectly in some application and you’ll encounter data corruption bugs that are a pain to track down. This is more true the more layers of abstraction you have between your database and your application.VARCHAR2toNVARCHAR2is a pain– you’ll need to add a new column, copy the data over, and remove the old column, rename the new column, and deal with the row migration that resulted. You’ll then have to make changes to all your existing applications so that they map the column correctly. When the business decides that one more column needs to support additional languages and your database and applications already support Unicode, that level of effort and testing will seem rather excessive.NVARCHAR2column as a literal in a SQL statement either in an application (say, to avoid bind variable peeking or to better leverage a histogram) or as part of production support when you want to track down issues in the data.NVARCHAR2columns is strongly discouraged. This probably doesn’t have immediate practical consequences but if your system is supposed to be around for a number of years, it’s likely that there will be consequences in the future.Sergiusz sums up Oracle’s advice very well in this thread