I am in the process of migrating a large-scale web service to be compatible with international characters. It is a Tomcat/Spring MVC/SQL Server stack. The migration itself was relatively straightforward: we made a few settings changes in Tomcat to force default use of UTF-8 in the response, changed some Java code to use the encoding, and migrated a few VARCHAR columns to NVARCHAR, followed by a healthy dose of unit/functional tests.
Another person on my team now wants a load test to make sure none of the changes adversely affect system performance. The individual components of the transition described above don’t really hint at any performance changes and, frankly, based on my limited knowledge, I don’t think a load test is strictly necessary. I plan to do it anyway, but my question, then, is this – are there any performance gotchas one might see in such a migration? Is there anything specific to a different character encoding that might change the performance of a system?
The only thing I could think of would be heavy string comparison and sorting, etc. Any ideas?
I only have this anecdote:
At my former company we ran into an issue where a text field in the database (ASCII) was being matched against a Unicode string in a query. This caused SQL Server to switch from its usual index lookup to table scans, because it could not prove the string would always be translatable to ASCII. This was a significant performance hit for us.
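For what it's worth, on a Java stack this mismatch can creep in through the JDBC driver: Microsoft's SQL Server driver sends string parameters as Unicode by default, which is controlled by the `sendStringParametersAsUnicode` connection property. Below is a minimal sketch of building a connection URL with that property turned off. The host, port, and database names are made up for illustration, and I am assuming the Microsoft driver here; other drivers may use a different setting. Note that turning it off is only appropriate for columns that stay VARCHAR. For the columns you migrated to NVARCHAR you would want Unicode parameters (for example via `PreparedStatement.setNString`), otherwise international characters may be lost.

```java
// Sketch only: builds a SQL Server JDBC URL, it does not open a connection.
// The host, port and database values used in main() are hypothetical.
final class UnicodeParamSketch {

    // sendStringParametersAsUnicode is a connection property of Microsoft's
    // JDBC driver (default: true). With "false", setString() parameters are
    // sent as non-Unicode, so they can match a VARCHAR index without conversion.
    static String buildUrl(String host, int port, String database, boolean sendUnicode) {
        return "jdbc:sqlserver://" + host + ":" + port
                + ";databaseName=" + database
                + ";sendStringParametersAsUnicode=" + sendUnicode;
    }

    public static void main(String[] args) {
        System.out.println(buildUrl("db.example.internal", 1433, "appdb", false));
    }
}
```

Either way, a load test that compares the query plans for your hottest lookups before and after the migration should show whether any index seek has turned into a scan.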