I am currently working on a .NET 4.5 application that contains multilingual data.
I am new to this, so I am looking for resources that explain concepts such as character encoding for different languages, globalization, localization, etc.
Any tips as to where I should look for such information?
MSDN, as always, is the best resource: http://msdn.microsoft.com/en-us/library/h6270d0z.aspx
Some gotchas from my own experience:
Use Unicode types in your database. For SQL Server, that means nvarchar and ntext instead of varchar and text (on recent SQL Server versions, nvarchar(max) is the recommended replacement for the deprecated ntext). Otherwise, characters that fall outside the column's code page are lost, which affects languages such as Chinese.
Make your design flexible. A phrase that is 10 characters in English could easily be 3-4 times as long in German or French. Make your buttons flexible (for example, with the sliding doors technique in HTML), and specify widths and heights as percentages so the layout is as responsive as possible.
In your resource files, keep separate singular and plural forms of strings, with placeholders for numbers. For example, if you have a phrase such as “within 2 km of this place”, you will probably need a resource entry for “km” separate from the whole sentence, to cover the singular and plural cases (kilometer, kilometers). Don’t assume that you can just add an “s” for pluralization; that won’t work in all languages. Some languages, such as Arabic, even have a dedicated form for exactly two objects, which is not treated the same as the plural. (Look at Dwayne’s comment for an interesting take on this point.)
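To make the plural-category point concrete, here is a minimal sketch in TypeScript using the standard Intl.PluralRules API. The message table and the function names are made up for illustration, and it stores one complete sentence per plural category, which is a different choice from keeping the unit word in its own entry.

```typescript
// Hypothetical message table: one complete sentence per plural category.
type PluralForms = { [category: string]: string };

const withinDistance: { [locale: string]: PluralForms } = {
  en: {
    one: "within {n} kilometer of this place",
    other: "within {n} kilometers of this place",
  },
};

// Ask the runtime which plural category a number falls into for a locale.
function pluralCategory(locale: string, n: number): string {
  return new Intl.PluralRules(locale).select(n);
}

// Pick the sentence for that category, falling back to "other" and to English.
function formatWithin(locale: string, n: number): string {
  const forms = withinDistance[locale] ?? withinDistance["en"];
  const template = forms[pluralCategory(locale, n)] ?? forms["other"];
  return template.replace("{n}", String(n));
}

console.log(formatWithin("en", 1)); // within 1 kilometer of this place
console.log(formatWithin("en", 2)); // within 2 kilometers of this place
console.log(pluralCategory("ar", 2)); // two
console.log(pluralCategory("ar", 3)); // few
```

English only needs the "one" and "other" entries, while an Arabic table would need entries for more categories, which is why appending an “s” cannot be the general rule.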
If you’re going to localize for a language such as Arabic or Hebrew, remember that these are written right to left, so your whole design (including pictures) will need to change orientation. In HTML, that is mostly as easy as adding a dir="rtl" attribute, but sometimes it can be tricky.
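As a small sketch of how the direction could be chosen from the user's culture (TypeScript; the helper name and the short language list are assumptions for illustration, not a complete list of right-to-left languages):

```typescript
// Assumption: a small, incomplete list of right-to-left language codes.
const rtlLanguages = ["ar", "he", "fa", "ur"];

// Return the value to use for the HTML dir attribute for a culture such as "ar-EG".
function directionFor(locale: string): "rtl" | "ltr" {
  const language = locale.split("-")[0].toLowerCase();
  return rtlLanguages.indexOf(language) >= 0 ? "rtl" : "ltr";
}

console.log(directionFor("ar-EG")); // rtl
console.log(directionFor("en-US")); // ltr
```

In a page, the result would typically be assigned to the dir attribute of the root html element, so that the rest of the layout can follow from it.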
It’s not just about translation. Other things that change include number formats (comma or period as the decimal and thousands separators), currency formatting (including whether the symbol comes before or after the amount), date formatting, and so on. Make sure that all of these are formatted by the .NET Framework using the culture of the current user.
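The same rule applies to any script code the application ships to the browser. As a rough sketch (TypeScript, using the standard Intl API rather than .NET; the function name is made up for illustration), letting the runtime format a number for a given culture looks like this:

```typescript
// Hypothetical helper: format a number with exactly two decimals for a culture.
function formatAmount(locale: string, value: number): string {
  return new Intl.NumberFormat(locale, {
    minimumFractionDigits: 2,
    maximumFractionDigits: 2,
  }).format(value);
}

console.log(formatAmount("en-US", 1234567.891)); // 1,234,567.89
console.log(formatAmount("de-DE", 1234567.891)); // 1.234.567,89
```

The point is that the separators come from the culture, never from string concatenation in your own code. On the server, the CultureInfo class plays the equivalent role in .NET.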
Be disciplined about not hardcoding any strings in your UI. A handy trick is to pick a language that doesn’t use Latin characters (Chinese, Russian, Arabic, whatever), create a resource file for that language, and fill all entries with random strings in that language from Google. Run your application, and you will easily spot the parts of the UI that are not coming from the resource file: they will be the English characters in the middle of the Chinese ones.
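The same trick can be automated to a degree. This is only a sketch under assumptions (TypeScript; the resource shape and function names are hypothetical, and it uses a repeated Cyrillic letter instead of real text from a search engine): build a fake resource table with no Latin letters, then flag any rendered string that still contains them.

```typescript
type Resources = { [key: string]: string };

// Replace every resource value with non-Latin filler of the same length.
function makePseudoResources(source: Resources): Resources {
  const pseudo: Resources = {};
  for (const key of Object.keys(source)) {
    pseudo[key] = "Ж".repeat(source[key].length);
  }
  return pseudo;
}

// Any rendered string that still contains Latin letters was probably hardcoded.
function findHardcoded(rendered: string[]): string[] {
  return rendered.filter((text) => /[A-Za-z]/.test(text));
}

const pseudo = makePseudoResources({ save: "Save", open: "Open file" });
// Pretend the UI rendered two resource strings and one hardcoded label.
console.log(findHardcoded([pseudo["save"], pseudo["open"], "Cancel"])); // [ 'Cancel' ]
```

Keeping the filler the same length as the original is a simplification; real translations will often be longer, so this does not replace testing the layout with real text.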
It is not just about the UI. If you are sending messages from the backend, such as a response from a service, those also need to be localized. In some cases, even error messages written to the Event Log are required to be localized. Make sure you think about that.
JavaScript. If you’re building an AJAX-heavy web application with a lot of JavaScript, you might need a library such as jQuery localization to help. You will have to serve your resource files in a JavaScript key-value structure. Since this is less standard than ASP.NET, it could require some improvisation on your side depending on your needs: how to load the resource files (all at once or with AMD), whether to create a service that returns the localized strings, or whether to just let ASP.NET bind the values from the actual resource file at compile time, and so on.
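For what the key-value structure might look like on the client, here is a minimal sketch (TypeScript; the table contents, the keys, and the function name t are all hypothetical, and this is not the API of any particular library). It looks a key up in the exact culture first, then in the neutral language, then in a default language:

```typescript
type Resources = { [key: string]: string };

// Hypothetical resource tables, as they might be served to the browser.
const resources: { [culture: string]: Resources } = {
  en: { greeting: "Hello", save: "Save" },
  de: { greeting: "Hallo" },
};

// Look up a key with fallback: exact culture, neutral language, then English.
function t(culture: string, key: string): string {
  const candidates = [
    resources[culture],
    resources[culture.split("-")[0]],
    resources["en"],
  ];
  for (const table of candidates) {
    if (table && key in table) return table[key];
  }
  return key; // Last resort: show the key so the gap is visible.
}

console.log(t("de-AT", "greeting")); // Hallo
console.log(t("de-AT", "save")); // Save
```

Whether the tables are loaded all at once, on demand, or returned by a service only changes how the resources object gets filled; the lookup stays the same.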