I am going to be building an application which will be used by people all over Europe. I need to know which collation and character set would be best suited for user-entered data. Or should I make a separate table for each language? An article or something explaining this would be great.
Thanks 🙂
Unicode is a very large character set including nearly all characters from nearly all languages.
There are a number of ways to store Unicode text as a sequence of bytes – these ways are called encodings. All Unicode encodings (well, all complete Unicode encodings) can store all Unicode text as a sequence of bytes, in some format – but the number of bytes that any given piece of text takes will depend on the encoding used.
UTF-8 is a variable-width Unicode encoding that uses a single byte for each ASCII character, so it is optimized for English and other languages which use very few characters outside the basic Latin alphabet. UTF-16 is a Unicode encoding which is possibly more appropriate for text in a variety of European languages. Java and .NET store all text in-memory (the String class) as UTF-16 encoded Unicode.
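To make the size difference concrete, here is a small Python sketch that counts how many bytes a few sample words take in each encoding. The sample words and the helper name `encoded_sizes` are my own choices for illustration, not part of any library, and the UTF-16 figure is measured without a byte order mark.

```python
# Illustrative sample words (my own choice), one per script.
samples = {
    "English": "Hello",
    "German": "Grüße",
    "Greek": "Γειά",
    "Russian": "Привет",
}


def encoded_sizes(text):
    """Return (UTF-8 byte count, UTF-16 byte count) for the given text."""
    utf8_size = len(text.encode("utf-8"))
    # "utf-16-le" is used so that no byte order mark is added to the count.
    utf16_size = len(text.encode("utf-16-le"))
    return utf8_size, utf16_size


for language, word in samples.items():
    utf8_size, utf16_size = encoded_sizes(word)
    print(f"{language}: {len(word)} characters, "
          f"UTF-8 {utf8_size} bytes, UTF-16 {utf16_size} bytes")
```

ASCII letters take one byte in UTF-8 and two in UTF-16, while accented Latin, Greek and Cyrillic letters take two bytes in both. A few symbols, such as the euro sign, take three bytes in UTF-8 but only two in UTF-16. So for mixed European text it is worth measuring your own data rather than assuming one encoding is always smaller.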