If the database uses UTF-8 encoding, can text from all human languages be properly stored and retrieved?
Are there any “gotchas” when dealing with non-English languages in a PostgreSQL database?
I am working with Ruby on Rails and PostgreSQL 9.1.
In addition to Spidey and Kevin’s points (use utf-8 in the client and an ENCODING 'utf-8' database, beware of differing collations), I strongly recommend tagging each text field with the language it is in, if at all possible.
If you ever want to use full text search or any kind of linguistic analysis, it really helps to know which language each field is in. Full text search can’t do root-word analysis etc. unless it has a dictionary and suffix list for the text being indexed, and for that it needs to know the language.
Storing ISO 639 language codes is probably a reasonable choice.
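As a rough sketch of how a stored language code could feed into full text search: the helper below maps ISO 639-1 codes to PostgreSQL text search configuration names and falls back to 'simple' (no stemming) for anything it doesn't know. The names TS_CONFIG_BY_ISO639_1, ts_config_for and FTS_WHERE, and the body column, are made up for illustration. The configuration names are the ones I'd expect a stock install to ship, so check with \dF in psql before relying on them.

```ruby
# Hypothetical mapping from ISO 639-1 codes to PostgreSQL text search
# configuration names. Assumption: these configurations exist in your
# install (run \dF in psql to list what you actually have).
TS_CONFIG_BY_ISO639_1 = {
  "da" => "danish",
  "nl" => "dutch",
  "en" => "english",
  "fi" => "finnish",
  "fr" => "french",
  "de" => "german",
  "hu" => "hungarian",
  "it" => "italian",
  "no" => "norwegian",
  "pt" => "portuguese",
  "ro" => "romanian",
  "ru" => "russian",
  "es" => "spanish",
  "sv" => "swedish",
  "tr" => "turkish"
}.freeze

# Returns the configuration name for a stored language code.
# Unknown or missing codes fall back to "simple", which does no
# stemming but still tokenizes and lowercases the text.
def ts_config_for(lang_code)
  TS_CONFIG_BY_ISO639_1.fetch(lang_code.to_s.downcase, "simple")
end

# Parameterized WHERE fragment: $1 is the configuration name, $2 the
# user's search terms. "body" is a hypothetical text column.
FTS_WHERE =
  "to_tsvector($1::regconfig, body) @@ plainto_tsquery($1::regconfig, $2)"
```

With a language column on each row, you would look up ts_config_for(row_language) and pass the result as the first bind parameter, so French text gets stemmed with the French rules and text in an unlisted language still gets indexed without stemming.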