I’m currently building a multilingual website using PHP and MySQL and am wondering what the best approach regarding localization is with regards to my data model. Each table contains fields that need to be translated (e.g. name, description….) in several languages.
The first idea was to create a field for each language (e.g. name_en, name_de, name_fr) and retrieve the appropriate field from PHP using a variable (e.g. $entry['name_' . LANGUAGE]). While it would work, this approach has in my opinion many drawbacks:
-you need as many occurrences of each field as you have languages (bearing in mind you can have en-US, en-CA, en-GB…)
-if you add or remove languages you need to modify the database structure accordingly
-if you have untranslated fields, they are still created for each entry which doesn’t seem very optimized
The second idea is to create a translation table that can be used to store the translation of any field of any table in the database:
---------------- translation ---------------- id INT table_name VARCHAR field_name VARCHAR value VARCHAR language_id VARCHAR
The table_name and field_name will allow identifying which table and which field the translation is about, while language_id will indicate which language that translation if for. The idea is to create models that would replace the value of the translatable fields (e.g. name, description) by their corresponding translation based on the language selected by the user.
Can you see drawbacks with this approach? Have you got suggestions to make?
Thanks.
The main drawback is that you destroy the relational model by storing metadata like table name and field name as application data. You queries would be too ugly and non-effective.
Another drawback is that you are limited only to one data type of the translatable data. Your table structure would define
which means you would store data that would require smaller field always in VARCHAR(255). And if you like to have it even more universal to store also large text you need to define it
which is even worse.
The popular model is the following. For every entity you define the fields which are not language dependent and those which are language dependent and create always 2 tables. For example:
This is the proper relational approach. Of course, it also has drawbacks major one being that you would always join 2 table to fetch items and adding/updating of data becomes a bit more complex.