I want to make a multi-language site, such that all or almost all pages will be available in 2 or more translations. What are the best practices to follow?
For example, I consider these language selection mechanisms:
- Cookie-based selection of the preferred language.
- Based on the `Accept-Language` header if the cookie is not set.
- Based on GeoIP otherwise (probably).
Is there anything else?
How should different translations be served?
- as `LANG.example.com/page`
- as `example.com/LANG/page`
- as `example.com/page?hl=LANG`
- …
- any of the above with a redirect to `example.com/page`? (It seems to be discouraged)
How to ensure that all the translations are properly indexed?
- Are sitemaps with all pages plus a correct `Content-Language` header enough?
What is the best way to let users know there are other translations without distracting them?
- list available languages in the header/footer/sidebar (like Wikipedia)
- put “Choose a language” selector next to the content
What is the best policy to deal with missing/outdated translations?
- do not display missing pages at all or display a page in a different language?
- display old translation, old translation with a warning or a page in a different language?
What else should I take into account? What should I do, and what should I definitely not do?
In addition to @Quassnoi’s answers, ensure that you use standard RFC 4646 language identifiers (e.g. EN-US, DE-AT); you may already be aware of this. The CLDR project is an excellent repository of internationalization data (the Supplemental Data is really useful).
If a translation of a specific page is not available, use a language fallback mechanism that ends at the neutral language; for example ‘DE-AT’, ‘DE’, ‘’ (neutral, e.g. ‘EN’).
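
The fallback order described above could be sketched roughly as follows. This is only an illustration: the function names `fallback_chain` and `pick_translation` are made up for this example, the neutral language is assumed to be `en`, and the set of available translations is assumed to hold lowercase tags.

```python
def fallback_chain(tag: str, neutral: str = "en") -> list[str]:
    """Return the lookup order for a language tag, most specific first.

    Subtags are dropped from the right one at a time, and the neutral
    language (an assumption of this sketch) is appended as a last resort.
    """
    parts = tag.lower().split("-")
    chain = ["-".join(parts[:i]) for i in range(len(parts), 0, -1)]
    if neutral not in chain:
        chain.append(neutral)
    return chain


def pick_translation(tag: str, available: set[str], neutral: str = "en") -> str:
    """Pick the first available translation along the fallback chain."""
    for candidate in fallback_chain(tag, neutral):
        if candidate in available:
            return candidate
    # Nothing matched; serve the neutral language anyway.
    return neutral
```

With this sketch, a request for ‘DE-AT’ on a page that only exists in German and English would be served the ‘de’ version, and a request for a language with no translation at all would get the neutral one.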
Most recent browsers and the underlying operating systems will correctly show all of the characters required for a locale selector list if the page is encoded correctly (I’d recommend UTF-8 for all pages). Ensure that the locale list contains both the native and current-language names so that both native and non-native speakers can find the translation they want, e.g. ‘Deutsch (German)’ if the current locale is EN-*.
A lot of sites use a flag icon to show the current locale, but a flag is more relevant to a location than to a language, and some people may be offended if you show only a dominant flag (e.g. the US or UK flag for English).
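
As a rough sketch of the ‘native name (current-language name)’ labels suggested for the locale list, something like the following could work. The two lookup tables are tiny hand-written samples invented for this example; a real site would more likely load the names from CLDR data. The function name `locale_label` is also hypothetical.

```python
# Hypothetical sample data; in practice these names would come from CLDR.
NATIVE_NAMES = {"de": "Deutsch", "fr": "français", "en": "English"}
NAMES_IN = {
    "en": {"de": "German", "fr": "French", "en": "English"},
    "de": {"de": "Deutsch", "fr": "Französisch", "en": "Englisch"},
}


def locale_label(target: str, current: str) -> str:
    """Build a selector label: native name plus the name in the current language."""
    native = NATIVE_NAMES[target]
    localized = NAMES_IN.get(current, {}).get(target)
    # Skip the parenthesised part when it is unknown or would just repeat
    # the native name (e.g. 'English' while browsing in English).
    if localized is None or localized == native:
        return native
    return f"{native} ({localized})"
```

For a visitor browsing in English, `locale_label("de", "en")` gives ‘Deutsch (German)’, while the entry for English itself stays a plain ‘English’.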
It may be worthwhile to have a more visible (semi-graphical) locale selector on the home page if no locale cookie has been submitted, using a combination of GeoIP and Accept-Language to determine the default locale choice.
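
One possible way to combine the cookie, Accept-Language and GeoIP into a default locale is sketched below. Everything here is an assumption made for illustration: the function names, the `COUNTRY_TO_LOCALE` table, and the idea that the GeoIP lookup has already been reduced to a two-letter country code by some other component. The header parsing is deliberately simplified and only handles the common `tag;q=value` form.

```python
# Hypothetical mapping from a GeoIP country code to a default locale.
COUNTRY_TO_LOCALE = {"DE": "de", "AT": "de", "FR": "fr"}


def parse_accept_language(header: str) -> list[str]:
    """Return the language tags of an Accept-Language header, best first."""
    weighted = []
    for item in header.split(","):
        tag, _, params = item.strip().partition(";")
        tag = tag.strip().lower()
        params = params.strip()
        quality = 1.0  # a tag without an explicit q-value has weight 1
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0  # treat a malformed weight as "not acceptable"
        if tag and tag != "*" and quality > 0:
            weighted.append((quality, tag))
    # The sort is stable, so tags with equal weight keep their header order.
    weighted.sort(key=lambda pair: pair[0], reverse=True)
    return [tag for _, tag in weighted]


def choose_locale(cookie, accept_language, geoip_country, supported, default="en"):
    """Pick a locale: cookie first, then Accept-Language, then GeoIP."""
    if cookie and cookie.lower() in supported:
        return cookie.lower()
    for tag in parse_accept_language(accept_language or ""):
        if tag in supported:
            return tag
        primary = tag.split("-")[0]  # e.g. 'de-at' falls back to 'de'
        if primary in supported:
            return primary
    guess = COUNTRY_TO_LOCALE.get((geoip_country or "").upper())
    if guess in supported:
        return guess
    return default
```

The result is only meant as the preselected choice in the selector; the visitor's explicit pick should then be stored in the cookie so that it wins on later visits.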
Semi-related: if your users are located in different time zones, include a time zone preference in their account profile so that time values can be displayed in their local time, and store all timestamps in UTC.
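
A minimal sketch of the ‘store in UTC, display in the user's zone’ approach, using only the Python standard library. The helper names are invented for this example, and it assumes the profile stores an IANA zone name such as `Europe/Vienna` that is turned into a `tzinfo` object with `zoneinfo.ZoneInfo` (Python 3.9+).

```python
from datetime import datetime, timezone, tzinfo


def now_utc() -> datetime:
    """Timestamp to store: always timezone-aware and in UTC."""
    return datetime.now(timezone.utc)


def to_user_time(stamp_utc: datetime, user_zone: tzinfo) -> datetime:
    """Convert a stored UTC timestamp to the user's zone for display only."""
    if stamp_utc.tzinfo is None:
        # A naive datetime carries no zone, so refuse to guess what it means.
        raise ValueError("stored timestamps must be timezone-aware (UTC)")
    return stamp_utc.astimezone(user_zone)


# Example usage (assumes the profile holds the zone name "Europe/Vienna"):
#   from zoneinfo import ZoneInfo
#   local = to_user_time(now_utc(), ZoneInfo("Europe/Vienna"))
```

Keeping the conversion at display time means the stored values never depend on where the server or the user happens to be.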