I’m currently setting up SQL tables for language listing that will serve as a framework for translation efforts of my web application in the near future.
The tables and fields I’ve decided to create look like this:
Language
--------
Language ID
Language name
Native name

Region
------
Language ID
Region

Active
------
Language ID
Active
Language ID would be the primary key in each table.
The Active field in the Active table simply tells me which languages have been translated (denoted by 1) and which haven’t (denoted by 0).
Sample data would look something like this:
Language table
Language ID   Language name   Native name
-----------   -------------   ------------
afr           Afrikaans       Afrikaans
sqi           Albanian        gjuha shqipe
ara           Arabic          العربية

Region table
Language ID   Region
-----------   --------------
afr           Africa
sqi           Eastern Europe
ara           Middle East

Active table
Language ID   Active
-----------   ------
afr           0
sqi           1
ara           0
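For reference, here is a minimal sketch of the proposed three-table design using Python’s built-in sqlite3 module. The table and column names (language, region, active, language_id and so on) are assumptions chosen for illustration. Every table uses language_id as its primary key, so Region and Active each hold exactly one row per language, and reading a language’s full details means joining all three tables.

```python
import sqlite3

# In-memory database, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Proposed design: three tables, each keyed on the language code.
    CREATE TABLE language (
        language_id   TEXT PRIMARY KEY,
        language_name TEXT NOT NULL,
        native_name   TEXT NOT NULL
    );
    CREATE TABLE region (
        language_id TEXT PRIMARY KEY,
        region      TEXT NOT NULL
    );
    CREATE TABLE active (
        language_id TEXT PRIMARY KEY,
        active      INTEGER NOT NULL
    );
""")

conn.executemany("INSERT INTO language VALUES (?, ?, ?)", [
    ("afr", "Afrikaans", "Afrikaans"),
    ("sqi", "Albanian", "gjuha shqipe"),
    ("ara", "Arabic", "العربية"),
])
conn.executemany("INSERT INTO region VALUES (?, ?)", [
    ("afr", "Africa"),
    ("sqi", "Eastern Europe"),
    ("ara", "Middle East"),
])
conn.executemany("INSERT INTO active VALUES (?, ?)", [
    ("afr", 0), ("sqi", 1), ("ara", 0),
])

# Listing translated languages with their region needs a join across all three 1:1 tables.
rows = conn.execute("""
    SELECT l.language_name, r.region
    FROM language l
    JOIN region r ON r.language_id = l.language_id
    JOIN active a ON a.language_id = l.language_id
    WHERE a.active = 1
""").fetchall()
print(rows)  # [('Albanian', 'Eastern Europe')]
```

Because language_id is the primary key of the region table, each language can belong to only one region, which is the limitation discussed in the answer.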
There will be a fixed number of languages and regions. The languages I plan to add to the database (for later translation use) are as follows; all their codes will be taken from the ISO 639-2/B column of the table at http://en.wikipedia.org/wiki/List_of_ISO_639-1_codes:
Afrikaans
Albanian
Arabic
Belarusian
Bulgarian
Catalan
Chinese (simplified)
Chinese (traditional)
Croatian
Czech
Danish
Dutch
English
Estonian
Esperanto
Filipino
Finnish
French
Galician
German
Greek
Hebrew
Hindi
Hungarian
Icelandic
Indonesian
Irish
Italian
Japanese
Korean
Latvian
Lithuanian
Macedonian
Malay
Maltese
Norwegian
Persian
Polish
Portuguese
Romanian
Russian
Serbian
Slovak
Slovenian
Spanish
Swahili
Swedish
Thai
Turkish
Ukrainian
Vietnamese
Welsh
Yiddish
The region categories include:
All
Africa
Middle East
Americas
Asia-Pacific
Eastern Europe
Western Europe
My question now is: have I fully normalised these tables? Can they be normalised any further or made more efficient? I would appreciate any suggestions before I go about setting them up.
If you are using languageID as the PK in multiple tables, you are not properly normalized.
Is the Active table just supposed to indicate whether the language is active or not? If so, active should just be a field in the language table.
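Folding the flag into the language table could look like the following sqlite3 sketch. The column names and the CHECK constraint restricting active to 0 or 1 are assumptions, not part of the original answer. Finding the translated languages then needs no join at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE language (
        language_id   TEXT PRIMARY KEY,
        language_name TEXT NOT NULL,
        native_name   TEXT NOT NULL,
        -- 1 = translated, 0 = not yet translated (assumed encoding)
        active        INTEGER NOT NULL DEFAULT 0 CHECK (active IN (0, 1))
    )
""")
conn.executemany("INSERT INTO language VALUES (?, ?, ?, ?)", [
    ("afr", "Afrikaans", "Afrikaans", 0),
    ("sqi", "Albanian", "gjuha shqipe", 1),
    ("ara", "Arabic", "العربية", 0),
])

# The active flag is now a plain column, so no join is required.
active = [row[0] for row in conn.execute(
    "SELECT language_name FROM language WHERE active = 1")]
print(active)  # ['Albanian']

# The CHECK constraint rejects values other than 0 and 1.
try:
    conn.execute("UPDATE language SET active = 2 WHERE language_id = 'afr'")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```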
The Region table should contain just regionID and region.
Another table should be added that links a region to a language. It would contain only the columns languageID and regionID, and its PK should be a composite key of regionID and languageID. This lets you properly represent the many-to-many relationship between languages and regions.
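A sketch of that link table in SQLite, assuming integer ids on both sides (the table name language_region and the sample ids are hypothetical). The composite primary key lets one language appear in several regions while rejecting the same language/region pair a second time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE language_region (
        language_id INTEGER NOT NULL,
        region_id   INTEGER NOT NULL,
        -- Composite key: each (language, region) pair may appear only once.
        PRIMARY KEY (language_id, region_id)
    )
""")

# Hypothetical ids: language 1 linked to regions 1 and 2 is allowed.
conn.execute("INSERT INTO language_region VALUES (1, 1)")
conn.execute("INSERT INTO language_region VALUES (1, 2)")

# Inserting the same pair again violates the composite primary key.
try:
    conn.execute("INSERT INTO language_region VALUES (1, 2)")
    duplicate_allowed = True
except sqlite3.IntegrityError:
    duplicate_allowed = False

count = conn.execute("SELECT COUNT(*) FROM language_region").fetchone()[0]
print(count, duplicate_allowed)  # 2 False
```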
By request (with some minor changes I would make):
In your example it’s not obvious why you would want a link table, but think about what happens when you get to the Americas region: you actually need multiple languages, since the countries there may speak English, Spanish, French or Portuguese. Even when you break it down to a single country, Canada would need both French and English. Spanish is also going to be in multiple regions, Europe and the Americas. The link table represents this better.
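To make the many-to-many case concrete, here is a small sqlite3 sketch using the Americas example. The integer ids, the table names and the choice of Western Europe as Spanish’s European region are assumptions for illustration; the helper functions languages_in and regions_of are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE language (language_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE region   (region_id   INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE language_region (
        language_id INTEGER NOT NULL REFERENCES language(language_id),
        region_id   INTEGER NOT NULL REFERENCES region(region_id),
        PRIMARY KEY (language_id, region_id)
    );
    INSERT INTO language VALUES
        (1, 'English'), (2, 'Spanish'), (3, 'French'), (4, 'Portuguese');
    INSERT INTO region VALUES (1, 'Americas'), (2, 'Western Europe');
    -- The Americas need several languages...
    INSERT INTO language_region VALUES (1, 1), (2, 1), (3, 1), (4, 1);
    -- ...and Spanish also belongs to Western Europe.
    INSERT INTO language_region VALUES (2, 2);
""")

def languages_in(region_name):
    """Return the language names linked to a region, alphabetically."""
    return [r[0] for r in conn.execute("""
        SELECT l.name FROM language l
        JOIN language_region lr ON lr.language_id = l.language_id
        JOIN region r ON r.region_id = lr.region_id
        WHERE r.name = ? ORDER BY l.name
    """, (region_name,))]

def regions_of(language_name):
    """Return the region names linked to a language, alphabetically."""
    return [r[0] for r in conn.execute("""
        SELECT r.name FROM region r
        JOIN language_region lr ON lr.region_id = r.region_id
        JOIN language l ON l.language_id = lr.language_id
        WHERE l.name = ? ORDER BY r.name
    """, (language_name,))]

print(languages_in("Americas"))  # ['English', 'French', 'Portuguese', 'Spanish']
print(regions_of("Spanish"))     # ['Americas', 'Western Europe']
```

Neither query needs to change when a language is added to another region; only a new row in the link table is required.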
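Language ID should be an int. Even if the three-letter abbreviation is unique, most SQL engines will be able to join on ints faster, so your language table should hold an integer Language ID alongside the three-letter code, the language name, the native name and the Active flag.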
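One way the language table might look with an integer key, sketched in SQLite. The column names are assumptions; keeping the three-letter code as a separate UNIQUE column means it can still be looked up, while joins use the integer. The sample rows reuse the codes and Active values from the question. This uses SQLite syntax, where INTEGER PRIMARY KEY assigns ids automatically; other engines spell this differently (for example AUTO_INCREMENT in MySQL or IDENTITY in SQL Server).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE language (
        -- In SQLite, INTEGER PRIMARY KEY aliases the rowid and auto-assigns values.
        language_id   INTEGER PRIMARY KEY,
        code          TEXT NOT NULL UNIQUE,   -- three-letter code from the question
        language_name TEXT NOT NULL,
        native_name   TEXT NOT NULL,          -- stored as Unicode text
        active        INTEGER NOT NULL DEFAULT 0 CHECK (active IN (0, 1))
    )
""")
conn.executemany(
    "INSERT INTO language (code, language_name, native_name, active) VALUES (?, ?, ?, ?)",
    [
        ("afr", "Afrikaans", "Afrikaans", 0),
        ("sqi", "Albanian", "gjuha shqipe", 1),
        ("ara", "Arabic", "العربية", 0),
    ],
)

rows = conn.execute(
    "SELECT language_id, code, active FROM language ORDER BY language_id").fetchall()
print(rows)  # [(1, 'afr', 0), (2, 'sqi', 1), (3, 'ara', 0)]

# The non-ASCII native name round-trips unchanged.
native = conn.execute(
    "SELECT native_name FROM language WHERE code = 'ara'").fetchone()[0]
```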
I couldn’t get a 0 to display in the Active field after Arabic (the Arabic script isn’t ASCII, and its right-to-left text disrupts the table layout), so just imagine it’s there.
Your region table should simply hold regionID and region.
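Populated with the seven region categories from the question, that table might look like this SQLite sketch. The UNIQUE constraint on the name is an assumption that keeps a category from being entered twice.

```python
import sqlite3

# The seven region categories listed in the question.
REGIONS = ["All", "Africa", "Middle East", "Americas",
           "Asia-Pacific", "Eastern Europe", "Western Europe"]

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE region (
        region_id INTEGER PRIMARY KEY,   -- auto-assigned integer id
        region    TEXT NOT NULL UNIQUE
    )
""")
conn.executemany("INSERT INTO region (region) VALUES (?)", [(r,) for r in REGIONS])

regions = conn.execute(
    "SELECT region_id, region FROM region ORDER BY region_id").fetchall()
print(regions[:2])  # [(1, 'All'), (2, 'Africa')]
```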
Now you want a link table, which would basically contain just languageID and regionID.
This better represents the fact that multiple languages may be used in the same region, and that a language may be used in multiple regions.
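Putting the three tables together, here is a minimal SQLite sketch of the normalised schema; all table and column names are assumptions. SQLite only enforces REFERENCES clauses after PRAGMA foreign_keys = ON is issued on the connection, so that line is included; with it, the link table refuses rows that point at a language or region that does not exist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite leaves foreign-key enforcement off unless it is enabled per connection.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE language (
        language_id   INTEGER PRIMARY KEY,
        code          TEXT NOT NULL UNIQUE,
        language_name TEXT NOT NULL,
        native_name   TEXT NOT NULL,
        active        INTEGER NOT NULL DEFAULT 0 CHECK (active IN (0, 1))
    );
    CREATE TABLE region (
        region_id INTEGER PRIMARY KEY,
        region    TEXT NOT NULL UNIQUE
    );
    CREATE TABLE language_region (
        language_id INTEGER NOT NULL REFERENCES language(language_id),
        region_id   INTEGER NOT NULL REFERENCES region(region_id),
        PRIMARY KEY (language_id, region_id)
    );
    INSERT INTO language (language_id, code, language_name, native_name, active)
        VALUES (1, 'afr', 'Afrikaans', 'Afrikaans', 0),
               (2, 'sqi', 'Albanian', 'gjuha shqipe', 1);
    INSERT INTO region (region_id, region)
        VALUES (1, 'Africa'), (2, 'Eastern Europe');
    INSERT INTO language_region VALUES (1, 1), (2, 2);
""")

# A link to a language id that does not exist is rejected.
try:
    conn.execute("INSERT INTO language_region VALUES (99, 1)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# Active languages with their regions, resolved through the link table.
active = conn.execute("""
    SELECT l.language_name, r.region
    FROM language l
    JOIN language_region lr ON lr.language_id = l.language_id
    JOIN region r ON r.region_id = lr.region_id
    WHERE l.active = 1
""").fetchall()
print(orphan_rejected, active)  # True [('Albanian', 'Eastern Europe')]
```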