I received the following query from a customer:
I am doing some research into character sets for future versions of our products.
Most of the sites we have built use HTML including a meta tag for iso-8859-1 – the Western European Latin 1 alphabet rather than UTF-8 Unicode.
I have set up a page to play with this, and find that I am able to paste various scripts into the rich text editor: Chinese, Punjabi, Arabic, Romanian etc., with no problems, and they display on the web page OK (in Firefox/IE8).
I was a little surprised that my page was rendering these scripts correctly, as they are not included in the Latin alphabet.
Reading further I see that ‘It is a common misunderstanding that (the iso-8859-1 metatag) is needed, it is not’
As ‘when your browser makes the request to the server it tells the server what it wants and can handle. By the time the browser reads that code, the mimetype has already set the character set.’
So it seems the available character set is determined by the web server rather than the application/HTML.
Can you confirm if this is correct – does IIS 6/7 support such character sets as you have it configured, and do you know of any problems with languages widely spoken in the UK being represented on our servers? (Asian, East/West European, Arabic etc.)
The customer’s server is Windows 2003 with the Region and Language Options configured as:
Regional Options Tab –
Standards and Formats: United Kingdom
Location: United Kingdom
Languages Tab –
Text Services and Input Languages – English (United Kingdom)
Advanced Tab –
Language for non-unicode programs: English (United Kingdom)
Code page conversion tables: All checked (there’s quite a few listed: Japanese, Korean, Arabic etc)
Do I need to change anything in the server's configuration, or does the customer configure this through settings in their web.config file and ensure that any database fields that might store non-Latin characters are configured as Unicode?
ASP.NET serves responses as UTF-8 by default.
The encoding is specified in the response headers, so you shouldn't need to do anything special. However, you may wish to add this tag to the page header:
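A tag of this kind might look like the following sketch. The exact attribute values are an assumption and should match whatever encoding the server actually sends:

```xml
<!-- Assumed example: declares the page as UTF-8, matching ASP.NET's default response encoding -->
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
```

The tag only restates what the HTTP Content-Type header already says. If the two disagree, browsers generally follow the header.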
You can configure this behavior in the globalization section of your web.config:
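As a minimal sketch, assuming the standard globalization element under system.web (the UTF-8 values shown are illustrative and are already the defaults):

```xml
<configuration>
  <system.web>
    <!-- requestEncoding: how incoming request data is decoded -->
    <!-- responseEncoding: the charset sent in the Content-Type response header -->
    <globalization requestEncoding="utf-8" responseEncoding="utf-8" />
  </system.web>
</configuration>
```

Setting these explicitly is only needed if you want something other than UTF-8, or want to make the choice visible to whoever maintains the site.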
Read here: How to: Select an Encoding for ASP.NET Web Page Globalization
Regarding database fields: if we're talking about SQL Server, the columns need to be nvarchar and nchar, not varchar/char.