I’m trying to gain a basic understanding of what is meant by a Windows code page. I get the feeling it’s a translation between a given 8-bit value and some ‘abstraction’ for a given character graphic.
I ran the following experiment. I created a literal containing two versions of the letter u with an umlaut: one entered with ALT 129 (which uses code page 437) and one entered with ALT 0252 (which uses code page 1252). When I examined the literal, both characters had the value 252.
Is 252 the universal 8-bit abstraction for u with an umlaut? Is it the Unicode value?
Aside from keyboard input, are there any library routines or system calls that use code pages?
For example, is there a function to translate a string using a given code page (as above for the ALT 129 value)?
Windows code pages are a relic of pre-Unicode days, when each language or group of languages represented its characters using one byte (or two, in the case of East Asian languages). This is where the concept of a character set comes into play. English and most other Western European languages, for instance, use “windows-1252”. The various code pages can be installed through the Regional & Language Options control panel. A list of code pages can be found here: http://msdn.microsoft.com/en-us/goglobal/bb964654.aspx
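To make the experiment from the question concrete, here is a minimal sketch in Python (chosen only for brevity, it is not part of the original answer). It assumes that ALT 129 produces byte 0x81 interpreted under code page 437 and that ALT 0252 produces byte 0xFC interpreted under code page 1252. Both bytes decode to the same Unicode character, U+00FC, whose code point is 252. That is why both characters in the literal showed the value 252: it is the Unicode value, not a universal 8-bit one.

```python
# Assumed mapping of the two keystrokes from the question to raw bytes:
#   ALT 129  -> byte 0x81, interpreted with code page 437 (OEM)
#   ALT 0252 -> byte 0xFC, interpreted with code page 1252 (ANSI)
oem_byte = bytes([129])
ansi_byte = bytes([252])

# Decoding applies the code page: byte value -> Unicode character.
u_from_437 = oem_byte.decode("cp437")
u_from_1252 = ansi_byte.decode("cp1252")

print(u_from_437 == u_from_1252)   # True: both are the same character
print(ord(u_from_437))             # 252, the Unicode code point U+00FC

# Encoding goes the other way: the same character becomes a different
# byte depending on which code page is used.
byte_in_437 = "\u00fc".encode("cp437")
byte_in_1252 = "\u00fc".encode("cp1252")
print(byte_in_437, byte_in_1252)

# Reading a byte with the wrong code page yields a different character.
misread = ansi_byte.decode("cp437")
print(misread == u_from_1252)      # False
```

The point of the sketch is that the byte value only has meaning together with a code page, whereas the decoded character has a single Unicode code point regardless of how it was typed.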
Within .NET, code pages are accessed through the System.Text.Encoding class, which provides a method for converting bytes from one encoding to another. For instance, to convert text encoded as windows-1252 to UTF-8 (admittedly usually a fairly pointless exercise), you could use this code: