How can I return the Unicode Code Point of a character? For example, if the input is “A”, then the output should be “U+0041”. Ideally, a solution should take care of surrogate pairs.
By code point I mean the actual code point according to Unicode, which is different from a code unit (UTF-8 has 8-bit code units, UTF-16 has 16-bit code units, and UTF-32 has 32-bit code units; in the latter case the value is equal to the code point, after taking endianness into account).
Easy, since a `char` in C# is actually a UTF-16 code unit, which for any character in the Basic Multilingual Plane has the same value as its code point:
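The original snippet is not preserved here, so the following is only a minimal sketch of the idea: cast the `char` to `int` and format it as at least four hexadecimal digits. The class and method names (`CodePointDemo`, `FormatChar`) are hypothetical and chosen for illustration.

```csharp
// Hypothetical helper, not taken from the original answer.
static class CodePointDemo
{
    // Formats a single UTF-16 code unit as "U+XXXX".
    // The result is the real code point only for characters in the
    // Basic Multilingual Plane that are not surrogates.
    public static string FormatChar(char c)
    {
        // "X4" yields uppercase hexadecimal, zero-padded to 4 digits.
        return "U+" + ((int)c).ToString("X4");
    }
}
```

For example, `CodePointDemo.FormatChar('A')` returns `"U+0041"`. Passing one half of a surrogate pair returns the value of that code unit, not of the character it helps encode.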
To address the comments: a `char` in C# is a 16-bit number and holds a single UTF-16 code unit. Code points above the 16-bit space cannot be represented in a single C# `char`, because characters in C# are not variable width. A string, however, can have two chars following each other, each being a code unit, that together form a surrogate pair encoding one code point. If you have a string input and characters above the 16-bit space, you can use `char.IsSurrogatePair` and `char.ConvertToUtf32`, as suggested in another answer:
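The code from that other answer is not reproduced here, so this is a hedged sketch of how the two methods might be combined. The class and method names (`CodePointHelper`, `FormatCodePointAt`) are hypothetical; only `char.IsSurrogatePair(string, int)` and `char.ConvertToUtf32(string, int)` are real .NET APIs.

```csharp
// Hypothetical helper, not taken from the original answer.
static class CodePointHelper
{
    // Returns the code point that starts at s[index], formatted as "U+XXXX".
    // If s[index] and s[index + 1] form a surrogate pair, the two code units
    // are combined; otherwise the value of the single code unit is used.
    public static string FormatCodePointAt(string s, int index)
    {
        int codePoint = char.IsSurrogatePair(s, index)
            ? char.ConvertToUtf32(s, index)
            : s[index];

        // "X4" pads to at least 4 hex digits; larger values print in full.
        return "U+" + codePoint.ToString("X4");
    }
}
```

With this sketch, `CodePointHelper.FormatCodePointAt("A", 0)` returns `"U+0041"`, and `CodePointHelper.FormatCodePointAt("\uD83D\uDE00", 0)` returns `"U+1F600"`. A caller iterating over a whole string would need to advance the index by two after a surrogate pair.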