I would like to have a collation which orders the UTF-8 encoding of 0x1234 below of 0x1235 regardless of the character mapping in the Unicode standard. MySQL uses utf8_bin for this. MSSQL apparently http://msdn.microsoft.com/en-us/library/ms143350.aspx have BIN and BIN2 collations. While finding these were easy, I can’t even find a list of collations PostgreSQL supports much less answer to this specific question.
Share
The C locale will do. UTF-8 is designed so that byte ordering is also codepoint ordering. This is not trivial but consider how UTF-8 works:
When sorting binary data aka C locale, the first non-equal byte will determine the ordering. What we neeed to see that if two numbers encoded into UTF-8 differ then the first non-equal byte will be lower for the lower value. If the numbers are in different ranges then the first byte will indeed be lower for the lower number. Within the same range, the order is determined by literally the same bits as without encoding.