I am writing a class method that will convert a UTF-8 character into its corresponding Unicode code point. My prototype candidates are the ones below:
static uint32_t Utf8ToWStr( uint8_t Byte1, uint8_t Byte2 = 0x00,
uint8_t Byte3 = 0x00, uint8_t Byte4 = 0x00,
uint8_t Byte5 = 0x00, uint8_t Byte6 = 0x00);
static uint32_t Utf8ToWStr(const std::vector<uint8_t> & Bytes);
In my application:
Byte1 will be the only non-zero byte approximately 90% of the time.
Byte1 and Byte2 will be the only non-zero bytes approximately 9% of the time.
Byte1, Byte2 and Byte3 will be the only non-zero bytes less than 1% of the time.
Byte4, Byte5 and Byte6 will almost always be zero.
Which prototype should I prefer for speed?
Probably neither.
Think about the code that calls this function; it will likely have to jump through massive hoops to use it.
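As a rough sketch of those hoops, assume the caller starts from a UTF-8 `std::string`. The body of `Utf8ToWStr` below is only a stand-in for the first prototype (well-formed input, at most four bytes), and `DecodeAll` is a hypothetical caller, not code from the question. Note that the caller has to inspect the lead byte and split the stream itself before it can call the converter at all:

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Stand-in for the first prototype from the question. The body is an
// assumption: it handles well-formed sequences of up to four bytes only.
static uint32_t Utf8ToWStr(uint8_t b1, uint8_t b2 = 0x00,
                           uint8_t b3 = 0x00, uint8_t b4 = 0x00,
                           uint8_t /*b5*/ = 0x00, uint8_t /*b6*/ = 0x00)
{
    if (b1 < 0x80) return b1;
    if (b1 < 0xE0) return ((b1 & 0x1Fu) << 6) | (b2 & 0x3Fu);
    if (b1 < 0xF0) return ((b1 & 0x0Fu) << 12) | ((b2 & 0x3Fu) << 6) | (b3 & 0x3Fu);
    return ((b1 & 0x07u) << 18) | ((b2 & 0x3Fu) << 12) |
           ((b3 & 0x3Fu) << 6) | (b4 & 0x3Fu);
}

// Hypothetical caller: it duplicates the length logic of the decoder
// just to know how many bytes to pass in.
std::vector<uint32_t> DecodeAll(const std::string& text)
{
    std::vector<uint32_t> out;
    std::size_t i = 0;
    while (i < text.size()) {
        const uint8_t lead = static_cast<uint8_t>(text[i]);
        const std::size_t len = lead < 0x80 ? 1 : lead < 0xE0 ? 2 : lead < 0xF0 ? 3 : 4;
        if (i + len > text.size()) break;  // truncated input: nowhere to report it
        uint8_t b[4] = {0, 0, 0, 0};
        for (std::size_t k = 0; k < len; ++k)
            b[k] = static_cast<uint8_t>(text[i + k]);
        out.push_back(Utf8ToWStr(b[0], b[1], b[2], b[3]));
        i += len;
    }
    return out;
}
```

The `std::vector` overload does not help much here: the caller would still need the same length logic, plus a vector construction per character.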
I sincerely doubt this interface is useful.
My normal interface for conversion routines is
The return value is used to convey errors (for example, how would your function react to the parameters (0x81, 0x00)?). Last but not least, you might want to have a mode that specifies whether overlong (non-shortest-form) UTF-8 should give an error; from a security point of view it is a good idea to disallow encoding U+003F as 0xC0 0xBF.
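A minimal sketch of a buffer-based decoder along these lines follows. The names `Utf8DecodeOne`, `Utf8Error` and the `strict` flag are hypothetical, and the sketch assumes the four-byte limit of RFC 3629 rather than the six-byte form in the question. The return value is the number of bytes consumed, or a negative error code:

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical error codes; the names are illustrative only.
enum Utf8Error : int {
    kUtf8Truncated = -1,  // input ends in the middle of a sequence
    kUtf8Invalid   = -2,  // bad lead byte, bad continuation byte, or bad code point
    kUtf8Overlong  = -3   // non-shortest-form encoding (reported in strict mode)
};

// Decodes one code point from [src, src + len) into *out.
// Returns the number of bytes consumed (1..4) or a negative Utf8Error.
int Utf8DecodeOne(const uint8_t* src, std::size_t len, uint32_t* out,
                  bool strict = true)
{
    if (len == 0) return kUtf8Truncated;

    const uint8_t lead = src[0];
    if (lead < 0x80) {  // fast path: the ~90% ASCII case
        *out = lead;
        return 1;
    }

    int n = 0;         // total sequence length
    uint32_t cp = 0;   // code point accumulated so far
    uint32_t min = 0;  // smallest code point that needs n bytes
    if      ((lead & 0xE0) == 0xC0) { n = 2; cp = lead & 0x1Fu; min = 0x80; }
    else if ((lead & 0xF0) == 0xE0) { n = 3; cp = lead & 0x0Fu; min = 0x800; }
    else if ((lead & 0xF8) == 0xF0) { n = 4; cp = lead & 0x07u; min = 0x10000; }
    else return kUtf8Invalid;  // stray continuation byte or 0xF8..0xFF

    if (len < static_cast<std::size_t>(n)) return kUtf8Truncated;

    for (int i = 1; i < n; ++i) {
        if ((src[i] & 0xC0) != 0x80) return kUtf8Invalid;
        cp = (cp << 6) | (src[i] & 0x3Fu);
    }

    if (strict && cp < min) return kUtf8Overlong;
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return kUtf8Invalid;

    *out = cp;
    return n;
}
```

With this shape, the input (0x81, 0x00) yields `kUtf8Invalid` because 0x81 is a continuation byte with no lead byte, and 0xC0 0xBF yields `kUtf8Overlong` in strict mode while decoding to U+003F when `strict` is false. The caller advances its pointer by the return value, so it never has to work out sequence lengths itself.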