I am currently writing a FLAC decoder, so I have to read two "UTF-8" coded values from the FLAC frame header.
This is in the documentation:
if(variable blocksize)
<8-56>:"UTF-8" coded sample number (decoded number is 36 bits)
else
<8-48>:"UTF-8" coded frame number (decoded number is 31 bits)
They use a hand-written function in their bit reader source file, Bitreader (line 1327), for the larger "UTF-8" value (variable block size).
I took a look at it, and it is not very nice code to translate into C#. So I thought about using a BinaryReader with UTF-8 encoding and reading the value with ReadUInt64.
Could this work? Would it give the same result, and what would be the absolute fastest solution?
No, that will not work. ReadUInt64 will just read 8 bytes; the encoding is only used for reading actual text (i.e. ReadChar and ReadChars), and those will not work either, since the char type is only 16 bits wide, and neither of them would expect a 36-bit value anyway.
When they write “UTF8 coded” in your documentation, that doesn’t mean it’s true UTF-8. It just means they encode a number using the same principle that UTF-8 uses to encode characters (which are, after all, also just numbers, but with more complex restrictions).
If you look at Wikipedia, you will see that they have listed exactly how UTF-8 characters are encoded, for up to 31 bits. It is very straightforward to continue this sequence for a 36-bit value: in that case, the first byte would be 11111110 in binary, followed by six continuation bytes of the form 10xxxxxx carrying 6 bits each, and that’s what you’re supposed to do for the sample numbers.
While you may not think the code is nice, that’s pretty much the most sensible way to do it – you’re not going to avoid bit manipulation anyway, because of how UTF-8 works – and while it is certainly possible to make some variations on that exact code, the basic structure is unlikely to be very different.
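To give an idea of how little code this needs in C#, here is a minimal sketch of such a decoder. It is not a translation of the FLAC reference code: the names FlacUtf8 and ReadCodedNumber are made up for illustration, it reads from a plain Stream and assumes the stream is positioned on a byte boundary at the start of the coded number, and it does not reject overlong encodings.

```csharp
using System;
using System.IO;

public static class FlacUtf8
{
    // Reads one "UTF-8"-style coded number of 1 to 7 bytes (up to 36 bits).
    // Hypothetical helper for illustration, not part of any library.
    public static ulong ReadCodedNumber(Stream stream)
    {
        int first = stream.ReadByte();
        if (first < 0)
            throw new EndOfStreamException();

        // Count the leading 1 bits of the first byte; this is the total
        // number of bytes in the sequence (0 means a single byte).
        int leadingOnes = 0;
        while (leadingOnes < 8 && (first & (0x80 >> leadingOnes)) != 0)
            leadingOnes++;

        // 0xxxxxxx: the byte itself is the 7-bit value.
        if (leadingOnes == 0)
            return (uint)first;

        // 10xxxxxx is a continuation byte, and 11111111 is not a lead byte.
        if (leadingOnes == 1 || leadingOnes == 8)
            throw new InvalidDataException("Invalid lead byte.");

        // Payload bits of the lead byte are the ones after the first 0 bit.
        // For 11111110 (seven bytes) the lead byte carries no payload at all.
        ulong value = (uint)(first & (0x7F >> leadingOnes));

        // Each continuation byte (10xxxxxx) contributes 6 more bits.
        for (int i = 1; i < leadingOnes; i++)
        {
            int next = stream.ReadByte();
            if (next < 0)
                throw new EndOfStreamException();
            if ((next & 0xC0) != 0x80)
                throw new InvalidDataException("Invalid continuation byte.");
            value = (value << 6) | (uint)(next & 0x3F);
        }

        return value;
    }
}
```

For example, the two bytes 0xC2 0xA2 decode to 162, and 0xFE followed by six 0xBF bytes decodes to the largest 36-bit value, 68719476735. The same method covers both the frame number and the sample number, because the 31-bit case is just the subset of sequences that are at most six bytes long.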