I’m writing a class that operates on a byte[] buffer. It contains methods such as char Peek() and string ReadRestOfLine().
The problem is that I would like to add support for Unicode, and I don’t really know how I should change those methods (they only support ASCII now).
How do I detect that the next bytes in the buffer are a Unicode sequence (UTF-8 or UTF-16)? And how do I convert them to a char?
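A minimal sketch of the detection part, assuming the buffer is known to hold UTF-8. It is written in Java, but the byte-level logic carries over directly to C#. In UTF-8 the lead byte announces how many bytes the character occupies, so a Peek can tell whether the whole character has arrived yet. UTF-16 generally cannot be told apart from UTF-8 just by inspecting arbitrary bytes. The protocol (for example an HTTP charset) or a byte order mark has to say which encoding is in use. The names `Utf8Peek`, `peekCodePoint` and `INCOMPLETE` are hypothetical. Also, a code point above U+FFFF needs two UTF-16 chars (a surrogate pair) in both Java and C#, so a `char Peek()` signature cannot return every character.

```java
import java.nio.charset.StandardCharsets;

public final class Utf8Peek {
    /** Hypothetical sentinel: the sequence is cut off, wait for the next receive. */
    public static final int INCOMPLETE = -1;

    /** Number of bytes in the UTF-8 sequence that starts with this lead byte, or 0 if it is not a lead byte. */
    public static int sequenceLength(byte lead) {
        int b = lead & 0xFF;
        if (b < 0x80) return 1;            // 0xxxxxxx: plain ASCII
        if ((b & 0xE0) == 0xC0) return 2;  // 110xxxxx
        if ((b & 0xF0) == 0xE0) return 3;  // 1110xxxx
        if ((b & 0xF8) == 0xF0) return 4;  // 11110xxx
        return 0;                          // continuation byte (10xxxxxx) or invalid
    }

    /**
     * Decodes the code point starting at buf[offset] without reading past buf[end - 1].
     * Returns INCOMPLETE if the received data ends mid-sequence.
     * This sketch does not reject overlong encodings or encoded surrogates.
     */
    public static int peekCodePoint(byte[] buf, int offset, int end) {
        int len = sequenceLength(buf[offset]);
        if (len == 0) throw new IllegalArgumentException("not a UTF-8 lead byte at " + offset);
        if (offset + len > end) return INCOMPLETE;
        if (len == 1) return buf[offset];
        int cp = buf[offset] & (0xFF >> (len + 1)); // payload bits of the lead byte
        for (int i = 1; i < len; i++) {
            int b = buf[offset + i] & 0xFF;
            if ((b & 0xC0) != 0x80) throw new IllegalArgumentException("bad continuation byte at " + (offset + i));
            cp = (cp << 6) | (b & 0x3F);            // append 6 payload bits
        }
        return cp;
    }

    public static void main(String[] args) {
        // 'a', e-acute (2 bytes), euro sign (3 bytes), an emoji (4 bytes)
        byte[] data = "a\u00e9\u20ac\uD83D\uDE00".getBytes(StandardCharsets.UTF_8);
        int pos = 0;
        while (pos < data.length) {
            int cp = peekCodePoint(data, pos, data.length);
            char[] chars = Character.toChars(cp);   // one char, or a surrogate pair above U+FFFF
            System.out.println(Integer.toHexString(cp) + " -> " + chars.length + " char(s)");
            pos += sequenceLength(data[pos]);
        }
    }
}
```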
Update
Yes, the class is somewhat similar to StreamReader, with the difference that it avoids creating objects (such as string or char[]) until the entire requested string has been found. It’s used in a high-performance socket framework.
For instance, let’s say that I want to write a proxy that only checks the URI in an HTTP request. If I were to use StreamReader, I would have to build a temporary char array each time a receive completes, just to see whether a newline character has been received.
By using a class that works directly on the byte[] buffer that socket.ReceiveAsync uses, I only have to traverse the buffer in my parser to know whether the next step can be completed. No temporary objects are created.
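The newline check described above can be sketched in Java on the raw bytes. `LineScanner` and `indexOfNewline` are illustrative names, not part of any framework. The simulated receives just copy ASCII text into a fixed buffer that stands in for the one handed to the socket. A `scanned` counter ensures each byte is examined only once, even when a line arrives across several receives. Searching for the byte 0x0A is safe even for UTF-8 payloads, because that value never appears inside a multi-byte sequence.

```java
import java.nio.charset.StandardCharsets;

public final class LineScanner {
    /**
     * Index of the first '\n' in buf[offset, offset + count), or -1 if the line is not complete yet.
     * Works on raw bytes, so no char[] or String is created.
     */
    public static int indexOfNewline(byte[] buf, int offset, int count) {
        int end = offset + count;
        for (int i = offset; i < end; i++) {
            if (buf[i] == '\n') return i;
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] buffer = new byte[64]; // stands in for the buffer given to the receive call
        int filled = 0;               // bytes received so far
        int scanned = 0;              // bytes already checked, never scanned twice

        String[] receives = { "GET /index.html HT", "TP/1.1\r\nHost: x\r\n" };
        for (String r : receives) {
            // Simulation only: a real socket would write straight into buffer.
            byte[] chunk = r.getBytes(StandardCharsets.US_ASCII);
            System.arraycopy(chunk, 0, buffer, filled, chunk.length);
            filled += chunk.length;

            int nl = indexOfNewline(buffer, scanned, filled - scanned);
            if (nl < 0) {
                scanned = filled;     // nothing yet; wait for the next receive
                System.out.println("incomplete after " + filled + " bytes");
            } else {
                System.out.println("request line ends at index " + nl);
                break;
            }
        }
    }
}
```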
For most protocols, ASCII is used in the header area, so UTF-8 is not a problem there (the request body can be parsed using StreamReader). I’m just interested in how this can be solved without creating unnecessary objects.
I’ve created a BufferSlice class which wraps the byte[] buffer and makes sure that only the assigned slice is used. I’ve also created a custom reader to parse the buffer.
UTF-8 turned out not to be a problem, since I only scan the buffer for characters that are not multi-byte (space, minus, semicolon, etc.). In UTF-8, every byte of a multi-byte sequence has its high bit set, so a byte in the ASCII range (0x00-0x7F) can never be part of one. I then use Encoding.GetString from the last delimiter to the current one to get a proper string back.
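A sketch of the slice-plus-delimiter approach, again in Java. `BufferSlice` and `readUntil` here are hypothetical stand-ins for the classes described above, not the actual implementation. Java's `new String(bytes, offset, length, StandardCharsets.UTF_8)` plays the role of .NET's `Encoding.UTF8.GetString(bytes, index, count)`. The scan compares raw bytes against an ASCII delimiter, and a String is allocated only once a complete token is available.

```java
import java.nio.charset.StandardCharsets;

public final class BufferSlice {
    private final byte[] buffer; // shared receive buffer, never copied
    private final int offset;    // start of this slice within buffer
    private final int count;     // number of valid bytes in the slice
    private int position;        // read cursor, relative to offset

    public BufferSlice(byte[] buffer, int offset, int count) {
        if (offset < 0 || count < 0 || offset + count > buffer.length)
            throw new IndexOutOfBoundsException("slice outside buffer");
        this.buffer = buffer;
        this.offset = offset;
        this.count = count;
    }

    /**
     * Returns the UTF-8 text between the cursor and the next ASCII delimiter, then moves past it.
     * Returns null (cursor unchanged) if the delimiter is not inside the slice yet.
     * Safe because ASCII bytes never occur inside a UTF-8 multi-byte sequence.
     */
    public String readUntil(char delimiter) {
        if (delimiter > 0x7F) throw new IllegalArgumentException("delimiter must be ASCII");
        int start = offset + position;
        int end = offset + count;
        for (int i = start; i < end; i++) {
            if (buffer[i] == (byte) delimiter) {
                // The only allocation: decode from the last delimiter to this one.
                String token = new String(buffer, start, i - start, StandardCharsets.UTF_8);
                position = i + 1 - offset; // skip the delimiter itself
                return token;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        byte[] buffer = "xxGET /caf\u00e9;q=1 HTTP/1.1".getBytes(StandardCharsets.UTF_8);
        // The first two bytes belong to someone else, so the slice starts at offset 2.
        BufferSlice slice = new BufferSlice(buffer, 2, buffer.length - 2);
        System.out.println(slice.readUntil(' '));  // method
        System.out.println(slice.readUntil(';'));  // path containing a 2-byte UTF-8 character
        System.out.println(slice.readUntil('\n')); // null: no newline received yet
    }
}
```

Returning null without moving the cursor means a partial token is never decoded.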