I have a Stream that produces UTF-8 encoded strings. The strings represent XML documents


Asked: May 24, 2026


I have a Stream that produces UTF-8 encoded strings. The strings represent XML documents that I need to parse. The stream is obtained from a TcpClient.

Suppose I read the stream into buffers of size 64 (a little small, I know). Passing these 64-byte buffers directly to the string decoding step could fail because some UTF-8 encoded characters may be split across the 64-byte boundary. A buffer may end with the first two bytes of a character, with the last byte of that character at the start of the next buffer.

What I do now is concatenate buffers until a read returns fewer than 64 bytes, which indicates that I have read to the end of something (in my case, an XML document). However, once in a while, an XML document ends exactly on the 64-byte boundary. In that case, I do not know that I can pass the byte array to the decoding step, so I end up waiting for the next document.

I realize I can lower the chances by increasing the buffer size. However, a small chance always remains that it happens. I could also increase the buffer size such that any XML document I encounter will fit, but I just wonder whether there is another solution, somehow detecting from the byte stream where the character boundaries are.


1 Answer

Editorial Team · Answer 1 · 2026-05-24T22:28:28+00:00


You are right about the problems and pitfalls.

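The solution already exists: wrap a StreamReader around your stream and use Read() and ReadLine(). The reader holds back any incomplete byte sequence at the end of a read until the remaining bytes arrive, so you never see half a character.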
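A minimal sketch of the StreamReader approach. To run without a network connection it uses a MemoryStream as a stand-in for the stream you would get from TcpClient.GetStream(), and it assumes (hypothetically, the question does not say) that each XML document ends with a newline so ReadLine() can separate them.

```csharp
using System;
using System.IO;
using System.Text;

class StreamReaderSketch
{
    static void Main()
    {
        // Sample data with multi-byte characters: "é" is 2 bytes and "€" is 3 bytes in UTF-8.
        // Assumption: one XML document per line.
        byte[] bytes = Encoding.UTF8.GetBytes("<doc>é€</doc>\n<doc>second</doc>\n");

        // Stand-in for TcpClient.GetStream(), used only so the sketch is self-contained.
        using (var stream = new MemoryStream(bytes))
        using (var reader = new StreamReader(stream, Encoding.UTF8))
        {
            string line;
            // ReadLine() returns null at the end of the stream.
            while ((line = reader.ReadLine()) != null)
            {
                Console.WriteLine(line.Length + " chars: " + line);
            }
        }
    }
}
```

If the documents are not newline-delimited, Read(char[], int, int) on the same reader still yields only whole characters, and finding the end of each document becomes a job for the XML parser.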

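If you do want a DIY solution, you’ll have to look at the Decoder class (Encoding.UTF8.GetDecoder()), which keeps the state of a partially read character between calls. The details are beyond my capabilities.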
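For the DIY route, the stateful class in .NET is System.Text.Decoder, obtained from Encoding.UTF8.GetDecoder(). The sketch below is an illustration only, not a full solution: the 4-byte chunk size is chosen deliberately small so that characters get split across reads, and a MemoryStream again stands in for the network stream.

```csharp
using System;
using System.IO;
using System.Text;

class DecoderSketch
{
    static void Main()
    {
        string original = "<doc>é€</doc>";
        byte[] bytes = Encoding.UTF8.GetBytes(original);

        // Illustrative chunk size; small enough that "€" (3 bytes) straddles two reads.
        const int ChunkSize = 4;

        // One decoder instance for the whole stream: it remembers trailing
        // bytes of an incomplete character between GetChars calls.
        Decoder decoder = Encoding.UTF8.GetDecoder();
        byte[] buffer = new byte[ChunkSize];
        char[] chars = new char[Encoding.UTF8.GetMaxCharCount(ChunkSize)];
        var text = new StringBuilder();

        using (var stream = new MemoryStream(bytes))
        {
            int read;
            while ((read = stream.Read(buffer, 0, buffer.Length)) > 0)
            {
                // Converts only complete characters; leftover bytes stay inside the decoder.
                int charCount = decoder.GetChars(buffer, 0, read, chars, 0);
                text.Append(chars, 0, charCount);
            }
        }

        // Compare with the source string to check that nothing was lost at chunk boundaries.
        Console.WriteLine(text.ToString() == original);
    }
}
```

The key design point is reusing the same Decoder across reads. Calling Encoding.UTF8.GetString on each buffer separately has no memory of the previous buffer, which is what causes the split-character problem described in the question.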
