I have a byte array as input. It should be the binary representation of standards-conforming UTF-8 HTML, and most of the time it is. Sometimes, however, it also contains embedded nulls (the \x0 character, or NUL). This is not under my control. I need to transform this byte array into a string.
Tried so far:
- Obviously, using a StreamReader or TextReader does not work, as it stops on hitting the first NUL
- Encoding.UTF8.GetString does not work either, it also stops on the first NUL
What worked, but is rather inelegant:
// requires: using System.Linq; using System.Text;
var mynewarray = myoldarray.Where(x => x != 0).ToArray(); // copy without the NUL bytes
var output = Encoding.UTF8.GetString(mynewarray);
Is there a more elegant way to do this, other than creating a new byte array that skips the NUL chars and then using one of the solutions above? The byte array can be pretty big, more than 2-4 MB… MSDN says that strings may actually contain embedded NULs, but does not say what the best approaches are for handling such strings.
Your string is already right. It will contain the NUL characters. But when you use a string with embedded NUL chars, you will get all kinds of problems. Encoding.UTF8.GetString does not stop at \0, as you can see in my example. See what happens when I output such a string:
output is:
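A minimal sketch for checking the claim above that Encoding.UTF8.GetString keeps embedded NULs. This is not the answerer's original example; the class and method names (NulDemo, Decode, StripNuls) are made up for illustration. It also shows one possible way to drop the NULs after decoding, assuming they carry no meaning in the HTML. Note that Replace still allocates a new string, it just avoids the intermediate byte array.

```csharp
using System;
using System.Text;

// Hypothetical helper class, for illustration only.
public static class NulDemo
{
    // Decodes UTF-8 bytes; each embedded NUL byte becomes a U+0000 char in the result.
    public static string Decode(byte[] bytes) => Encoding.UTF8.GetString(bytes);

    // Removes U+0000 chars after decoding (ordinal replace), so no second byte array is built.
    public static string StripNuls(string s) => s.Replace("\0", string.Empty);

    public static void Main()
    {
        byte[] input = { (byte)'a', 0, (byte)'b', 0, (byte)'c' };
        string decoded = Decode(input);

        Console.WriteLine(decoded.Length);        // 5: both NULs are still in the string
        Console.WriteLine(decoded.IndexOf('\0')); // 1: index of the first NUL
        Console.WriteLine(StripNuls(decoded));    // abc
    }
}
```

If the decoded string only looks truncated in a debugger, a console, or a native UI control, checking Length and IndexOf('\0') as above is a quick way to tell whether the data is really missing or merely displayed up to the first NUL.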