I am implementing run-length encoding using the GZipStream class in a C# WinForms app.
Data is provided as a series of strings separated by newline characters, like this:
FFFFFFFF
FFFFFEFF
FDFFFFFF
00FFFFFF
Before compressing, I convert the string to a byte array, but doing so fails if newline characters are present.
Each newline is significant, but I am not sure how to preserve their position in the encoding.
Here is the code I am using to convert to a byte array:
private static byte[] HexStringToByteArray(string _hex)
{
    _hex = _hex.Replace("\r\n", "");
    if (_hex.Length % 2 != 0) throw new FormatException("Hex string length must be divisible by 2.");
    int l = _hex.Length / 2;
    byte[] b = new byte[l];
    for (int i = 0; i < l; i++)
        b[i] = Convert.ToByte(_hex.Substring(i * 2, 2), 16);
    return b;
}
Convert.ToByte throws a FormatException if the newlines are not removed, with the message “Additional non-parsable characters are at the end of the string.”, which doesn’t surprise me.
What would be the best way to make sure newline characters can be included properly?
Note: the compressed version of this string must itself be a string that can be included in an XML document.
Edit:
I have tried to simply convert the string to a byte array without performing any binary conversion on it, but am still having trouble with the compression. Here are the relevant methods:
private static byte[] StringToByteArray(string _s)
{
    Encoding enc = Encoding.ASCII;
    return enc.GetBytes(_s);
}

public static byte[] Compress(byte[] buffer)
{
    MemoryStream ms = new MemoryStream();
    GZipStream zip = new GZipStream(ms, CompressionMode.Compress, true);
    zip.Write(buffer, 0, buffer.Length);
    zip.Close();
    ms.Position = 0;
    byte[] compressed = new byte[ms.Length];
    ms.Read(compressed, 0, compressed.Length);
    byte[] gzBuffer = new byte[compressed.Length + 4];
    Buffer.BlockCopy(compressed, 0, gzBuffer, 4, compressed.Length);
    Buffer.BlockCopy(BitConverter.GetBytes(buffer.Length), 0, gzBuffer, 0, 4);
    return gzBuffer;
}
Firstly: are you certain that just compressing the text doesn’t give much the same result as compressing the “converted to binary” form?
Assuming you want to go ahead with converting to binary, I can suggest two options:
EDIT: I believe your original approach was fundamentally flawed. Whatever you get out of GZipStream is not text, and shouldn’t be treated as if it were text using Encoding. However, you can turn it into ASCII text very easily, by calling Convert.ToBase64String. By the way, another trick you’ve missed is to call ToArray on the MemoryStream, which will give you the contents as a byte[] with no extra messing around.
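To make the Base64 and ToArray suggestions concrete, here is a minimal sketch of one way they might fit together. It assumes the text itself (newlines included) is compressed directly rather than first converted from hex, and that UTF-8 is an acceptable encoding for it. The class and method names (HexCompression, CompressToBase64, DecompressFromBase64) are made up for illustration and are not from the question.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

public static class HexCompression
{
    // Compresses the text as-is (newlines included) and returns Base64,
    // which uses only characters that are safe inside an XML document.
    public static string CompressToBase64(string text)
    {
        byte[] raw = Encoding.UTF8.GetBytes(text);
        using (var ms = new MemoryStream())
        {
            // leaveOpen: true keeps ms usable after the GZipStream is disposed;
            // disposing the GZipStream flushes the remaining compressed data.
            using (var zip = new GZipStream(ms, CompressionMode.Compress, true))
            {
                zip.Write(raw, 0, raw.Length);
            }
            // ToArray returns the whole buffer, no Position/Read juggling needed.
            return Convert.ToBase64String(ms.ToArray());
        }
    }

    // Reverses CompressToBase64: Base64 -> gzip bytes -> original text.
    public static string DecompressFromBase64(string base64)
    {
        byte[] compressed = Convert.FromBase64String(base64);
        using (var input = new MemoryStream(compressed))
        using (var zip = new GZipStream(input, CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            zip.CopyTo(output);
            return Encoding.UTF8.GetString(output.ToArray());
        }
    }

    public static void Main()
    {
        string original = "FFFFFFFF\r\nFFFFFEFF\r\nFDFFFFFF\r\n00FFFFFF";
        string encoded = CompressToBase64(original);
        string restored = DecompressFromBase64(encoded);
        // Round-trip check: the newlines should survive unchanged.
        Console.WriteLine(restored == original);
    }
}
```

Because the newlines are ordinary characters in the compressed input, their positions are preserved without any special handling. Whether this compresses as well as the hex-to-binary route is something to measure on real data.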