I have a simple function that creates a gzip file. The function works fine and passes the unit test. I then hosted the generated file on Amazon S3.
But it produces some invalid characters when the input value contains Unicode characters.
e.g. アームバンド & ケース > 9ÎvøS‰
public static void CompressStringToFile(string fileName, string value)
{
    // Use GZipStream to write compressed bytes to target file.
    using (FileStream f2 = new FileStream(fileName, FileMode.Create))
    using (GZipStream gz = new GZipStream(f2, CompressionMode.Compress, false))
    {
        byte[] b = Encoding.Unicode.GetBytes(value);
        gz.Write(b, 0, b.Length);
        gz.Flush();
    }
}
The output of GZip compression isn’t meant to be text. It’s effectively arbitrary binary content, which you should only use to decompress it to the original binary content… which in your case is UTF-16-encoded text. You shouldn’t expect to be able to read the gzip file as a text file.
GZip itself doesn’t interpret the (binary) data that it’s given – it just compresses it, so it can be faithfully decompressed later on. GZip couldn’t care less whether it’s text, an image, a sound file, whatever: it just does the best it can to compress it.
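To check that nothing is actually lost, you can round-trip the text: compress it, decompress it, and decode the bytes with the same encoding used when writing. The following is only a minimal sketch of that idea. The class name GZipRoundTrip and the method DecompressFileToString are hypothetical names introduced for illustration, and the temporary file path is an assumption; only CompressStringToFile mirrors the code in the question.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

public static class GZipRoundTrip
{
    // Same approach as the question: encode the text as UTF-16
    // (Encoding.Unicode), then gzip the resulting bytes.
    public static void CompressStringToFile(string fileName, string value)
    {
        using (FileStream file = new FileStream(fileName, FileMode.Create))
        using (GZipStream gz = new GZipStream(file, CompressionMode.Compress))
        {
            byte[] bytes = Encoding.Unicode.GetBytes(value);
            gz.Write(bytes, 0, bytes.Length);
        }
    }

    // Hypothetical counterpart (not in the original post): gunzip the file,
    // then decode the bytes with the same encoding that was used to write them.
    public static string DecompressFileToString(string fileName)
    {
        using (FileStream file = new FileStream(fileName, FileMode.Open))
        using (GZipStream gz = new GZipStream(file, CompressionMode.Decompress))
        using (StreamReader reader = new StreamReader(gz, Encoding.Unicode))
        {
            return reader.ReadToEnd();
        }
    }

    public static void Main()
    {
        // Assumption: a temporary file is an acceptable place for the demo output.
        string path = Path.GetTempFileName();
        string original = "アームバンド & ケース";

        CompressStringToFile(path, original);
        string restored = DecompressFileToString(path);

        // Compare the strings rather than printing them, so the result does
        // not depend on the console's own encoding.
        Console.WriteLine(restored == original); // prints True

        File.Delete(path);
    }
}
```

If the round trip returns the original string, the compressed file is fine, and the odd characters only come from opening the .gz file itself as if it were text.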